Re: Best practice: Autosuggest/autocomplete vs. real search
Wouldn't it be easier to ensure, on the site itself, that only complete terms are submitted to the actual search? In an app I worked on some time ago, the default behavior of the JavaScript component used for autocompletion was to first complete the term in the input box and only then submit the query to the backend. I know this is not what you asked for, but could it work? I'm just taking a shot in the dark here! :-)

On Nov 10, 2014, at 8:37 AM, Michael Sokolov msoko...@safaribooksonline.com wrote:

The goal is to ensure that suggestions from autocomplete are actually terms in the main index, so that the suggestions will actually result in matches. You've considered expanding the main index by adding the suggestion n-grams to it, but it would probably be better to alter your suggester so that it produces only tokens that are in the main index. I think this is basically how all the Suggester implementations are designed to work already; are you using one of those, or are you using the TermsComponent, or something else?

-Mike

On 11/10/14 2:54 AM, Thomas Michael Engelke wrote:

We're using Solr as a backend for an e-commerce site/system. The Solr index stores products with selected attributes, as well as a dedicated field for autocomplete suggestions (done via AJAX request when typing in the search box without pressing return). The autosuggest field is supplied by copyField directives from certain select product attribute fields (description and/or name mostly). It uses EdgeNGramFilterFactory to complete words not yet typed completely, and it works quite well.

However, we have come across a disconnect between the autosuggest results and the results of a normal search, that is, a query over the full fields of the product. Let's say there are products that are called "motor".

- When autosuggesting, typing "mot" suggests all products with "motor", because the EdgeNGram filter created m, mo, mot, moto and motor, and the input matches one of them.
- When searching for "mot", however (i.e. pressing enter when seeing the autosuggestions), it doesn't find any products. The autosuggest field is not part of the real search, and no product attribute contains "mot" as a word.

One obvious solution would be to incorporate the autosuggest field into the real search. However, this adds many tokens to the index that aren't really part of the indexed products and makes for strange search results, for example when an n-gram is also a word but the record itself contains the search term only as part of a word. Are there clever solutions to this problem?
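The mismatch described in this thread can be reproduced outside Solr. What follows is a minimal Python sketch, not Solr code: edge_ngrams only imitates what the thread says EdgeNGramFilterFactory produces, and the two sets are hypothetical stand-ins for the autosuggest field and the main search field.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the front-anchored prefixes of a token, shortest first."""
    upper = min(len(token), max_gram)
    return [token[:size] for size in range(min_gram, upper + 1)]


# Hypothetical index contents for a product named "motor".
main_field_terms = {"motor"}                   # whole words only
autosuggest_terms = set(edge_ngrams("motor"))  # m, mo, mot, moto, motor

typed = "mot"
print(typed in autosuggest_terms)  # the autosuggest field contains the prefix
print(typed in main_field_terms)   # the main field has no such term
```

Submitting the completed suggestion ("motor") instead of the typed prefix, as proposed at the top of the thread, is one way to make the second lookup succeed without adding n-grams to the main field.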
Re: How to choose only one best hit from several ones ?
How would you measure which snippet is the best?

On Nov 9, 2014, at 1:59 PM, SolrUser1543 osta...@gmail.com wrote:

Let's say that for some query there are several results, with several hits for each one, which are shown in the highlighting section of the response. Is it possible to select only the one best hit for every result? There is the hl.snippets parameter, which controls the number of snippets. hl.snippets=1 will show the first one, but not necessarily the best one.

-- View this message in context: http://lucene.472066.n3.nabble.com/How-to-choose-only-one-best-hit-from-several-ones-tp4168416.html Sent from the Solr - User mailing list archive at Nabble.com.
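One possible answer to the question above, offered only as an assumption about what "best" could mean: request several snippets (hl.snippets greater than 1) and pick, on the client side, the snippet that contains the most highlighted terms. The Python sketch below assumes the highlighter's default <em> marker; the function name is illustrative and nothing here is part of Solr itself.

```python
def best_snippet(snippets, pre_tag="<em>"):
    """Pick the snippet with the most highlighted terms; ties keep the first."""
    if not snippets:
        return None
    return max(snippets, key=lambda text: text.count(pre_tag))


# Hypothetical snippets as they might appear in the highlighting section.
snippets = [
    "the <em>solr</em> server was restarted",
    "<em>solr</em> <em>highlighting</em> tips and tricks",
]
print(best_snippet(snippets))
```

Other measures (distinct terms, term proximity, position in the document) would be just as defensible; the point is only that the choice can be made after the fact from a larger set of snippets.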
Re: Search for partial name in Solr 4.x
Solving exactly the problem you describe is the whole idea behind Solr. In particular, you need to define the title field as a solr.TextField and then define a tokenizer for it. The tokenizer transforms the initial text into tokens. Solr has several tokenizers, each with its own characteristics; one of the most common is the StandardTokenizer, but your choice will be influenced by how you want to "divide" your initial text into "parts", or tokens. Put simply, when you fire a query against Solr, it matches the tokens of your query against the tokens stored for each of your documents and then outputs a list of matching documents. One simple example of a fieldType you could use is:

  <fieldType name="text" class="solr.TextField" sortMissingLast="true">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

In this case the tokenizer will split the initial text into tokens, and then each token will be lowercased, so when you query you won't have to worry about the capitalization of the terms.

Hope it helps

On Nov 9, 2014, at 3:26 PM, PeriS peri.subrahma...@htcinc.com wrote:

I was wondering if there is a way to search on partial names. Example: the field is a string and stores values like the titles of books. When searching, only part of the title may be supplied. How do I resolve this? Please let me know.

Thanks
-PeriS
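To make the token matching described in the reply concrete, here is a small Python sketch. It is a rough stand-in only: the regular expression is not the real StandardTokenizer, and requiring every query token to match is a simplifying assumption about the query operator.

```python
import re


def analyze(text):
    """Very rough stand-in for StandardTokenizer + LowerCaseFilter."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def matches(title, query):
    """True if every query token is one of the title's tokens."""
    title_tokens = set(analyze(title))
    return all(token in title_tokens for token in analyze(query))


print(matches("The Art of Computer Programming", "computer ART"))
print(matches("The Art of Computer Programming", "comp"))
```

The first query matches because whole words from the title are supplied in any case or order; the second does not, because "comp" is only part of a token. Matching inside a word needs something extra, such as an n-gram filter.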
Re: exporting to CSV with solrj
When you fire a query against Solr with wt=csv, the response coming from Solr is *already* in CSV. The CSVResponseWriter is responsible for translating SolrDocument instances into CSV on the server side, so I don't see any reason to use it yourself; Solr already does the heavy lifting for you.

Regards,

On Oct 31, 2014, at 10:44 AM, tedsolr tsm...@sciquest.com wrote:

I am trying to invoke the CSVResponseWriter to create a CSV file of all stored fields. There are millions of documents, so I need to write to the file iteratively. I saw a snippet of code online that claimed it could effectively remove the SolrDocumentList wrapper and allow the docs to be retrieved in the actual format requested in the query. However, I get a null pointer from the CSVResponseWriter.write() method.

  SolrQuery qry = new SolrQuery("*:*");
  qry.setParam("wt", "csv");
  // set other params
  SolrServer server = getSolrServer();
  try {
      QueryResponse res = server.query(qry);
      CSVResponseWriter writer = new CSVResponseWriter();
      Writer w = new StringWriter();
      SolrQueryResponse solrResponse = new SolrQueryResponse();
      solrResponse.setAllValues(res.getResponse());
      try {
          SolrParams list = new MapSolrParams(new HashMap<String, String>());
          writer.write(w, new LocalSolrQueryRequest(null, list), solrResponse);
      } catch (IOException e) {
          throw new RuntimeException(e);
      }
      System.out.print(w.toString());
  } catch (SolrServerException e) {
      e.printStackTrace();
  }

NPE snippet:

  org.apache.solr.response.CSVWriter.writeResponse(CSVResponseWriter.java:281)
  org.apache.solr.response.CSVResponseWriter.write(CSVResponseWriter.java:56)

Am I on the right track with this approach? I really don't want to roll my own document-to-CSV-line converter. Thanks!

Solr 4.9

-- View this message in context: http://lucene.472066.n3.nabble.com/exporting-to-CSV-with-solrj-tp4166845.html
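Since the server already produces CSV, the client only has to copy bytes. The sketch below illustrates that idea in Python rather than SolrJ, purely as a language-neutral example: the host, core name and file name are made-up assumptions, and only the wt=csv parameter comes from the thread.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen


def build_csv_url(core_url, params):
    """Build a /select URL that asks Solr to answer in CSV."""
    query = dict(params)
    query["wt"] = "csv"
    return core_url.rstrip("/") + "/select?" + urlencode(query)


def export_csv(url, path, chunk_size=64 * 1024):
    """Stream the HTTP response body to a file in fixed-size chunks."""
    with urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)


# Hypothetical core URL; adjust to the actual installation.
url = build_csv_url("http://localhost:8983/solr/collection1",
                    {"q": "*:*", "rows": 1000})
print(url)
# export_csv(url, "export.csv")  # requires a running Solr instance
```

Because the body is copied chunk by chunk, the full result never has to be held in memory on the client. For millions of documents the rows and start parameters would still have to be chosen with care; that part is left open here.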
Re: SOLR query - restrict access to user documents
I see you're defining a default value for "rows". This can be overridden on the request, and requesting a lot of documents from Solr can stress your server/cluster, provided of course that the client in question has that many documents. If this is a fixed value and clients must not be able to request more documents, then I'd consider moving it into the invariants section, ensuring that this value can't be changed by the request no matter what.

Some time ago I had a similar use case: we wanted to expose Solr to our clients, and eventually we faced problems where some clients requested "all of their documents" in one request, stressing our cluster. In the end we wrote a custom SearchComponent to set maximum values (instead of a fixed value specified in invariants) for the rows and start parameters. (Actually this component does a little more, as we add some limitations for each type of client, defining constraints such as how many documents, i.e. data points, can be requested, etc.)

Hope it helps,

On Oct 7, 2014, at 11:37 AM, Nitin Agarwal 2nitinagar...@gmail.com wrote:

Hi, I have a question about Solr queries: I am trying to restrict access to Solr data. We are running Solr 4.7.1 and wish to expose the query capabilities to our customers for the data that belongs to them. Specifically, /select with the default configuration is the only request handler that customers can access.

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="df">text</str>
    </lst>
  </requestHandler>

The custom API that fronts Solr will inject the appropriate restriction into the q param, e.g. q=customerNumber:123, or append it to the q param, q=customer query AND customerNumber:123, before sending the request to the /select handler. This works fine; however, I want to know if there is a way a customer can override these restrictions. If so, what can I do to prevent that?

So far I have come across facet.mincount as one potential concern, whereby a customer can see data that they should not. For example,

  /select?q=customer query AND customerNumber:123&facet=true&facet.field=customerName&rows=0&facet.mincount=0

will also return customer names that do not belong to customerNumber 123. Are there any other gotchas that I should know about?

Thanks for your time and help,
Nitin

Contest "Mi selfie por los 5". Details at http://justiciaparaloscinco.wordpress.com
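The two concerns in this thread (oversized rows/start values and parameters such as facet.mincount leaking other customers' data) can both be handled in the fronting API before the request reaches Solr. The Python sketch below shows one hedged way to do that. The whitelist, the limits and the function names are assumptions for illustration, and the customer restriction is placed in a separate fq filter rather than appended to q as in the original message, which is a substitution on my part.

```python
ALLOWED = {"q", "start", "rows", "fl", "sort"}  # hypothetical whitelist
MAX_ROWS = 100                                  # hypothetical limits
MAX_START = 10000


def clamp(value, default, upper):
    """Parse an integer parameter and keep it within [0, upper]."""
    try:
        number = int(value)
    except (TypeError, ValueError):
        return default
    return max(0, min(number, upper))


def sanitize(params, customer_number):
    """Drop unknown parameters, cap paging, and force the customer filter."""
    safe = {key: value for key, value in params.items() if key in ALLOWED}
    safe["rows"] = clamp(params.get("rows"), 10, MAX_ROWS)
    safe["start"] = clamp(params.get("start"), 0, MAX_START)
    safe["fq"] = "customerNumber:%d" % int(customer_number)
    return safe


print(sanitize({"q": "motor", "rows": "1000000",
                "facet": "true", "facet.mincount": "0"}, 123))
```

A whitelist is used instead of a blacklist so that parameters nobody thought about are rejected by default. The content of q itself is not inspected here, so this is a starting point rather than a complete access-control layer.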
Re: Best way to index wordpress blogs in solr
If you're talking about a generic web crawl, you could use something like Nutch [1]. Keep in mind that it is a full web crawler, and it does a pretty good job. I've been using it for more than 2 years now and I'm very happy with it, although I don't crawl just a couple of sites but a much wider spectrum (think the web of a whole country). With Nutch you just have to configure a couple of options in an XML file and it will crawl the web and index the content into Solr.

Regards,

[1] http://nutch.apache.org

On Oct 7, 2014, at 4:53 PM, Vishal Sharma vish...@grazitti.com wrote:

Makes sense. I'll just dive in now. Thanks so much.

Vishal Sharma, TL, Grazitti Interactive

On Tue, Oct 7, 2014 at 1:44 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

I am pretty sure Swift is not Solr. That's why I was asking whether you were starting from scratch. As to the other items, please re-read my original response. Solr has an example that reads in RSS feeds; you could probably use that. Or generic XML using DataImportHandler's mapping. Or directly from the database, again with DIH. Basically, it sounds totally doable. So it's hard to advise anything specific beyond "go, do it" and wait for you to come back with a much more specific issue once you get going. Most of the issues will be related to your schema and your WordPress configuration, so no abstract advice is available.

Regards,
Alex.

On 7 October 2014 16:36, Vishal Sharma vish...@grazitti.com wrote:

Hey Alex, thanks for the prompt response. Here is what I am trying to solve: I am showing search results for content coming from 3 different places on a single site. I have done that by pumping all this content into a Solr server running on a single flat schema, using the different APIs of these platforms. Now I also need to index blog posts written in WordPress. I was wondering if there is any solution already available which can help me crawl these posts and pump them into my running Solr instance. Otherwise I might have to write a few more scripts to do that. BTW, is Swift using Solr on the backend? I thought it's a paid enterprise solution.
Re: Changed behavior in solr 4 ??
Don't worry. The way Hoss explained it is indeed the way I knew it worked, but the example provided in the book piqued my curiosity, hence the question in this thread.

Regards,

On Sep 30, 2014, at 5:59 PM, Timothy Potter thelabd...@gmail.com wrote:

Indeed - Hoss is correct ... it's a problem with the example in the book ... my apologies for the confusion!

On Tue, Sep 30, 2014 at 3:57 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: Thanks for the response, yes the way you describe I know it works and is
: how I get it to work but then what does mean the snippet of the
: documentation I see on the documentation about overriding the default

It means that there is implicitly a set of search components that have default behavior, and there is an implicit list of component *names* used by default by SearchHandler -- and if you override one of those implicit searchComponent instances by declaring your own with the same name, then it will be used by default in SearchHandler.

A very concrete example of this is HighlightComponent -- if you have no HighlightComponent declared in your solrconfig.xml, then an implicit instance exists with the name "highlight" and SearchHandler by default includes that component. If you want to declare your own HighlightComponent instance with special initialization logic, you can either declare it with its own unique name and edit the components list on a SearchHandler declaration to include that name, or you can just name it "highlight" and it will override the default instance -- this is in fact done in the example solrconfig.xml (grep for HighlightComponent).

: components shipped with Solr? Even on the book Solr in Action in chapter
: 7 listing 7.3 I saw something similar to what I wanted to do:
:
: <searchComponent name="query" class="solr.QueryComponent">
: <lst name="invariants"> ...

That appears to be a mistake in Solr in Action ... the QueryComponent class does nothing with its init params (the nested XML inside the searchComponent declaration), so that syntax does nothing.

-Hoss
http://www.lucidworks.com/
Re: (auto)suggestions, but ony from a filtered set of documents
Perhaps instead of the suggester component you could use the EdgeNGramFilter to provide partial matches, so you will be able to configure a custom request handler that will "suggest" terms or phrases for you. I'm using this approach to provide query suggestions; of course, I'm indexing the queries into a separate core.

Greetings,

On Sep 26, 2014, at 8:49 AM, Clemens Wyss DEV clemens...@mysign.ch wrote:

Either my intention is dumb (please let me know ;)), or there is no answer to this problem. If so, I will have to index my sources into separate cores. But then the questions arise:
a) How do I get suggestions from more than one core? Multiple suggest requests, then merge?
b) How do I get (ranked) results from more than one core? In Lucene I was able to use a MultiIndexReader (one IndexReader per index).

-----Original Message-----
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
Sent: Thursday, 25 September 2014 10:24
To: solr-user@lucene.apache.org
Subject: (auto)suggestions, but ony from a filtered set of documents

What I'd like to do is

  http://localhost:8983/solr/solrpedia/suggest?q=atm&qf=source:mysource

Through "qf" (or whatever the parameter would be called) I'd like to restrict the suggestions to documents which fit the given qf query. I need this filter if (as posted in a previous thread) I intend to put different kinds of data into one core/collection, because suggestions must be restrictable to one or many source(s).
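If suggestions are served by an ordinary search handler over an edge n-gram field, as the reply proposes, the restriction asked for in the original message can be expressed with a regular filter query. The Python sketch below only builds the request parameters. The field name suggest_edge is hypothetical, the source field is taken from the question, and the sketch assumes the n-gram field is lowercased at index time.

```python
from urllib.parse import urlencode


def suggest_params(prefix, sources, field="suggest_edge", rows=10):
    """Build parameters for a prefix lookup restricted to some sources."""
    quoted = ['"%s"' % source.replace('"', '\\"') for source in sources]
    return {
        "q": "%s:%s" % (field, prefix.lower()),
        "fq": "source:(%s)" % " OR ".join(quoted),
        "rows": rows,
    }


params = suggest_params("ATM", ["mysource", "othersource"])
print(urlencode(params))
```

The prefix is passed through without escaping query syntax characters, which a real client would have to handle. The filter only narrows the document set, so the same request shape works for one source or for several.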
Re: Changed behavior in solr 4 ??
I hadn't used it before this. Basically, I found out about it in the Solr in Action book, guided by the comment about redefining the default components by defining a new searchComponent with the same name. Anyhow, thanks for your reply!

Regards,

On Sep 25, 2014, at 8:01 AM, Jack Krupansky j...@basetechnology.com wrote:

I am not aware of any such feature! That doesn't mean it doesn't exist, but I don't recall seeing it in the Solr source code.

-- Jack Krupansky

-----Original Message-----
From: Jorge Luis Betancourt Gonzalez
Sent: Wednesday, September 24, 2014 1:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Changed behavior in solr 4 ??

Hi Jack:

Thanks for the response. Yes, I know the way you describe works, and it is how I got it to work. But then what does the snippet I saw in the documentation about overriding the default components shipped with Solr mean? Even in the book Solr in Action, in chapter 7, listing 7.3, I saw something similar to what I wanted to do:

  <searchComponent name="query" class="solr.QueryComponent">
    <lst name="invariants">
      <str name="rows">25</str>
      <str name="df">content_field</str>
    </lst>
    <lst name="defaults">
      <str name="q">*:*</str>
      <str name="indent">true</str>
      <str name="echoParams">explicit</str>
    </lst>
  </searchComponent>

"Because each default search component exists by default even if it's not defined explicitly in the solrconfig.xml file, defining them explicitly as in the previous listing will replace the default configuration."

The previous snippet is from the quoted book, Solr in Action. I understand that I could define these parameters in each SearchHandler, but if they were defined in the searchComponent (as the book says), wouldn't this configuration apply to all my request handlers, eliminating the need to replicate the same parameter in several parts of my solrconfig.xml (i.e. all the request handlers)?

Regards,

On Sep 23, 2014, at 11:53 PM, Jack Krupansky j...@basetechnology.com wrote:

You set the defaults on the search handler, not the search component. See solrconfig.xml:

  <requestHandler name="/select" class="solr.SearchHandler">
    <!-- default values for query parameters can be specified, these
         will be overridden by parameters in the request -->
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="df">text</str>
    </lst>
    ...

-- Jack Krupansky

-----Original Message-----
From: Jorge Luis Betancourt Gonzalez
Sent: Tuesday, September 23, 2014 11:02 AM
To: solr-user@lucene.apache.org
Subject: Changed behavior in solr 4 ??

Hi:

I'm trying to change the default configuration for the query component of a SearchHandler. Basically, I want to set a default value for the rows parameter and have this value shared by all my SearchHandlers. As stated in the solrconfig.xml comments, this could be accomplished by redeclaring the query search component; however, this is not working on Solr 4.9.0, which is the version I'm using. This is my configuration:

  <searchComponent name="query" class="solr.QueryComponent">
    <lst name="defaults">
      <int name="rows">1</int>
    </lst>
  </searchComponent>

The relevant portion of the solrconfig.xml comment is: "If you register a searchComponent to one of the standard names, will be used instead of the default." So is this a new desired behavior? Just for testing, I also redefined the components of the request handler to use only the query component instead of all the default components. This is how it looks:

  <requestHandler name="/select" class="solr.SearchHandler">
    <arr name="components">
      <str>query</str>
    </arr>
  </requestHandler>

Everything works OK, but the rows parameter is not used, although I'm not specifying the rows parameter in the URL.

Regards,
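The precedence rules that this thread relies on can be summarised in a few lines. The following Python function is a toy model, not Solr's implementation: it assumes, as the solrconfig.xml comment quoted above states, that request parameters override a handler's defaults, and, as I understand Solr's behaviour, that invariants override the request.

```python
def effective_params(request, defaults=None, invariants=None):
    """Merge handler-level parameter sets with the request parameters."""
    merged = dict(defaults or {})     # lowest precedence
    merged.update(request)            # the request overrides defaults
    merged.update(invariants or {})   # invariants override everything
    return merged


handler_defaults = {"rows": "10", "df": "text"}

print(effective_params({"rows": "500"}, defaults=handler_defaults))
print(effective_params({"rows": "500"}, defaults=handler_defaults,
                       invariants={"rows": "10"}))
```

In the first call the client's rows=500 wins; in the second it is ignored. Both lists live on the request handler, which is consistent with the advice to set them there rather than on the search component.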
Re: Spellchecking and suggesting part numbers
I've done something similar to this using the EdgeNGram filter rather than the spellchecker component. I don't know if this fits your requirements. The relevant portion of my fieldType config:

  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="1"/>

Basically, use the WordDelimiterFilterFactory to divide ABCD1234 into two tokens (or don't, depending on your requirement) and then use the EdgeNGramFilterFactory to provide partial matching on the field.

On Sep 24, 2014, at 10:05 AM, Lochschmied, Alexander alexander.lochschm...@vishay.com wrote:

Hello Solr Users,

we are trying to get suggestions for part numbers using the spellchecker. Problem scenario:

  ABCD1234 // This is the search term
  ABCE1234 // This is what we get from spellchecker
  ABCD1244 // This is what we would like to get from spellchecker

Characters towards the left of our part numbers are more relevant. The setup is:

  <searchComponent name="spellcheck_part" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
      <str name="field">did_you_mean_part</str>
    </lst>
  </searchComponent>

  <requestHandler name="/spell_part" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="df">did_you_mean_part</str>
      <str name="spellcheck">on</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck_part</str>
    </arr>
  </requestHandler>

  <fieldType name="did_you_mean_part" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement=""/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement=""/>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
    </analyzer>
  </fieldType>

Can we tweak the setup so that we get more relevant part numbers?

Thanks,
Alexander
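The requirement that characters towards the left matter more can be stated as "prefer the candidate that shares the longest prefix with the search term". The Python sketch below ranks candidates that way. It illustrates the requirement only and is not what the spellchecker or the EdgeNGram filter does internally; the function names are made up.

```python
def common_prefix_length(a, b):
    """Count how many leading characters two strings share."""
    length = 0
    for left, right in zip(a, b):
        if left != right:
            break
        length += 1
    return length


def rank_by_prefix(term, candidates):
    """Order candidates by shared prefix length, longest first."""
    return sorted(
        candidates,
        key=lambda candidate: common_prefix_length(term.lower(), candidate.lower()),
        reverse=True,
    )


print(rank_by_prefix("ABCD1234", ["ABCE1234", "ABCD1244"]))
```

An edit-distance measure treats both candidates as one substitution away from the search term, which is why a prefix-oriented approach such as edge n-grams may suit part numbers better.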
Changed behavior in solr 4 ??
Hi:

I'm trying to change the default configuration for the query component of a SearchHandler. Basically, I want to set a default value for the rows parameter and have this value shared by all my SearchHandlers. As stated in the solrconfig.xml comments, this could be accomplished by redeclaring the query search component; however, this is not working on Solr 4.9.0, which is the version I'm using. This is my configuration:

  <searchComponent name="query" class="solr.QueryComponent">
    <lst name="defaults">
      <int name="rows">1</int>
    </lst>
  </searchComponent>

The relevant portion of the solrconfig.xml comment is: "If you register a searchComponent to one of the standard names, will be used instead of the default." So is this a new desired behavior? Just for testing, I also redefined the components of the request handler to use only the query component instead of all the default components. This is how it looks:

  <requestHandler name="/select" class="solr.SearchHandler">
    <arr name="components">
      <str>query</str>
    </arr>
  </requestHandler>

Everything works OK, but the rows parameter is not used, although I'm not specifying the rows parameter in the URL.

Regards,
Re: Changed behavior in solr 4 ??
Hi Jack:

Thanks for the response. Yes, I know the way you describe works, and it is how I got it to work. But then what does the snippet I saw in the documentation about overriding the default components shipped with Solr mean? Even in the book Solr in Action, in chapter 7, listing 7.3, I saw something similar to what I wanted to do:

  <searchComponent name="query" class="solr.QueryComponent">
    <lst name="invariants">
      <str name="rows">25</str>
      <str name="df">content_field</str>
    </lst>
    <lst name="defaults">
      <str name="q">*:*</str>
      <str name="indent">true</str>
      <str name="echoParams">explicit</str>
    </lst>
  </searchComponent>

"Because each default search component exists by default even if it's not defined explicitly in the solrconfig.xml file, defining them explicitly as in the previous listing will replace the default configuration."

The previous snippet is from the quoted book, Solr in Action. I understand that I could define these parameters in each SearchHandler, but if they were defined in the searchComponent (as the book says), wouldn't this configuration apply to all my request handlers, eliminating the need to replicate the same parameter in several parts of my solrconfig.xml (i.e. all the request handlers)?

Regards,

On Sep 23, 2014, at 11:53 PM, Jack Krupansky j...@basetechnology.com wrote:

You set the defaults on the search handler, not the search component. See solrconfig.xml:

  <requestHandler name="/select" class="solr.SearchHandler">
    <!-- default values for query parameters can be specified, these
         will be overridden by parameters in the request -->
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="df">text</str>
    </lst>
    ...

-- Jack Krupansky

-----Original Message-----
From: Jorge Luis Betancourt Gonzalez
Sent: Tuesday, September 23, 2014 11:02 AM
To: solr-user@lucene.apache.org
Subject: Changed behavior in solr 4 ??

Hi:

I'm trying to change the default configuration for the query component of a SearchHandler. Basically, I want to set a default value for the rows parameter and have this value shared by all my SearchHandlers. As stated in the solrconfig.xml comments, this could be accomplished by redeclaring the query search component; however, this is not working on Solr 4.9.0, which is the version I'm using. This is my configuration:

  <searchComponent name="query" class="solr.QueryComponent">
    <lst name="defaults">
      <int name="rows">1</int>
    </lst>
  </searchComponent>

The relevant portion of the solrconfig.xml comment is: "If you register a searchComponent to one of the standard names, will be used instead of the default." So is this a new desired behavior? Just for testing, I also redefined the components of the request handler to use only the query component instead of all the default components. This is how it looks:

  <requestHandler name="/select" class="solr.SearchHandler">
    <arr name="components">
      <str>query</str>
    </arr>
  </requestHandler>

Everything works OK, but the rows parameter is not used, although I'm not specifying the rows parameter in the URL.

Regards,
Re: How to exclude a mimetype in tika?
Which crawler are you using?

On Sep 18, 2014, at 10:14 AM, keeblerh keebl...@yahoo.com wrote:

eShard wrote:

Good afternoon, I'm using Solr 4.0 Final. I have movies hidden in zip files that need to be excluded from the index. I can't filter movies in the crawler because then I would have to exclude all zip files. I was told I can have Tika skip the movies; the details are escaping me at this point. How do I exclude a file in the Tika configuration? I assume it's something I add in the update/extract handler, but I'm not sure. Thanks,

I am having the same issue. I need to exclude some MIME types from the zip files and am using Solr 4.8. Did you ever get an answer to this? Thanks.

-- View this message in context: http://lucene.472066.n3.nabble.com/How-to-exclude-a-mimetype-in-tika-tp4127168p4159676.html
Re: Solr(j) API for manipulating the schema(.xml)?
Basically, you could create a set of dynamic fields according to your needs, one for each type of data (and for several combinations), and then write a small wrapper around SolrJ that exposes the patterns defined in your schema.xml in a more understandable way. This way you can abstract away the manipulation of the schema.xml file and only touch it when it is really needed, i.e. for a new field type with new analyzers, etc.

On Sep 18, 2014, at 3:16 AM, Clemens Wyss DEV clemens...@mysign.ch wrote:

As our framework so far only knows a few field types, dynamic fields may be the way to go... And if there are new field types, the new schema can be distributed through ZooKeeper.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, 17 September 2014 19:56
To: solr-user@lucene.apache.org
Subject: Re: Solr(j) API for manipulating the schema(.xml)?

Right, you can create new cores over the REST API. As far as changing the schema goes, there's no good way that I know of to do that programmatically. In the SolrCloud world, you can upload the schema to ZooKeeper and have it automatically distributed to all the nodes, though.

Best,
Erick

On Wed, Sep 17, 2014 at 2:28 AM, Clemens Wyss DEV clemens...@mysign.ch wrote:

Is there an API to manipulate/consolidate the schema(.xml) of a Solr core? Through SolrJ?

Context: We already have a generic indexing/searching framework (based on Lucene) where any component can act as a so-called IndexDataProvider. This provider delivers the field types and also the entities to be (converted into documents and then) indexed. Each of these IndexDataProviders has its own Lucene index. So we kind of have the information for the Solr schema.xml. I hope the intention is clear. And yes, the manipulation of the schema.xml is basically only needed when the field types change. That's why I am looking for a way to consolidate the schema.xml (upon boot, initialization of the IndexDataProviders, ...). In 99.999% of cases it won't change, but I'd like to keep the possibility for an IndexDataProvider to hand in its schema.

Also, again driven by the dynamic nature of our framework, can I easily create new cores over SolrJ or the Solr REST API?
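The wrapper idea from the reply at the top of this thread could look roughly like the following. It is written in Python rather than SolrJ purely for illustration, and it assumes the schema declares dynamic fields with suffix patterns such as *_s, *_i and *_t (the convention used in Solr's example schema). The type names and helper functions are hypothetical.

```python
# Assumed mapping from a framework-level type to a dynamic field suffix.
SUFFIXES = {
    "string": "_s",
    "int": "_i",
    "long": "_l",
    "float": "_f",
    "double": "_d",
    "boolean": "_b",
    "text": "_t",
    "date": "_dt",
}


def dynamic_field_name(name, field_type):
    """Translate a logical field name and type into a dynamic field name."""
    if field_type not in SUFFIXES:
        raise ValueError("no dynamic field pattern for type %r" % field_type)
    return name + SUFFIXES[field_type]


def to_solr_document(doc_id, typed_values):
    """Build a flat document whose keys match the dynamic field patterns."""
    document = {"id": doc_id}
    for name, (field_type, value) in typed_values.items():
        document[dynamic_field_name(name, field_type)] = value
    return document


print(to_solr_document("42", {"title": ("text", "Motor"),
                              "price": ("float", 9.5)}))
```

With such a layer, a provider only declares logical names and types, and the schema.xml has to change only when a type appears that has no pattern yet.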
Re: How to implement multilingual word components fields schema?
In one of the talks by Trey Grainger (author of Solr in Action) it touches how on CareerBuilder are dealing with multilingual with payloads, its a little more of work but I think it would payoff. On Sep 8, 2014, at 7:58 AM, Jack Krupansky j...@basetechnology.com wrote: You also need to take a stance as to whether you wish to auto-detect the language at query time vs. have a UI selection of language vs. attempt to perform the same query for each available language and then determine which has the best relevancy. The latter two options are very sensitive to short queries. Keep in mind that auto-detection for indexing full documents is a different problem that auto-detection for very short queries. -- Jack Krupansky -Original Message- From: Ilia Sretenskii Sent: Sunday, September 7, 2014 10:33 PM To: solr-user@lucene.apache.org Subject: Re: How to implement multilingual word components fields schema? Thank you for the replies, guys! Using field-per-language approach for multilingual content is the last thing I would try since my actual task is to implement a search functionality which would implement relatively the same possibilities for every known world language. The closest references are those popular web search engines, they seem to serve worldwide users with their different languages and even cross-language queries as well. Thus, a field-per-language approach would be a sure waste of storage resources due to the high number of duplicates, since there are over 200 known languages. I really would like to keep single field for cross-language searchable text content, witout splitting it into specific language fields or specific language cores. So my current choice will be to stay with just the ICUTokenizer and ICUFoldingFilter as they are without any language specific stemmers/lemmatizers yet at all. 
I will probably put stop word filters and stemmers for the most popular languages into the same searchable text field to give it a try and see whether they work correctly in a stack. Do language-specific filters work correctly when stacked in one field? Further development will most likely involve some advanced custom analyzers like the SimplePolyGlotStemmingTokenFilter to utilize the ICU generated ScriptAttribute. http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/100236 https://github.com/whateverdood/cross-lingual-search/blob/master/src/main/java/org/apache/lucene/sandbox/analysis/polyglot/SimplePolyGlotStemmingTokenFilter.java So I would like to know more about those academic papers on how best to deal with mixed language/mixed script queries and documents. Tom, could you please share them?
Re: Strategies for effective prefix queries?
Perhaps what you’re trying to do could be addressed by using the EdgeNGramFilterFactory filter? For query suggestions I’m using a very similar approach; this is an extract of the configuration I’m using:
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="10" minGramSize="1"/>
Basically this allows you to get prefix matches on any word of the string. Let’s say the field gets this content at index time: “A brown fox”; this document will be matched by the query “bro”, for instance. My personal recommendation is to use this in a separate field that gets populated through a copyField; this way you could apply different boosts. Greetings, On Jul 16, 2014, at 2:00 PM, Hayden Muhl haydenm...@gmail.com wrote: A copy field does not address my problem, and this has nothing to do with stored fields. This is a query parsing problem, not an indexing problem. Here's the use case. If someone has a username like bob-smith, I would like it to match prefixes of bo and sm. I tokenize the username into the tokens bob and smith. Everything is fine so far. If someone enters bo sm as a search string, I would like bob-smith to be one of the results. The query to do this is straightforward: username:bo* username:sm*. Here's the problem. In order to construct that query, I have to tokenize the search string bo sm **on the client**. I don't want to reimplement tokenization on the client. Is there any way to give Solr the string bo sm, have Solr do the tokenization, then treat each token like a prefix? On Tue, Jul 15, 2014 at 4:55 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: So copyField it to another and apply alternative processing there. Use eDismax to search both. 
No need to store the copied field, just index it. Regards, Alex On 16/07/2014 2:46 am, Hayden Muhl haydenm...@gmail.com wrote: Both fields? There is only one field here: username. On Mon, Jul 14, 2014 at 6:17 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Search against both fields (one split, one not split)? Keep original and tokenized form? I am doing something similar with class name autocompletes here: https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24 Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Tue, Jul 15, 2014 at 8:04 AM, Hayden Muhl haydenm...@gmail.com wrote: I'm working on using Solr for autocompleting usernames. I'm running into a problem with the wildcard queries (e.g. username:al*). We are tokenizing usernames so that a username like solr-user will be tokenized into solr and user, and will match both sol and use prefixes. The problem is when we get solr-u as a prefix, I'm having to split that up on the client side before I construct a query username:solr* username:u*. I'm basically using a regex as a poor man's tokenizer. Is there a better way to approach this? Is there a way to tell Solr to tokenize a string and use the parts as prefixes? - Hayden VII Escuela Internacional de Verano en la UCI del 30 de junio al 11 de julio de 2014. Ver www.uci.cu
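To make the EdgeNGramFilterFactory suggestion in the thread above more concrete, here is a minimal Python sketch of what edge n-gram expansion does to each token at index time. It is only an illustration, not Solr code: the function names and the simplified tokenizer (lowercase, split on non-alphanumerics) are assumptions standing in for the analyzer chain, and the gram sizes mirror the minGramSize=1 / maxGramSize=10 values quoted in the configuration.

```python
import re


def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the leading-edge n-grams of a token, shortest first."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text, min_gram=1, max_gram=10):
    """Simplified stand-in for the analyzer: lowercase, split, expand."""
    tokens = [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]
    terms = set()
    for token in tokens:
        terms.update(edge_ngrams(token, min_gram, max_gram))
    return terms


# Every prefix of every word becomes an indexed term, so the plain query
# tokens "bo" and "sm" can match "bob-smith" without wildcard queries.
username_terms = index_terms("bob-smith")
```

Under these assumptions the search string bo sm only needs ordinary tokenization at query time, since the prefixes already exist as terms in the index; whether that fits Hayden's case depends on how the query-side analyzer is configured.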
Solr 4.x and master-slave schema
Hi all: We have a small installation of Solr 3.6 on our hands. Right now we have 3 physical servers (1 master and 2 slaves); ingestion is done on the master, which replicates through Solr's internal mechanism to the slaves, which handle all the queries. We are trying to upgrade to Solr 4.x, and eventually we would like to migrate to SolrCloud. My question is essentially: if we migrate our Solr 3.6 nodes to Solr 4.9 and keep the same master-slave setup, how hard would it be to migrate to SolrCloud afterwards? Greetings,
Get position of first occurrence in search result
I’m using Solr for an analytics use case. One of the requirements is, given a search query, to get the position of the first hit. I’m indexing web pages, so given some search criteria the client wants to know the position (first occurrence) of his webpage in the result set (if it appears at all). Is there any way of getting this position without iterating over and manually checking the Solr response? Greetings,
Re: Get position of first occurrence in search result
Basically, given a few search terms (a query), the idea is to know at which position your website is ranked for those specific terms. On Jun 24, 2014, at 12:12 AM, Aman Tandon amantandon...@gmail.com wrote: What kind of search criteria? Could you please explain? With Regards Aman Tandon
Re: Get position of first occurrence in search result
Yes, but I’m looking for the position of the URL of interest in Solr's response. Solr matches the terms against the collection of documents and returns a list sorted by score; what I’m trying to do is get the position of a specific id in this sorted response. The result could be something like position: 5, or position: 500. To do this manually when the response consists of a very large number of documents (webpages), I would need to iterate over the complete response to find the position, which in the worst case could be on the last page. For this particular use case I’m not so interested in the URL field per se but in the position a certain URL has in the full Solr response. On Jun 24, 2014, at 12:31 AM, Walter Underwood wun...@wunderwood.org wrote: Solr is designed to do exactly this very, very fast. So there isn't a faster way to do it. But you only need to fetch the URL field. You can ignore everything else. wunder
Re: Get position of first occurrence in search result
Basically this is for analytical purposes: essentially we want to help people (whose sites we’ve indexed in our app) find out for which particular terms (in theory related to their domain) they are badly positioned in our index. Initially we’re starting with this basic “position per term”, but the idea is to elaborate further in this direction. Could this position-finding logic be abstracted effectively into a plugin inside Solr? I guess it would be more efficient to iterate (or fire the 2 queries) from within Solr itself than in our app (written in PHP, so not so fast for some things). Regards, On Jun 24, 2014, at 1:42 AM, Aman Tandon amantandon...@gmail.com wrote: Jorge, I don't think Solr provides this functionality; you have to iterate, and Solr is very fast at this. You can create a script that searches for the pattern (term) and parses the records until it reaches the record with the desired URL. I don't think taking 1/3 seconds to find it is too much. As per search result analysis, very few people request the second page for their query; most either leave the search or modify the query string. So I'd suggest that if the website has appropriate and good data it should come up on the first page, so it's better to aim for the first page than to find the position. With Regards Aman Tandon
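The iteration discussed in this thread (page through the ranked results, fetching only the URL field, until the target shows up) could be sketched in Python roughly as follows. This is a sketch under assumptions: `fetch_page` is a hypothetical callable, not a Solr client API; in a real setup it would issue a query with Solr's `fl`, `start` and `rows` parameters and return just the URL values in score order. The fake result list below exists only to make the example runnable.

```python
def find_position(fetch_page, target_url, page_size=100, max_rows=10000):
    """Return the 1-based rank of target_url, or None if it is not found.

    fetch_page(start, rows) is a hypothetical callable that returns the
    url values of results start .. start+rows-1, in score order.
    """
    start = 0
    while start < max_rows:
        urls = fetch_page(start, page_size)
        for offset, url in enumerate(urls):
            if url == target_url:
                return start + offset + 1
        if len(urls) < page_size:
            return None  # ran out of results before finding the target
        start += page_size
    return None  # gave up after max_rows results


# Stand-in for a ranked Solr response (hypothetical data).
ranked = ["http://example.org/page%d" % i for i in range(250)]


def fake_fetch(start, rows):
    return ranked[start:start + rows]


position = find_position(fake_fetch, "http://example.org/page120")
```

The `max_rows` cap is a design choice rather than anything from the thread: it bounds the cost when the site does not rank at all for the given terms.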
Re: Customizing Solr; Where to draw the line?
I’d certainly go for the 2nd option. Depending on what you need, you won’t have to modify Solr itself but can extend it using plugins. You’ll need to write different components depending on your specific requirements. I definitely recommend the talks from Trey Grainger, from CareerBuilder. I remember seeing in some of the talks that they have A/B testing built into Solr, and a lot of other “crazy” things, so it would be a good starting point, and it will give you a look at what you could accomplish by extending Solr. Of course you’ll need to update your source between big releases of Solr, and perhaps between some minor ones, but this way you don’t need to worry about the latency or maintain a new search layer between the client and Solr. I hope it helps,
On Jun 8, 2014, at 10:38 PM, Phanindra R phani...@gmail.com wrote: Hi, We have decided to migrate from Lucene 3.x to latest Solr. A lot of architectural discussions are going on. There are two possible approaches. Please note that our customer-facing app (or any client) and Search are hosted on different machines.
*1) Have a clean architecture*
- Solr takes care of customized search only.
- We certainly have to override some filtering, scoring, etc.
- There will be an intermediary search-app that
  - receives queries
  - does a/b testing assignments, and other non-search stuff.
  - does query expansion / rewriting (to avoid every Solr shard doing that)
  - transforms query into Solr syntax and uses Solr's http API to consume it.
  - returns the response to customer-facing app or whatever the client is.
The problem with this approach is the additional layer and the latency between search-app and Solr. The client of search has to make an API call, across the network, to the intermediary search-app, which in turn makes another HTTP API call to Solr.
*2) Customize Solr to the full extent*
- Do all the crazy stuff within Solr.
- We can literally create a new url and register a handler class to process that. With some limitations, we should be able to do almost anything.
The benefit of this approach is that it obviates the additional layer and the latency. However, I see a lot of long-term problems, like difficulty upgrading Solr's version and reduced dev flexibility (usage of Spring, Hib, etc.). How about a distributed search? Where do the above approaches stand? I understand that this is a subjective question. It'd be helpful if you could share your thoughts and experiences. Thanks.
Percolator feature
Is there some workaround in the Solr ecosystem to get something similar to the percolator feature offered by Elasticsearch? Greetings!
Re: Writing a customize updateRequestHandler
In the book Apache Solr Beginner’s Guide there is a section dedicated to writing new Solr plugins; perhaps it would be a good place to start. There is also a page about this in the wiki, but it’s only a light introduction. I’ve found that a very good starting point is just to browse through the code of some standard components similar to the one you’re trying to customize. On Feb 3, 2014, at 9:00 AM, neerajp neeraj_star2...@yahoo.com wrote: Hi, I want to write a custom updateRequestHandler. Can you please guide me through the steps I need to perform for that? -- View this message in context: http://lucene.472066.n3.nabble.com/Writing-a-customize-updateRequestHandler-tp4115059.html Sent from the Solr - User mailing list archive at Nabble.com. III Escuela Internacional de Invierno (3rd International Winter School) at UCI, February 17 to 28, 2014. See www.uci.cu
Re: Solr Nutch
Q1: Nutch doesn’t only parse HTML files; it also uses Hadoop to achieve large-scale crawling across multiple nodes. It fetches the content of the HTML files, and yes, it also parses it. Q2: In our case we crawl some websites and store the content in one “main” Solr core. We also have a web app with the typical “search box”, and we use a separate core to store the queries made by our users. Q3: I’m not currently using SolrCloud, so I’m going to leave this one to a more experienced fellow. On Jan 28, 2014, at 11:36 AM, rashmi maheshwari maheshwari.ras...@gmail.com wrote: Hi, Question 1: When Solr can parse HTML and documents like doc, excel, pdf etc., why do we need Nutch to parse HTML files? What is different? Question 2: When do we use multiple cores in Solr? Is there any practical business case where we need multiple cores? Question 3: When do we go for cloud? What does implementing SolrCloud mean? -- Rashmi Be the change that you want to see in this world! www.minnal.zor.org disha.resolve.at www.artofliving.org
Re: PHP + Solr
I have some experience using Solarium and it has been great so far. In particular we use the NelmioSolariumBundle to integrate with Symfony2. Greetings! On Jan 28, 2014, at 1:54 PM, Felipe Dantas de Souza Paiva cad_fpa...@uolinc.com wrote: Hi Folks, I would like to know what is the best way to integrate PHP and Apache Solr. Until now I've found two options: 1) http://www.php.net/manual/en/intro.solr.php 2) http://www.solarium-project.org/ What do you guys say? Cheers, Felipe
Re: Solr server requirements for 100+ million documents
Previously on the list a spreadsheet has been mentioned. Given that you already have documents in an index, you could extract the needed information from your index and feed it into the spreadsheet, and it will probably give you a rough approximation of the hardware you’ll be needing. Also, if I’m not mistaken, this “tool” provides no estimate for SolrCloud. Greetings! On Jan 28, 2014, at 11:02 PM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Thanks, Jack. That helps. -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Tuesday, January 28, 2014 8:01 PM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents Lucene and Solr work best if the full index can be cached in OS memory. Sure, Lucene/Solr does work properly once the index no longer fits, but performance will drop off. I would say that you could fit 100 million moderate-size documents on a single Solr server - provided that you give the OS enough RAM for the full Lucene index. That said, if you want to configure a SolrCloud cluster with shards, you can use more modest, commodity servers with less RAM, provided each server still fits its fraction of the total Lucene index in that server's OS memory (file cache). You may also need to add replicas for each shard to accommodate query load - proof-of-concept testing is needed to verify that. It is worth noting that sharding can improve total query performance since each node only searches a fraction of the total data and those searches are done in parallel (since they are on different machines). -- Jack Krupansky -Original Message- From: Susheel Kumar Sent: Sunday, January 26, 2014 10:54 AM To: solr-user@lucene.apache.org Subject: RE: Solr server requirements for 100+ million documents Thank you Erick for your valuable inputs. Yes, we have to re-index data again and again. I'll look into the possibility of tuning db access. 
On SolrJ and automating the indexing (incremental as well as one time) I want to get your opinion on the two points below. We will be indexing separate sets of tables with similar data structure - Should we use SolrJ and write Java programs that can be scheduled to trigger indexing on demand or on a schedule? - Is using SolrJ a better idea even for searching than using SolrNet? As our frontend is in .Net we started using SolrNet, but I am afraid that down the road, when we scale/support SolrCloud, using SolrJ is better? Thanks Susheel -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Sunday, January 26, 2014 8:37 AM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents Dumping the raw data would probably be a good idea. I guarantee you'll be re-indexing the data several times as you change the schema to accommodate different requirements... But it may also be worth spending some time figuring out why the DB access is slow. Sometimes one can tune that. If you go the SolrJ route, you also have the possibility of setting up N clients to work simultaneously; sometimes that'll help. FWIW, Erick On Sat, Jan 25, 2014 at 11:06 PM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Hi Kranti, Attached are the solrconfig and schema xml for review. I did run indexing with just a few fields (5-6 fields) in schema.xml keeping the same db config, but indexing is still taking almost the same time (on average 1 million records per hour), which confirms that the bottleneck is in the data acquisition, which in our case is an Oracle database. I am thinking of not using DataImportHandler / JDBC to get data from Oracle but rather dumping the data somehow from Oracle using SQL loader and then indexing it. Any thoughts? 
Thnx -Original Message- From: Kranti Parisa [mailto:kranti.par...@gmail.com] Sent: Saturday, January 25, 2014 12:08 AM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents can you post the complete solrconfig.xml file and schema.xml files to review all of your settings that would impact your indexing performance. Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa On Sat, Jan 25, 2014 at 12:56 AM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Thanks, Svante. Your indexing speed using db seems to really fast. Can you please provide some more detail on how you are indexing db records. Is it thru DataImportHandler? And what database? Is that local db? We are indexing around 70 fields (60 multivalued) but data is not populated always in all fields. The average size of document is in 5-10 kbs. -Original Message- From: saka.csi...@gmail.com [mailto:saka.csi...@gmail.com] On Behalf Of svante karlsson Sent: Friday, January 24, 2014 5:05 PM To: solr-user@lucene.apache.org Subject: Re: Solr
Re: Implementing an alerting feature
I believe that you are looking for something similar to the percolator feature present in Elasticsearch. I remember something about a Solr implementation being discussed here some time ago. Does anyone know if there has been any progress in this area? On Jan 27, 2014, at 8:18 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Charlie; Is there any written documentation that explains your library? Thanks; Furkan KAMACI 2014-01-27 Charlie Hull char...@flax.co.uk On 27/01/2014 08:50, elmerfudd wrote: I want to implement an alert service in my Solr system. In the FAST ESP system the service is called Real Time Alerting. The service I'm looking for is: - a document is fed to Solr. - without the document being indexed, a set of queries is run against the document - if the document matches a query, an alert will be sent in near real time. You might want to take a look at Luwak, a library we built recently for running lots of stored queries in an efficient manner. We use this for media monitoring applications. https://github.com/flaxsearch/luwak Cheers Charlie -- View this message in context: http://lucene.472066.n3.nabble.com/Implementing-an-alerting-feature-tp4113666.html Sent from the Solr - User mailing list archive at Nabble.com. -- Charlie Hull Flax - Open Source Enterprise Search tel/fax: +44 (0)8700 118334 mobile: +44 (0)7767 825828 web: www.flax.co.uk
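For readers unfamiliar with the idea behind this thread, the alerting flow elmerfudd describes (queries are stored up front, each incoming document is run against them, matches trigger alerts) can be illustrated with a toy Python sketch. It is deliberately naive and is not how Luwak or the Elasticsearch percolator are implemented or called: the class name, the all-terms-must-match query model and the whitespace tokenizer are all assumptions made for illustration.

```python
def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return set(text.lower().split())


class StoredQueries:
    """Toy registry of stored queries, each a set of required terms."""

    def __init__(self):
        self._queries = {}

    def register(self, query_id, required_terms):
        self._queries[query_id] = {term.lower() for term in required_terms}

    def match(self, document_text):
        """Return the ids of all stored queries the document satisfies."""
        terms = tokenize(document_text)
        return sorted(
            query_id
            for query_id, required in self._queries.items()
            if required <= terms
        )


alerts = StoredQueries()
alerts.register("solr-news", ["solr", "release"])
alerts.register("es-news", ["elasticsearch"])
matched = alerts.match("New Solr release announced today")
```

A real implementation would need proper query parsing and some way to avoid evaluating every stored query against every document, which is the part libraries like the one Charlie mentions are meant to address.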
Re: Solr Related Search Suggestions
If I remember correctly, Trey Grainger explained in one of his talks a few techniques that could be of use. If the equivalence is not dynamic you could just use synonyms. Otherwise some kind of offline processing should be used to compute the similarity between your queries (given that there is very little or no textual similarity between them). On Jan 27, 2014, at 4:29 AM, kumar pavan2...@gmail.com wrote: What is the best way to implement related search suggestions? For example: if the user is looking for marriage halls I need to show results like catering services, photography, wedding cards, invitation cards, music organisers. Thanks Regards, kumar -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Related-Search-Suggestions-tp4113672.html Sent from the Solr - User mailing list archive at Nabble.com.
Unit testing custom update request processor
Happy new year! I’ve developed some custom update request processors to implement some custom logic needed in some use cases. I’m trying to write tests for these processors, and I’d like to test them in much the same way the built-in processors are tested in the Solr source code. Is there any advice on how to accomplish this, or some experience that someone more experienced could share? Greetings!
Re: ANNOUNCE: Apache Solr Reference Guide 4.6
Is it possible to export the doc to Markdown? - Original Message - From: Chris Hostetter hossman_luc...@fucit.org To: solr-user@lucene.apache.org Sent: Monday, December 9, 2013 14:00:34 Subject: Re: ANNOUNCE: Apache Solr Reference Guide 4.6 : Can we please give some thought to producing these manuals in ebook formats? People have given it thought, but it's not as simple as just snapping our fingers and making it happen. If you would like to contribute to the effort of figuring out the how/where/what to make this happen, there is an existing Jira for discussing it. https://issues.apache.org/jira/browse/SOLR-5467 -Hoss http://www.lucidworks.com/
How to boost documents with all the query terms
Hi: I'm using Solr 3.6 with the dismax query parser. I've found that docs that don't have all the query terms get ranked above others that contain all the terms in the search query. Using debugQuery I could see that most of the score in these cases comes from the coord(q,d) factor. Is there any way I could boost the documents that contain all the search query terms? Greetings!
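As background for the coord(q,d) factor mentioned in the question above: in Lucene's classic TF-IDF similarity it is the fraction of query terms a document matches (overlap / maxOverlap), applied as a multiplier. The short Python sketch below shows why a partial match can still end up ahead of a full match. It assumes that classic formula and collapses everything else in the score into one made-up "raw" number, so the values are hypothetical and not taken from any real debugQuery output.

```python
def coord(overlap, max_overlap):
    """Fraction of the query's terms that the document matched."""
    return overlap / max_overlap


def final_score(raw_score, overlap, max_overlap):
    """Raw score (tf, idf, norms, boosts lumped together) scaled by coord."""
    return raw_score * coord(overlap, max_overlap)


# Hypothetical four-term query:
# a document matching 2 of 4 terms, but with a high raw score ...
partial_match = final_score(6.0, 2, 4)
# ... versus a document matching all 4 terms with a lower raw score.
full_match = final_score(2.0, 4, 4)
```

Here the partial match keeps half of its raw score and still outranks the full match, so coord alone does not guarantee that full matches come first. If partial matches should be excluded outright rather than outscored, the dismax mm (minimum should match) parameter is worth checking, though whether it fits depends on the use case.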
Re: Introducing Luwak for high-performance stored Lucene queries
+1 on this. - Original Message - From: Otis Gospodnetic otis.gospodne...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, December 6, 2013 9:35:25 Subject: Re: Introducing Luwak for high-performance stored Lucene queries Hi Charlie, Very nice - thanks! I'd love to see a side-by-side comparison with ES percolator. Got something like that in your blog topic queue? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Fri, Dec 6, 2013 at 9:29 AM, Charlie Hull char...@flax.co.uk wrote: Hi all, We've now released the library we mentioned in our presentation at Lucene Revolution: https://github.com/flaxsearch/luwak You can use this to apply tens of thousands of stored Lucene queries to an incoming document in a second or so on relatively modest hardware. We use it for media monitoring applications but it could equally be useful for categorisation, classification etc. It's currently based on a fork of Lucene (details supplied) but hopefully it'll work with release versions soon. Feedback is very welcome! Cheers Charlie -- Charlie Hull Flax - Open Source Enterprise Search tel/fax: +44 (0)8700 118334 mobile: +44 (0)7767 825828 web: www.flax.co.uk
Re: solr as a service for multiple projects in the same environment
I think Trey Grainger, author of Solr in Action, could share some experience in this area; I believe some of his work at CareerBuilder involved creating something (somewhat) similar to what you're trying to accomplish. I must say that I'm also interested in this topic, but haven't had the time to really do anything about it. - Original Message - From: adfel70 adfe...@gmail.com To: solr-user@lucene.apache.org Sent: Sunday, December 1, 2013 2:41:00 Subject: Re: solr as a service for multiple projects in the same environment The risk is that if you mess up a cluster by mistake while doing maintenance on one of the systems, you can affect the other system. It's a pretty amorphous risk. Aside from having multiple systems share the same hardware resources, I don't see any other real risk. Do your collections share the same topology in terms of shards and replicas? Do you manually configure the nodes on which each collection is created so that you'll still have some level of separation between the systems? michael.boom wrote: Hi, There's nothing unusual in what you are trying to do; this scenario is very common. To answer your questions: 1. As I understand, I can separate the configs of each collection in ZooKeeper. Is that correct? Yes, that's correct. You'll have to upload your configs to ZK and use the Collections API to create your collections. 2. Are there any Solr operations that can be performed on collection A and somehow affect collection B? No, I can't think of any cross-collection operation. Here you can find a list of collection related operations: https://cwiki.apache.org/confluence/display/solr/Collections+API 3. Is the Solr cache separated for each collection? Yes, separate and configurable in solrconfig.xml for each collection. 4. I assume that I'll encounter a problem with the OS cache, when the different indices compete for the same memory, right? How severe is this issue? Hardware can be a bottleneck. 
If all your collections face the same load, you should try to give Solr an amount of RAM equal to the total index size (all indexes combined).

5. Any other advice on building such an architecture? Does the overhead of maintaining multiple clusters in production really outweigh the problems and risks of using the same cluster for multiple systems?
I was in the same situation as you, and putting everything into multiple collections in a single cluster made sense for me: it's easier to manage and has no obvious downside. As for the risks of using the same cluster for multiple systems, they are pretty much the same in both scenarios; with multiple clusters you'll just have many more machines to manage.

-- View this message in context: http://lucene.472066.n3.nabble.com/solr-as-a-service-for-multiple-projects-in-the-same-environment-tp4103523p4104206.html
Sent from the Solr - User mailing list archive at Nabble.com.

III International Winter School at UCI, February 17 to 28, 2014. See www.uci.cu
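The "one collection per project" setup discussed above can be sketched as a small helper that builds Collections API CREATE requests. This is only an illustration: the base URL, the project names, and the config set names are assumptions, and the config sets are assumed to have been uploaded to ZooKeeper beforehand.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, config_name,
                          num_shards=1, replication_factor=1):
    """Build a Collections API CREATE URL for one project's collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Name of a config set already uploaded to ZooKeeper (assumed).
        "collection.configName": config_name,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical projects sharing one cluster, each with its own config set.
    for project in ("project_a", "project_b"):
        print(create_collection_url("http://localhost:8983/solr",
                                    project, f"{project}_conf",
                                    num_shards=2, replication_factor=2))
```

Keeping one config set per collection is what keeps each project's schema and cache settings independent, even though the hardware is shared.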
Re: Client-side proxy for Solr 4.5.0
Perhaps what you want is a transparent proxy? You could use nginx, Squid, Varnish, etc. We've been evaluating Varnish as a possibility to run in front of our Solr server, to take advantage of the HTTP caching that Varnish does so well. Greetings!

----- Original message -----
From: Markus Jelsma markus.jel...@openindex.io
To: solr-user@lucene.apache.org
Sent: Tuesday, November 26, 2013 13:53:31
Subject: RE: Client-side proxy for Solr 4.5.0

I don't think you mean a client-side proxy. You need a server-side layer, such as a normal web application or a good proxy. We use Nginx; it is very fast and feature-rich. Its config scripting is usually enough to restrict access and limit input parameters. We also use Nginx's embedded Perl and Lua scripting, besides its config scripting, to implement more complex logic.

----- Original message -----
From: Reyes, Mark mark.re...@bpiedu.com
Sent: Tuesday 26th November 2013 19:27
To: solr-user@lucene.apache.org
Subject: Client-side proxy for Solr 4.5.0

Are there any GOOD client-side solutions to proxy a Solr 4.5.0 instance so that the end user can see their queries without being able to directly access :8983?

Applications/frameworks used:
- Solr 4.5.0
- AJAX Solr (JavaScript library)

Thank you, Mark
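The idea of a server-side layer that "limits input parameters" can be sketched in a few lines. This is not Nginx configuration and not anyone's actual setup from the thread; it is a minimal illustration in Python, and the whitelist and the row cap are assumptions chosen for the example.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical whitelist: only these parameters are forwarded to Solr.
ALLOWED_PARAMS = {"q", "start", "rows", "fl", "wt"}
# Hypothetical cap on the page size a client may request.
MAX_ROWS = 100


def sanitize_query_string(query_string):
    """Drop parameters that are not whitelisted and clamp 'rows'."""
    safe = []
    for key, value in parse_qsl(query_string, keep_blank_values=True):
        if key not in ALLOWED_PARAMS:
            continue  # e.g. qt, stream.body, or anything else unexpected
        if key == "rows":
            try:
                value = str(max(0, min(int(value), MAX_ROWS)))
            except ValueError:
                continue  # ignore a non-numeric rows value
        safe.append((key, value))
    return urlencode(safe)


if __name__ == "__main__":
    print(sanitize_query_string("q=solr&rows=5000&qt=/update&stream.body=x"))
```

A proxy or small web application would apply a filter like this before forwarding the request to port 8983, which itself stays unreachable from outside.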
Solr logs encoding to UTF8
Hi everybody: Is there any way of forcing a UTF-8 conversion on the queries that are written to the log? I've deployed Solr in Tomcat 7. The file appears to be a UTF-8 file, but I'm seeing this in the logs:

INFO: [] webapp=/solr path=/select params={fl=*,score&start=0&q=disñemos+el+mundo&hl.simple.pre=<b>&hl.simple.post=</b>&hl.fl=title,content,url,description,keywords&wt=json&hl=true&rows=20} hits=48865 status=0 QTime=155
Strange behavior of gap fragmenter on highlighting
I'm seeing odd behavior from the gap fragmenter on Solr 3.6. This is my current configuration for the gap fragmenter:

<fragmenter name="gap" default="true" class="solr.highlight.GapFragmenter">
  <lst name="defaults">
    <int name="hl.fragsize">150</int>
  </lst>
</fragmenter>

This is the basic configuration; I only tweaked the fragsize parameter to get shorter fragments. The problem is that for one particular PDF document in my results I get a really long snippet, way over 150 characters. It gets odder: if I change the value from 150 to 100, the snippet for the same document is normal, about 100 characters. The field being highlighted has this type:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" languange="Spanish"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" types="characters.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Any ideas about what's happening? Or how I could debug what is really going on? Greetings!
Re: Auto Suggest - Time decay
Are you using the suggester component, or a separate core? I've used a separate core to store suggestions (queries performed on the frontend) and ordered them using a time decay function, and it works great for me. Regards,

----- Original message -----
From: SolrLover bbar...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, October 1, 2013 12:12:13
Subject: Auto Suggest - Time decay

I am trying to implement auto-suggest based on a time decay function. I have a separate index just to store auto-suggest keywords. I would calculate the frequency over time rather than ranking on raw frequency alone. I am thinking of using a database to perform the calculation and updating the Solr index with a boost calculated from the time decay function. I am not sure if there is a better way to do this. I need to boost the terms based on their frequency over time. For example, when someone searches for 'apple' 1 times during an iPhone launch (one particular day), that shouldn't make apple always come up in the auto-suggestions when someone types the keyword 'a'; rather, it should lose its popularity exponentially. Does anyone have any suggestions?

-- View this message in context: http://lucene.472066.n3.nabble.com/Auto-Suggest-Time-decay-tp4092965.html
Sent from the Solr - User mailing list archive at Nabble.com.
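The "lose its popularity exponentially" requirement can be sketched as a decayed frequency count computed outside Solr and then written to the suggestion index as a boost. This is a minimal sketch, not the implementation used by anyone in the thread; the function name and the 7-day half-life are assumptions.

```python
import math


def decayed_popularity(search_timestamps, now, half_life_days=7.0):
    """Sum one weight per search; each weight halves every half_life_days.

    search_timestamps and now are Unix timestamps in seconds.
    """
    decay_rate = math.log(2) / (half_life_days * 86400.0)
    return sum(math.exp(-decay_rate * max(0.0, now - ts))
               for ts in search_timestamps)


if __name__ == "__main__":
    now = 100 * 86400.0
    # A burst of searches 30 days ago versus a few searches today.
    old_burst = [now - 30 * 86400.0] * 50
    recent = [now] * 5
    print(decayed_popularity(old_burst, now))
    print(decayed_popularity(recent, now))
```

With this weighting, a one-day spike fades on its own, while a term that keeps being searched keeps its score; the half-life controls how quickly old popularity is forgotten.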
Re: Auto Suggest - Time decay
For that core, just use a boost function as explained in [1]. You could use a query like this to see (before making any change) how your suggestions will be retrieved; in this case a query for goog is made, and recent documents get boosted (newer documents get an extra bonus):

http://localhost:8983/solr/select?q={!boost b=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)}goog

If this is enough for you, you could put the boost parameter in your request handler to make it even simpler, so that any query against this particular request handler is automatically boosted by date. PS: You could tweak the formula used in the boost parameter to better suit your needs.

----- Original message -----
From: SolrLover bbar...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, October 1, 2013 12:19:51
Subject: Re: Auto Suggest - Time decay

I am using a totally separate core for storing the auto-suggest keywords. Would you be able to send me some more details on your implementation?

-- View this message in context: http://lucene.472066.n3.nabble.com/Auto-Suggest-Time-decay-tp4092965p4092969.html
Sent from the Solr - User mailing list archive at Nabble.com.
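To get a feel for what the boost in the query above does, it helps to evaluate it by hand. Solr's recip(x,m,a,b) function computes a/(m*x+b), and ms(NOW,field) is the document's age in milliseconds. The sketch below only mirrors that arithmetic in Python; the helper names are made up for the example.

```python
MS_PER_YEAR = 365 * 24 * 3600 * 1000


def recip(x, m, a, b):
    """Same arithmetic as Solr's recip(x,m,a,b): a / (m*x + b)."""
    return a / (m * x + b)


def date_boost(age_ms):
    """Boost from the query above: recip(ms(NOW,date),3.16e-11,1,1)."""
    return recip(age_ms, 3.16e-11, 1, 1)


if __name__ == "__main__":
    for years in (0, 1, 2, 5):
        print(years, "year(s) old:", date_boost(years * MS_PER_YEAR))
```

Since 3.16e-11 is roughly one divided by the number of milliseconds in a year, a brand-new document gets a multiplier of 1 and a one-year-old document roughly half of that. This is a reciprocal decay, not an exponential one; a larger m makes the boost fall off faster.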
Re: Auto Suggest - Time decay
Sorry, I forgot the link: [1] - http://wiki.apache.org/solr/SolrRelevancyFAQ

----- Original message -----
From: Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu
To: solr-user@lucene.apache.org
Sent: Tuesday, October 1, 2013 13:34:03
Subject: Re: Auto Suggest - Time decay

For that core, just use a boost function as explained in [1]. You could use a query like this to see (before making any change) how your suggestions will be retrieved; in this case a query for goog is made, and recent documents get boosted (newer documents get an extra bonus):

http://localhost:8983/solr/select?q={!boost b=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)}goog

If this is enough for you, you could put the boost parameter in your request handler to make it even simpler, so that any query against this particular request handler is automatically boosted by date. PS: You could tweak the formula used in the boost parameter to better suit your needs.

----- Original message -----
From: SolrLover bbar...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, October 1, 2013 12:19:51
Subject: Re: Auto Suggest - Time decay

I am using a totally separate core for storing the auto-suggest keywords. Would you be able to send me some more details on your implementation?

-- View this message in context: http://lucene.472066.n3.nabble.com/Auto-Suggest-Time-decay-tp4092965p4092969.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Implementing Solr Suggester for Autocomplete (multiple columns)
I forgot to mention: you could check the boost section in the core's configuration file to see how your suggestions will be ranked. Basically, the boost factor for each field lets you decide which suggestions should come first. Perhaps in your app you could keep track of how often a suggestion shown to a user is actually used as the query, and boost those suggestions, since they are more likely to become a query. Thinking a little ahead, this could improve the user experience and also lower the load on your server, because if a suggestion shown to many users becomes a query, that query should already be in the cache. These are just thoughts, but I hope they are useful to you. Regards,

----- Original message -----
From: Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu
To: solr-user@lucene.apache.org
Sent: Friday, September 27, 2013 19:44:28
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

Actually I don't use that field. It could be used to do some form of basic collaborative filtering, so you could use a high value for items in your collection that you want to come first, but in my case this was not a requirement and I don't use it at all.

----- Original message -----
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Friday, September 27, 2013 16:19:40
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

I am not sure about the value to use for the option popularity. Is there a method, or do you just go with some arbitrary number?

On Thursday, September 26, 2013, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:
Great!! I hadn't seen your message yet. Perhaps you could create a PR to that GitHub repository, so it will be in sync with current versions of Solr.
- Mensaje original - De: JMill apprentice...@googlemail.com Para: solr-user@lucene.apache.org Enviados: Jueves, 26 de Septiembre 2013 9:10:49 Asunto: Re: Implementing Solr Suggester for Autocomplete (multiple columns) solved. On Thu, Sep 26, 2013 at 1:50 PM, JMill apprentice...@googlemail.com wrote: I managed to get rid of the query error by playing jquery file in the velocity folder and adding line: script type=text/javascript src=#{url_for_solr}/admin/file?file=/velocity/jquery.min.jscontentType=text/javascript/script. That has not solved the issues the console is showing a new error - [13:42:55.181] TypeError: $.browser is undefined @ http://localhost:8983/solr/ac/admin/file?file=/velocity/jquery.autocomplete.jscontentType=text/javascript:90 . Any ideas? On Thu, Sep 26, 2013 at 1:12 PM, JMill apprentice...@googlemail.com wrote: Do you know the directory the #{url_root} in script type=text/javascript src=#{url_root}/js/lib/ jquery-1.7.2.min.js/script points too? and same for #{url_for_solr} script type=text/javascript src=#{url_for_solr}/js/lib/jquery-1.7.2.min.js/script On Wed, Sep 25, 2013 at 7:33 PM, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Try quering the core where the data has been imported, something like: http://localhost:8983/solr/suggestions/select?q=uc In the previous URL suggestions is the name I give to the core, so this should change, if you get results, then the problem could be the jquery dependency. I don't remember doing any change, as far as I know that js file is bundled with solr (at leat in 3.x) version perhaps you could change it the correct jquery version on solr 4.4, if you go into the admin panel (in solr 3.6): http://localhost:8983/solr/admin/schema.jsp And inspect the loaded code, the required file (jquery-1.4.2.min.js) gets loaded in solr 4.4 it should load a similar file, but perhaps a more recent version. 
Perhaps you could change that part to something like: script type=text/javascript src=#{url_root}/js/lib/jquery-1.7.2.min.js/script Which is used at least on a solr 4.1 that I have laying aroud here somewhere. In any case you can test the suggestions using the URL that I suggest on the top of this mail, in that case you should be able to see the possible results, of course in a less fancy way. - Mensaje original - De: JMill apprentice...@googlemail.com Para: solr-user@lucene.apache.org Enviados: Miércoles, 25 de Septiembre 2013 13:59:32 Asunto: Re: Implementing Solr Suggester for Autocomplete (multiple columns) Could it be the jquery library that is the problem? I opened up solr-home/ac/conf/velocity/head.vm with an editor and I see a reference to the jquery library but I can't seem to find the directory referenced, line: script type=text/javascript src=#{url_for_solr}/admin/jquery-1.4.3.min.js. Do you know where #{url_for_solr} points to? On Wednesday, September 25, 2013, Ing. Jorge Luis Betancourt Gonzalez III Escuela Internacional de
Re: Implementing Solr Suggester for Autocomplete (multiple columns)
Actually I don't use that field, it could be used to do some form of basic collaborative filtering, so you could use a high value for items in your collection that you want to come first, but in my case this was not a requirement and I don't use it at all. - Mensaje original - De: JMill apprentice...@googlemail.com Para: solr-user@lucene.apache.org Enviados: Viernes, 27 de Septiembre 2013 16:19:40 Asunto: Re: Implementing Solr Suggester for Autocomplete (multiple columns) I am not sure about the value to use for the option popularity. Is there a method or do you just go with some arbitrary number? On Thursday, September 26, 2013, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Great!! I haven't see your message yet, perhaps you could create a PR to that Github repository, son it will be in sync with current versions of Solr. - Mensaje original - De: JMill apprentice...@googlemail.com Para: solr-user@lucene.apache.org Enviados: Jueves, 26 de Septiembre 2013 9:10:49 Asunto: Re: Implementing Solr Suggester for Autocomplete (multiple columns) solved. On Thu, Sep 26, 2013 at 1:50 PM, JMill apprentice...@googlemail.com wrote: I managed to get rid of the query error by playing jquery file in the velocity folder and adding line: script type=text/javascript src=#{url_for_solr}/admin/file?file=/velocity/jquery.min.jscontentType=text/javascript/script. That has not solved the issues the console is showing a new error - [13:42:55.181] TypeError: $.browser is undefined @ http://localhost:8983/solr/ac/admin/file?file=/velocity/jquery.autocomplete.jscontentType=text/javascript:90 . Any ideas? On Thu, Sep 26, 2013 at 1:12 PM, JMill apprentice...@googlemail.com wrote: Do you know the directory the #{url_root} in script type=text/javascript src=#{url_root}/js/lib/ jquery-1.7.2.min.js/script points too? and same for #{url_for_solr} script type=text/javascript src=#{url_for_solr}/js/lib/jquery-1.7.2.min.js/script On Wed, Sep 25, 2013 at 7:33 PM, Ing. 
Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Try quering the core where the data has been imported, something like: http://localhost:8983/solr/suggestions/select?q=uc In the previous URL suggestions is the name I give to the core, so this should change, if you get results, then the problem could be the jquery dependency. I don't remember doing any change, as far as I know that js file is bundled with solr (at leat in 3.x) version perhaps you could change it the correct jquery version on solr 4.4, if you go into the admin panel (in solr 3.6): http://localhost:8983/solr/admin/schema.jsp And inspect the loaded code, the required file (jquery-1.4.2.min.js) gets loaded in solr 4.4 it should load a similar file, but perhaps a more recent version. Perhaps you could change that part to something like: script type=text/javascript src=#{url_root}/js/lib/jquery-1.7.2.min.js/script Which is used at least on a solr 4.1 that I have laying aroud here somewhere. In any case you can test the suggestions using the URL that I suggest on the top of this mail, in that case you should be able to see the possible results, of course in a less fancy way. - Mensaje original - De: JMill apprentice...@googlemail.com Para: solr-user@lucene.apache.org Enviados: Miércoles, 25 de Septiembre 2013 13:59:32 Asunto: Re: Implementing Solr Suggester for Autocomplete (multiple columns) Could it be the jquery library that is the problem? I opened up solr-home/ac/conf/velocity/head.vm with an editor and I see a reference to the jquery library but I can't seem to find the directory referenced, line: script type=text/javascript src=#{url_for_solr}/admin/jquery-1.4.3.min.js. Do you know where #{url_for_solr} points to? On Wednesday, September 25, 2013, Ing. Jorge Luis Betancourt Gonzalez III Escuela Internacional de Invierno en la UCI del 17 al 28 de febrero del 2014. Ver www.uci.cu III Escuela Internacional de Invierno en la UCI del 17 al 28 de febrero del 2014. Ver www.uci.cu
Re: Implementing Solr Suggester for Autocomplete (multiple columns)
Great!! I haven't see your message yet, perhaps you could create a PR to that Github repository, son it will be in sync with current versions of Solr. - Mensaje original - De: JMill apprentice...@googlemail.com Para: solr-user@lucene.apache.org Enviados: Jueves, 26 de Septiembre 2013 9:10:49 Asunto: Re: Implementing Solr Suggester for Autocomplete (multiple columns) solved. On Thu, Sep 26, 2013 at 1:50 PM, JMill apprentice...@googlemail.com wrote: I managed to get rid of the query error by playing jquery file in the velocity folder and adding line: script type=text/javascript src=#{url_for_solr}/admin/file?file=/velocity/jquery.min.jscontentType=text/javascript/script. That has not solved the issues the console is showing a new error - [13:42:55.181] TypeError: $.browser is undefined @ http://localhost:8983/solr/ac/admin/file?file=/velocity/jquery.autocomplete.jscontentType=text/javascript:90;. Any ideas? On Thu, Sep 26, 2013 at 1:12 PM, JMill apprentice...@googlemail.comwrote: Do you know the directory the #{url_root} in script type=text/javascript src=#{url_root}/js/lib/ jquery-1.7.2.min.js/script points too? and same for #{url_for_solr} script type=text/javascript src=#{url_for_solr}/js/lib/jquery-1.7.2.min.js/script On Wed, Sep 25, 2013 at 7:33 PM, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Try quering the core where the data has been imported, something like: http://localhost:8983/solr/suggestions/select?q=uc In the previous URL suggestions is the name I give to the core, so this should change, if you get results, then the problem could be the jquery dependency. 
I don't remember doing any change, as far as I know that js file is bundled with solr (at leat in 3.x) version perhaps you could change it the correct jquery version on solr 4.4, if you go into the admin panel (in solr 3.6): http://localhost:8983/solr/admin/schema.jsp And inspect the loaded code, the required file (jquery-1.4.2.min.js) gets loaded in solr 4.4 it should load a similar file, but perhaps a more recent version. Perhaps you could change that part to something like: script type=text/javascript src=#{url_root}/js/lib/jquery-1.7.2.min.js/script Which is used at least on a solr 4.1 that I have laying aroud here somewhere. In any case you can test the suggestions using the URL that I suggest on the top of this mail, in that case you should be able to see the possible results, of course in a less fancy way. - Mensaje original - De: JMill apprentice...@googlemail.com Para: solr-user@lucene.apache.org Enviados: Miércoles, 25 de Septiembre 2013 13:59:32 Asunto: Re: Implementing Solr Suggester for Autocomplete (multiple columns) Could it be the jquery library that is the problem? I opened up solr-home/ac/conf/velocity/head.vm with an editor and I see a reference to the jquery library but I can't seem to find the directory referenced, line: script type=text/javascript src=#{url_for_solr}/admin/jquery-1.4.3.min.js. Do you know where #{url_for_solr} points to? On Wednesday, September 25, 2013, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Perhaps this could be an issue, I know that this works perfectly in solr 3.6 (this is the one I was using). Currently I don't have a solr 4.4 to do some tests, but what have been done in that core should work in solr 4.4, perhaps there is a setting that need some tweaking but it's impossible of knowing without checking the logs. In case that any incompatibility is present it should pop out on the logs. 
Regards, - Mensaje original - De: JMill apprentice...@googlemail.com Para: solr-user@lucene.apache.org Enviados: Miércoles, 25 de Septiembre 2013 11:10:32 Asunto: Re: Implementing Solr Suggester for Autocomplete (multiple columns) I simple query through admin (*:*) confirms the data is exists. The version I'm working with is solr 4.4.0. The autocomplete manual refers to 3.x. I wonder of this is the problem? On Wed, Sep 25, 2013 at 4:01 PM, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: The response does not show any error, can you confirm that the data is in solr? you should be able to see the numDoc stats in the admin UI. Which version of Solr are you using? I believe that the example was tested on Solr 3.x at least at the time I use it. Regards, - Mensaje original - De: JMill apprentice...@googlemail.com Para: solr-user@lucene.apache.org Enviados: Miércoles, 25 de Septiembre 2013 10:57:31 Asunto: Re: Implementing Solr Suggester for Autocomplete (multiple columns) I followed the instructions, I am able to browse to http://localhost:8983/solr/ac/browse?q=cedebugQuery=true; but I am not getting any suggestions (typed in c in Find Textbox). I wonder if loading the example data is the problem? The response I get after executing the script feed-ac.sh (step 3
Re: Sorting dependent on user preferences with FunctionQuery
I think you could use boosting queries: for group A you boost one category, and for group B another category.

----- Original message -----
From: Snubbel solrforum.20.x...@spamgourmet.com
To: solr-user@lucene.apache.org
Sent: Thursday, September 26, 2013 8:01:36
Subject: Sorting dependent on user preferences with FunctionQuery

Hello, I want to present search results to different user groups in different orders. Say I have customer group A, which I know prefers books: I want books at the top of the results and DVDs at the bottom. For group B, which prefers DVDs, these should come first. In my index I have a field of type text named category, with values Book and DVD. I thought maybe I could solve this with function queries, like this:

select?q=*%3A*&sort=query(qf=category v='Book')desc

but Solr returns "Can't determine a Sort Order (asc or desc) in sort". What is wrong? I tried different ways of formulating the query without success. Or does anyone have a better idea how to solve this? Best regards, Nikola

-- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-dependent-on-user-preferences-with-FunctionQuery-tp4092119.html
Sent from the Solr - User mailing list archive at Nabble.com.
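The boosting-query suggestion can be sketched as a helper that adds a per-group bq parameter to the request. This is a sketch under assumptions: it relies on the (e)dismax query parser, whose bq parameter adds an optional boosting clause, and the group-to-category mapping and the boost value of 10 are invented for the example. Whether category:Book matches as written also depends on how the category field is analyzed.

```python
from urllib.parse import urlencode

# Hypothetical mapping from customer group to preferred category.
GROUP_PREFERENCES = {"A": "Book", "B": "DVD"}


def build_search_params(user_query, group, boost=10):
    """Build request parameters that boost the group's preferred category."""
    params = {"q": user_query, "defType": "edismax"}
    preferred = GROUP_PREFERENCES.get(group)
    if preferred is not None:
        # bq adds an optional clause that only influences the score.
        params["bq"] = f"category:{preferred}^{boost}"
    return urlencode(params)


if __name__ == "__main__":
    print("select?" + build_search_params("*:*", "A"))
    print("select?" + build_search_params("*:*", "B"))
```

Unlike a sort clause, a boost query leaves the default relevance ordering in place and just lifts the preferred category, so documents from the other category still appear further down.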
Re: Implementing Solr Suggester for Autocomplete (multiple columns)
I've used a separate core for storing suggestions, based on what I saw in https://github.com/cominvent/autocomplete. You can check the blog post at www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/. This is really flexible; on the downside, it does not use the suggester component, so these are regular queries against a separate core. Greetings!

----- Original message -----
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 6:16:51
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

I've sometimes seen this handled by clever tokenizing. For Bill Rogers, index (untokenized) something like

Bill|Bill Rogers
Rogers|Bill Rogers

Your suggester then is a simple term lookup (see TermsComponent), which is quite fast. What you _don't_ get is autocorrect. But if you use terms.prefix, you can also control whether it's a whole-word match or not. To get whole-word in the above, you would set your prefix to Rogers| for instance. Or you may want to leave off the | to see more of an autocomplete-type response. Then, of course, when you display this you need to display only what's after the | (or whatever delimiter you use). One other note: this will be case sensitive, so you probably want to do the casing yourself, index things like rogers|Bill Rogers, and lowercase what you send in to the terms component. Best, Erick

On Tue, Sep 24, 2013 at 2:01 PM, JMill apprentice...@googlemail.com wrote:
Hi, I'm using Solr's Suggester function to implement an autocomplete feature. I have it set up to check against the username and name fields. The problem is that when running a query against the name, the second term after the whitespace (the surname) returns 0 results. It works if the query is a partial name starting from the beginning, e.g. given the name Bill Rogers, a query for Rogers returns 0 results, whereas a query for Bill returns a match (Bill Rogers). As for the username, it's not working at all.
I am after the following behaviour. Match any partial words in the fields username or name and return the results. If there is a match in the field name, then return the whole name, e.g. given the queries Rogers or Bill, return Bill Rogers (not the single word that was a match).

schema.xml extract:

...
<field name="username" type="text_general" indexed="true" stored="true"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="autocomplete" type="textSpell" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="false"/>
...
<copyField source="username" dest="autocomplete"/>
<copyField source="name" dest="autocomplete"/>
...
<fieldType class="solr.TextField" name="textSpell" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

solrconfig.xml:

<lst name="spellchecker">
  <str name="name">suggest</str>
  <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
  <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
  <str name="field">autocomplete</str> <!-- the indexed field to derive suggestions from -->
  <float name="threshold">0.005</float>
  <str name="buildOnCommit">true</str>
  <!-- <str name="sourceLocation">american-english</str> -->
</lst>
</searchComponent>
...
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
Re: Implementing Solr Suggester for Autocomplete (multiple columns)
The response does not show any error. Can you confirm that the data is in Solr? You should be able to see the numDocs stats in the admin UI. Which version of Solr are you using? I believe that the example was tested on Solr 3.x, at least at the time I used it.

Regards,

----- Original Message -----
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 10:57:31
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

I followed the instructions. I am able to browse to http://localhost:8983/solr/ac/browse?q=cedebugQuery=true; but I am not getting any suggestions (I typed c in the Find textbox). I wonder if loading the example data is the problem? The response I get after executing the script feed-ac.sh (step 3) is the following:

user$ ./feed-ac.sh
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">2239</int></lst>
</response>

Are you able to confirm whether this is the expected response?

On Wed, Sep 25, 2013 at 1:46 PM, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

I've used a separate core for storing suggestions, based on what I see in https://github.com/cominvent/autocomplete. You can check the blog post at www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/. This is really flexible; on the downside, it does not use the suggester component, so these are like regular queries against a separate core.

Greetings!

----- Original Message -----
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 6:16:51
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

I've sometimes seen this handled by clever tokenizing. For Bill Rogers, index (untokenized) something like

Bill|Bill Rogers
Rogers|Bill Rogers

Your suggester then is a simple term lookup (see TermsComponent), which is quite fast. What you _don't_ get is autocorrect. But if you use terms.prefix, you can also control whether it's a whole-word match or not. To get whole-word in the above, you would set your prefix to Rogers| for instance. Or you may want to leave off the | to see more of an autocomplete-type response. Then, of course, when you display this you need to only display what's after the | (or whatever delimiter you use).

One other note: this will be case sensitive, so you probably want to do the casing yourself: index things like rogers|Bill Rogers and lowercase what you send in to the terms component.

Best,
Erick

On Tue, Sep 24, 2013 at 2:01 PM, JMill apprentice...@googlemail.com wrote:

Hi,

I'm using Solr's Suggester function to implement an autocomplete feature. I have it set up to check against the username and name fields. The problem is that when running a query against the name, the second term after the whitespace (the surname) returns 0 results. It works if the query is a partial name starting from the beginning, e.g. given the name Bill Rogers, a query for Rogers will return 0 results whereas a query for Bill will return a match (Bill Rogers). As for the username, it's not working at all.

I am after the following behaviour: match any partial words in the fields username or name and return the results. If there is a match in the field name, then return the whole name, e.g. given the queries Rogers or Bill, return Bill Rogers (not the single word that was a match).

schema.xml extract
..
<field name="username" type="text_general" indexed="true" stored="true"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="autocomplete" type="textSpell" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="false"/>
...
<copyField source="username" dest="autocomplete"/>
<copyField source="name" dest="autocomplete"/>
...
<fieldType class="solr.TextField" name="textSpell" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

solrconfig.xml
<lst name="spellchecker">
  <str name="name">suggest</str>
  <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
  <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
  <str name="field">autocomplete</str> <!-- the indexed field to derive suggestions from -->
  <float name="threshold">0.005</float>
  <str name="buildOnCommit">true</str>
  <!-- <str name="sourceLocation">american-english</str> -->
</lst>
</searchComponent>
..
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
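For anyone trying to picture Erick's delimiter trick, here is a rough in-memory sketch in Python. It does not talk to Solr at all: the list of strings stands in for the untokenized indexed terms, and the prefix test stands in for what terms.prefix would do. The function names (make_terms, suggest) and the second sample name are made up for illustration.

```python
def make_terms(full_name, delimiter="|"):
    """Build one lowercase 'word|Full Name' term per word in the name."""
    return [f"{word.lower()}{delimiter}{full_name}" for word in full_name.split()]


def suggest(terms, typed, whole_word=False, delimiter="|"):
    """Return display values for terms that start with the typed prefix.

    Appending the delimiter to the prefix forces a whole-word match,
    which mirrors setting the prefix to 'rogers|' in the description above.
    """
    prefix = typed.lower() + (delimiter if whole_word else "")
    results = []
    for term in sorted(terms):
        if term.startswith(prefix):
            # Only show what comes after the delimiter.
            display = term.split(delimiter, 1)[1]
            if display not in results:
                results.append(display)
    return results


# Stand-in for the indexed terms of two documents.
index = make_terms("Bill Rogers") + make_terms("Roger Smith")

print(suggest(index, "rog"))                      # partial match on any word
print(suggest(index, "Rogers", whole_word=True))  # whole-word match only
```

With the delimiter left off, typing rog matches both names; with it appended, only the exact word rogers matches. Lowercasing on both sides is the manual casing step mentioned in the reply.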
Re: Implementing Solr Suggester for Autocomplete (multiple columns)
Perhaps this could be an issue. I know that this works perfectly in Solr 3.6 (this is the one I was using). Currently I don't have a Solr 4.4 to do some tests, but what has been done in that core should work in Solr 4.4. Perhaps there is a setting that needs some tweaking, but it's impossible to know without checking the logs. If any incompatibility is present, it should show up in the logs.

Regards,

----- Original Message -----
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 11:10:32
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

A simple query through admin (*:*) confirms the data exists. The version I'm working with is Solr 4.4.0. The autocomplete manual refers to 3.x. I wonder if this is the problem?

On Wed, Sep 25, 2013 at 4:01 PM, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

The response does not show any error. Can you confirm that the data is in Solr? You should be able to see the numDocs stats in the admin UI. Which version of Solr are you using? I believe that the example was tested on Solr 3.x, at least at the time I used it.

Regards,

----- Original Message -----
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 10:57:31
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

I followed the instructions. I am able to browse to http://localhost:8983/solr/ac/browse?q=cedebugQuery=true; but I am not getting any suggestions (I typed c in the Find textbox). I wonder if loading the example data is the problem? The response I get after executing the script feed-ac.sh (step 3) is the following:

user$ ./feed-ac.sh
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">2239</int></lst>
</response>

Are you able to confirm whether this is the expected response?

On Wed, Sep 25, 2013 at 1:46 PM, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

I've used a separate core for storing suggestions, based on what I see in https://github.com/cominvent/autocomplete. You can check the blog post at www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/. This is really flexible; on the downside, it does not use the suggester component, so these are like regular queries against a separate core.

Greetings!

----- Original Message -----
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 6:16:51
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

I've sometimes seen this handled by clever tokenizing. For Bill Rogers, index (untokenized) something like

Bill|Bill Rogers
Rogers|Bill Rogers

Your suggester then is a simple term lookup (see TermsComponent), which is quite fast. What you _don't_ get is autocorrect. But if you use terms.prefix, you can also control whether it's a whole-word match or not. To get whole-word in the above, you would set your prefix to Rogers| for instance. Or you may want to leave off the | to see more of an autocomplete-type response. Then, of course, when you display this you need to only display what's after the | (or whatever delimiter you use).

One other note: this will be case sensitive, so you probably want to do the casing yourself: index things like rogers|Bill Rogers and lowercase what you send in to the terms component.

Best,
Erick

On Tue, Sep 24, 2013 at 2:01 PM, JMill apprentice...@googlemail.com wrote:

Hi,

I'm using Solr's Suggester function to implement an autocomplete feature. I have it set up to check against the username and name fields. The problem is that when running a query against the name, the second term after the whitespace (the surname) returns 0 results. It works if the query is a partial name starting from the beginning, e.g. given the name Bill Rogers, a query for Rogers will return 0 results whereas a query for Bill will return a match (Bill Rogers). As for the username, it's not working at all.

I am after the following behaviour: match any partial words in the fields username or name and return the results. If there is a match in the field name, then return the whole name, e.g. given the queries Rogers or Bill, return Bill Rogers (not the single word that was a match).

schema.xml extract
..
<field name="username" type="text_general" indexed="true" stored="true"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="autocomplete" type="textSpell" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="false"/>
...
<copyField source="username" dest="autocomplete"/>
<copyField source="name" dest="autocomplete"/>
...
<fieldType class="solr.TextField" name="textSpell" positionIncrementGap="100">
  <analyzer>
Re: Implementing Solr Suggester for Autocomplete (multiple columns)
Try querying the core where the data has been imported, with something like:

http://localhost:8983/solr/suggestions/select?q=uc

In the previous URL, suggestions is the name I gave to the core, so this should change. If you get results, then the problem could be the jQuery dependency. I don't remember making any change; as far as I know that js file is bundled with Solr (at least in the 3.x version). Perhaps you could change it to the correct jQuery version for Solr 4.4. If you go into the admin panel (in Solr 3.6):

http://localhost:8983/solr/admin/schema.jsp

and inspect the loaded code, the required file (jquery-1.4.2.min.js) gets loaded. In Solr 4.4 it should load a similar file, but perhaps a more recent version. Perhaps you could change that part to something like:

<script type="text/javascript" src="#{url_root}/js/lib/jquery-1.7.2.min.js"></script>

which is used at least in a Solr 4.1 that I have lying around here somewhere. In any case you can test the suggestions using the URL that I suggested at the top of this mail; that way you should be able to see the possible results, of course in a less fancy way.

----- Original Message -----
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 13:59:32
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

Could it be the jQuery library that is the problem? I opened up solr-home/ac/conf/velocity/head.vm with an editor and I see a reference to the jQuery library, but I can't seem to find the directory referenced. The line is:

<script type="text/javascript" src="#{url_for_solr}/admin/jquery-1.4.3.min.js">

Do you know where #{url_for_solr} points to?

On Wednesday, September 25, 2013, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

Perhaps this could be an issue. I know that this works perfectly in Solr 3.6 (this is the one I was using). Currently I don't have a Solr 4.4 to do some tests, but what has been done in that core should work in Solr 4.4. Perhaps there is a setting that needs some tweaking, but it's impossible to know without checking the logs. If any incompatibility is present, it should show up in the logs.

Regards,

----- Original Message -----
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 11:10:32
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

A simple query through admin (*:*) confirms the data exists. The version I'm working with is Solr 4.4.0. The autocomplete manual refers to 3.x. I wonder if this is the problem?

On Wed, Sep 25, 2013 at 4:01 PM, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

The response does not show any error. Can you confirm that the data is in Solr? You should be able to see the numDocs stats in the admin UI. Which version of Solr are you using? I believe that the example was tested on Solr 3.x, at least at the time I used it.

Regards,

----- Original Message -----
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 10:57:31
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

I followed the instructions. I am able to browse to http://localhost:8983/solr/ac/browse?q=cedebugQuery=true; but I am not getting any suggestions (I typed c in the Find textbox). I wonder if loading the example data is the problem? The response I get after executing the script feed-ac.sh (step 3) is the following:

user$ ./feed-ac.sh
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">2239</int></lst>
</response>

Are you able to confirm whether this is the expected response?

On Wed, Sep 25, 2013 at 1:46 PM, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

I've used a separate core for storing suggestions, based on what I see in https://github.com/cominvent/autocomplete. You can check the blog post at www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/. This is really flexible; on the downside, it does not use the suggester component, so these are like regular queries against a separate core.

Greetings!

----- Original Message -----
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 6:16:51
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

I've sometimes seen this handled by clever tokenizing. For Bill Rogers, index (untokenized) something like

Bill|Bill Rogers
Rogers|Bill Rogers

Your suggester then is a simple term lookup (see TermsComponent), which is quite fast. What you _don't_ get is autocorrect. But if you use terms.prefix, you can also control whether it's a whole-word match or not. To get whole-word in the above, you would set your prefix to Rogers| for instance. Or you may want to leave off
Re: Implementing Solr Suggester for Autocomplete (multiple columns)
As far as I can tell, it is. You can check that by looking at the console logs in your browser (Chrome, Firefox, etc.). There should be an error saying that the $ function is not found. In any case I'll try to set up a testing environment here, but I can only use Solr 4.1, which I have here. I haven't downloaded or tested the 4.4 version yet. Did you try replacing the line that includes jquery-1.4.3.min.js with the new one?

----- Original Message -----
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 14:44:53
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

That seems to work. I get back an XML response containing a bunch of suggestions. Can we agree that it's jQuery that's the problem?

On Wednesday, September 25, 2013, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

Try querying the core where the data has been imported, with something like:

http://localhost:8983/solr/suggestions/select?q=uc

In the previous URL, suggestions is the name I gave to the core, so this should change. If you get results, then the problem could be the jQuery dependency. I don't remember making any change; as far as I know that js file is bundled with Solr (at least in the 3.x version). Perhaps you could change it to the correct jQuery version for Solr 4.4. If you go into the admin panel (in Solr 3.6):

http://localhost:8983/solr/admin/schema.jsp

and inspect the loaded code, the required file (jquery-1.4.2.min.js) gets loaded. In Solr 4.4 it should load a similar file, but perhaps a more recent version. Perhaps you could change that part to something like:

<script type="text/javascript" src="#{url_root}/js/lib/jquery-1.7.2.min.js"></script>

which is used at least in a Solr 4.1 that I have lying around here somewhere. In any case you can test the suggestions using the URL that I suggested at the top of this mail; that way you should be able to see the possible results, of course in a less fancy way.

----- Original Message -----
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 13:59:32
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

Could it be the jQuery library that is the problem? I opened up solr-home/ac/conf/velocity/head.vm with an editor and I see a reference to the jQuery library, but I can't seem to find the directory referenced. The line is:

<script type="text/javascript" src="#{url_for_solr}/admin/jquery-1.4.3.min.js">

Do you know where #{url_for_solr} points to?

On Wednesday, September 25, 2013, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

Perhaps this could be an issue. I know that this works perfectly in Solr 3.6 (this is the one I was using). Currently I don't have a Solr 4.4 to do some tests, but what has been done in that core should work in Solr 4.4. Perhaps there is a setting that needs some tweaking, but it's impossible to know without checking the logs. If any incompatibility is present, it should show up in the logs.

Regards,

----- Original Message -----
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 11:10:32
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

A simple query through admin (*:*) confirms the data exists. The version I'm working with is Solr 4.4.0. The autocomplete manual refers to 3.x. I wonder if this is the problem?

On Wed, Sep 25, 2013 at 4:01 PM, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

The response does not show any error. Can you confirm that the data is in Solr? You should be able to see the numDocs stats in the admin UI. Which version of Solr are you using? I believe that the example was tested on Solr 3.x, at least at the time I used it.

Regards,

----- Original Message -----
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 10:57:31
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

I followed the instructions. I am able to browse to http://localhost:8983/solr/ac/browse?q=cedebugQuery=true; but I am not getting any suggestions (I typed c in the Find textbox). I wonder if loading the example data is the problem? The response I get after executing the script feed-ac.sh (step 3) is the following:

user$ ./feed-ac.sh
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">2239</int></lst>
</response>

Are you able to confirm whether this is the expected response?

On Wed, Sep 25, 2013 at 1:46 PM, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

I've used a separate core for storing suggestions, based on what I see in https://github.com/cominvent/autocomplete. You can check the blog post at www.cominvent.com/2012/01/25/super-flexible-autocomplete
Re: Implementing Solr Suggester for Autocomplete (multiple columns)
That's an indication that jQuery can't be loaded, and without jQuery the autocomplete plugin won't work. This plugin is used to show the popup list that shows up at the bottom of the input.

----- Original Message -----
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 15:40:00
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

Not yet, but I do see the $ not found error in the console.

On Wednesday, September 25, 2013, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

As far as I can tell, it is. You can check that by looking at the console logs in your browser (Chrome, Firefox, etc.). There should be an error saying that the $ function is not found. In any case I'll try to set up a testing environment here, but I can only use Solr 4.1, which I have here. I haven't downloaded or tested the 4.4 version yet. Did you try replacing the line that includes jquery-1.4.3.min.js with the new one?

----- Original Message -----
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 14:44:53
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

That seems to work. I get back an XML response containing a bunch of suggestions. Can we agree that it's jQuery that's the problem?

On Wednesday, September 25, 2013, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

Try querying the core where the data has been imported, with something like:

http://localhost:8983/solr/suggestions/select?q=uc

In the previous URL, suggestions is the name I gave to the core, so this should change. If you get results, then the problem could be the jQuery dependency. I don't remember making any change; as far as I know that js file is bundled with Solr (at least in the 3.x version). Perhaps you could change it to the correct jQuery version for Solr 4.4. If you go into the admin panel (in Solr 3.6):

http://localhost:8983/solr/admin/schema.jsp

and inspect the loaded code, the required file (jquery-1.4.2.min.js) gets loaded. In Solr 4.4 it should load a similar file, but perhaps a more recent version. Perhaps you could change that part to something like:

<script type="text/javascript" src="#{url_root}/js/lib/jquery-1.7.2.min.js"></script>

which is used at least in a Solr 4.1 that I have lying around here somewhere. In any case you can test the suggestions using the URL that I suggested at the top of this mail; that way you should be able to see the possible results, of course in a less fancy way.

----- Original Message -----
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 13:59:32
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

Could it be the jQuery library that is the problem? I opened up solr-home/ac/conf/velocity/head.vm with an editor and I see a reference to the jQuery library, but I can't seem to find the directory referenced. The line is:

<script type="text/javascript" src="#{url_for_solr}/admin/jquery-1.4.3.min.js">

Do you know where #{url_for_solr} points to?

On Wednesday, September 25, 2013, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

Perhaps this could be an issue. I know that this works perfectly in Solr 3.6 (this is the one I was using). Currently I don't have a Solr 4.4 to do some tests, but what has been done in that core should work in Solr 4.4. Perhaps there is a setting that needs some tweaking, but it's impossible to know without checking the logs. If any incompatibility is present, it should show up in the logs.

Regards,

----- Original Message -----
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 11:10:32
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

A simple query through admin (*:*) confirms the data exists. The version I'm working with is Solr 4.4.0. The autocomplete manual refers to 3.x. I wonder if this is the problem?

On Wed, Sep 25, 2013 at 4:01 PM, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

The response does not show any error. Can you confirm that the data is in Solr? You should be able to see the numDocs stats in the admin UI. Which version of Solr are you using? I believe that the example was tested on Solr 3.x, at least at the time I used it.

3rd International Winter School at UCI, February 17 to 28, 2014. See www.uci.cu
Re: Suggest and Filtering
If query suggestion is what you are looking for, what we've done is store the user queries in a separate core and pull the suggestions from there.

----- Original Message -----
From: Brendan Grainger brendan.grain...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thursday, June 13, 2013 19:43:03
Subject: Suggest and Filtering

Hi Solr gurus,

I am trying to implement auto suggest where Solr would suggest several phrases that would return results as the user types in a query (as distinct from autocomplete). E.g. say the user starts typing 'br' and we have documents that contain brake pads and left disc brake; Solr would suggest both of those phrases, with brake pads first.

I also want to only look at documents that match a given filter query. So say I have a bunch of documents for a Toyota Cressida that contain the bigram brake pads, while the documents for a Honda Accord don't have any brake pad articles. If the user is filtering on the Honda Accord, I wouldn't want brake pads as a suggestion.

Right now, I've played with the suggest component and with faceting. Any thoughts?

Thanks
Brendan

--
Brendan Grainger
www.kuripai.com

http://www.uci.cu
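Since faceting is mentioned as one of the options already tried, here is a minimal sketch of what a filtered, prefix-restricted facet request could look like, written in Python. The request parameters (fq, facet.field, facet.prefix, facet.limit, facet.mincount) are standard Solr faceting parameters; the core name articles, the field name phrases and the filter vehicle:"honda accord" are invented for illustration, and the sketch only builds the URL rather than calling Solr.

```python
from urllib.parse import urlencode


def facet_suggest_url(base_url, typed, filter_query, field="phrases", limit=5):
    """Build a select URL that returns facet values starting with what was typed.

    The fq parameter restricts the counted documents, so phrases that only
    occur outside the filter come back with a count of 0 and are dropped
    by facet.mincount=1.
    """
    params = [
        ("q", "*:*"),
        ("rows", 0),                      # only the facet counts are needed
        ("fq", filter_query),
        ("facet", "true"),
        ("facet.field", field),           # hypothetical field holding phrases
        ("facet.prefix", typed.lower()),
        ("facet.limit", limit),
        ("facet.mincount", 1),
        ("wt", "json"),
    ]
    return f"{base_url}/select?{urlencode(params)}"


url = facet_suggest_url(
    "http://localhost:8983/solr/articles",  # hypothetical core
    "br",
    'vehicle:"honda accord"',               # hypothetical filter
)
print(url)
```

One caveat with this approach: facet.prefix matches the start of the indexed term, so with whole phrases as terms, br would find brake pads but not left disc brake unless the field also indexes the later words as separate starting points.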
Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException
One more thing: what about the hack you mentioned when the query is a combination of restricted query operators, such as +-, +, --++--+%, etc.? In these cases the application has to deal with all of these cases too.

Greetings!

----- Original Message -----
From: Jérôme Étévé jerome.et...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, April 23, 2013 10:44:39
Subject: Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException

If you want to allow your users to search for '+', you can also define '+' as a regular ALPHA character.

In config, delimiter_types.txt:

#
# We let +, # and * be part of normal words.
# This is to let c++, c#, c* and R&D as words.
#
+ => ALPHA
# => ALPHA
* => ALPHA
& => ALPHA
@ => ALPHA

Then in your solr.WordDelimiterFilterFactory, use types="delimiter_types.txt"

You'll then be able to let your users search for + as part of a word.

If you want to allow them to search for just '+', a little hacking is necessary in your client code. Personally, I just double quote the query if it's only one character long. It can't be harmful, and as it will turn your single + into "+", it will be considered a token (rather than part of the query syntax) by the parser.

Provided you're using the edismax parser, it should be just fine for any other queries, like '+ foo', 'foo +', '++' ...

J.

On 23 April 2013 15:09, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

Hi Kai:

Thanks for your reply. From what I've understood, this logic must be included in my application. Would it be possible to, for instance, use some regular expression at query time in my schema to avoid a query that contains only these characters? For instance + and + would be a good catch to avoid.

Thanks in advance!

----- Original Message -----
From: Kai Becker m...@kai-becker.com
To: solr-user@lucene.apache.org
Sent: Tuesday, April 23, 2013 9:48:26
Subject: Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException

Hi,

you need to escape that char in search terms. Special chars are + - ! ( ) { } [ ] ^ ~ * ? : \ / at the moment. The %2B is just the URL encoding, but it will still be a + for Solr, so just put a \ in front of the chars I mentioned.

Cheers,
Kai

On 23.04.2013 at 15:41, Jorge Luis Betancourt Gonzalez wrote:

Hi!

Currently I'm working on a basic search engine for [...]. The main problem is that during some tests a problem was detected: in the application, if a user searches for only the + or - term, or the + string, it causes an exception in my application. The problem is caused by an org.apache.lucene.queryParser.ParseException in Solr. I get the same response if I search for the + term from the Solr admin interface. From what I've seen, the + character gets encoded into %2B, which causes the exception.

Is there any way of escaping these characters so they behave like any other character, or at least to get no response in these cases?

I'm using Solr 3.6.2, deployed in Tomcat 7.

Greetings!

http://www.uci.cu

--
Jerome Eteve
+44(0)7738864546
http://www.eteve.net/
Querying only for + character causes org.apache.lucene.queryParser.ParseException
Hi!

Currently I'm working on a basic search engine for [...]. The main problem is that during some tests a problem was detected: in the application, if a user searches for only the + or - term, or the + string, it causes an exception in my application. The problem is caused by an org.apache.lucene.queryParser.ParseException in Solr. I get the same response if I search for the + term from the Solr admin interface. From what I've seen, the + character gets encoded into %2B, which causes the exception.

Is there any way of escaping these characters so they behave like any other character, or at least to get no response in these cases?

I'm using Solr 3.6.2, deployed in Tomcat 7.

Greetings!

http://www.uci.cu
Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException
Hi Kai:

Thanks for your reply. From what I've understood, this logic must be included in my application. Would it be possible to, for instance, use some regular expression at query time in my schema to avoid a query that contains only these characters? For instance + and + would be a good catch to avoid.

Thanks in advance!

----- Original Message -----
From: Kai Becker m...@kai-becker.com
To: solr-user@lucene.apache.org
Sent: Tuesday, April 23, 2013 9:48:26
Subject: Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException

Hi,

you need to escape that char in search terms. Special chars are + - ! ( ) { } [ ] ^ ~ * ? : \ / at the moment. The %2B is just the URL encoding, but it will still be a + for Solr, so just put a \ in front of the chars I mentioned.

Cheers,
Kai

On 23.04.2013 at 15:41, Jorge Luis Betancourt Gonzalez wrote:

Hi!

Currently I'm working on a basic search engine for [...]. The main problem is that during some tests a problem was detected: in the application, if a user searches for only the + or - term, or the + string, it causes an exception in my application. The problem is caused by an org.apache.lucene.queryParser.ParseException in Solr. I get the same response if I search for the + term from the Solr admin interface. From what I've seen, the + character gets encoded into %2B, which causes the exception.

Is there any way of escaping these characters so they behave like any other character, or at least to get no response in these cases?

I'm using Solr 3.6.2, deployed in Tomcat 7.

Greetings!

http://www.uci.cu
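As a rough illustration of the escaping Kai describes, here is a small client-side helper in Python. It only covers the characters listed in his reply (the exact set may differ between Solr versions), and the function name escape_query_term is made up. URL encoding is a separate step that the HTTP client would still apply afterwards.

```python
# Characters taken from the list in the reply above; treat it as an
# assumption and check it against the query parser of the Solr version in use.
SPECIAL_CHARS = set('+-!(){}[]^~*?:\\/')


def escape_query_term(term):
    """Put a backslash in front of every query-syntax character in the term."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


print(escape_query_term("+"))    # a lone plus sign becomes \+
print(escape_query_term("c++"))  # each plus sign is escaped separately
```

Escaping this way keeps the user's characters as literal text, so a query made only of operators no longer reaches the parser as bare syntax.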
Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException
Hi Jérôme:

Thanks for your suggestion. I'll do as you said to allow searching for these specific tokens. I've also considered the option of adding the quotes at the application level if the length is 1, but I would like to keep this logic inside Solr (if possible). This is why I was thinking of some kind of regular expression replacement at query time, so that if this changes in the future it won't also require changing the application. Can you advise me on this?

Greetings!

----- Original Message -----
From: Jérôme Étévé jerome.et...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, April 23, 2013 10:44:39
Subject: Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException

If you want to allow your users to search for '+', you can also define '+' as a regular ALPHA character.

In config, delimiter_types.txt:

#
# We let +, # and * be part of normal words.
# This is to let c++, c#, c* and R&D as words.
#
+ => ALPHA
# => ALPHA
* => ALPHA
& => ALPHA
@ => ALPHA

Then in your solr.WordDelimiterFilterFactory, use types="delimiter_types.txt"

You'll then be able to let your users search for + as part of a word.

If you want to allow them to search for just '+', a little hacking is necessary in your client code. Personally, I just double quote the query if it's only one character long. It can't be harmful, and as it will turn your single + into "+", it will be considered a token (rather than part of the query syntax) by the parser.

Provided you're using the edismax parser, it should be just fine for any other queries, like '+ foo', 'foo +', '++' ...

J.

On 23 April 2013 15:09, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

Hi Kai:

Thanks for your reply. From what I've understood, this logic must be included in my application. Would it be possible to, for instance, use some regular expression at query time in my schema to avoid a query that contains only these characters? For instance + and + would be a good catch to avoid.

Thanks in advance!

----- Original Message -----
From: Kai Becker m...@kai-becker.com
To: solr-user@lucene.apache.org
Sent: Tuesday, April 23, 2013 9:48:26
Subject: Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException

Hi,

you need to escape that char in search terms. Special chars are + - ! ( ) { } [ ] ^ ~ * ? : \ / at the moment. The %2B is just the URL encoding, but it will still be a + for Solr, so just put a \ in front of the chars I mentioned.

Cheers,
Kai

On 23.04.2013 at 15:41, Jorge Luis Betancourt Gonzalez wrote:

Hi!

Currently I'm working on a basic search engine for [...]. The main problem is that during some tests a problem was detected: in the application, if a user searches for only the + or - term, or the + string, it causes an exception in my application. The problem is caused by an org.apache.lucene.queryParser.ParseException in Solr. I get the same response if I search for the + term from the Solr admin interface. From what I've seen, the + character gets encoded into %2B, which causes the exception.

Is there any way of escaping these characters so they behave like any other character, or at least to get no response in these cases?

I'm using Solr 3.6.2, deployed in Tomcat 7.

Greetings!

http://www.uci.cu

--
Jerome Eteve
+44(0)7738864546
http://www.eteve.net/
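The client-side hack Jérôme describes (double-quoting a query that is only one character long) could look roughly like this in Python. This is a guess at the shape of such a helper, not his actual code; the name prepare_query is invented, and the extra handling of a lone double quote or backslash is an added assumption so that the quoted result stays well-formed.

```python
def prepare_query(raw):
    """Wrap a one-character query in double quotes; leave longer ones alone."""
    query = raw.strip()
    if len(query) != 1:
        return query
    # Assumption: a lone quote or backslash needs escaping inside the quotes.
    if query in ('"', "\\"):
        query = "\\" + query
    return f'"{query}"'


print(prepare_query("+"))      # single operator character gets quoted
print(prepare_query("foo +"))  # longer queries pass through unchanged
```

Longer operator-only inputs such as ++ or +- are passed through untouched here, relying on the edismax behaviour mentioned in the reply; whether that holds for a given setup would need testing.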
Getting better snippets in highlighting component
Hi all:

I'm building a document search platform, basically indexing a lot of PDF files. Some of these files have an index, which means that when I query for normativos in my application (built using Symfony2+PHP+Solarium) I get a few results like this:

10 6.2 Elementos normativos generales 12 6.3 Elementos normativos técnicos ..32 ANEXOS A Formas verbales (normativo

This is a bit of a problem. Is there any way I can get rid of these dots? Is there any sort of relevance in the snippets that the highlighting component returns? I mean, in this particular case the snippet came from the index page of the PDF, which I hardly think is the best snippet in the document for this particular query. Any thoughts on this? Is there any golden rule for treating cases like this?

Thanks a lot!

http://www.uci.cu
Re: Getting better snippets in highlighting component
Hi Jack,

Thanks for the reply. I know it's common to find this kind of TOC in a lot of files. I'm playing with the regex fragmenter to be a little more selective about the generated snippets, but no luck so far.

- Original message -
From: Jack Krupansky j...@basetechnology.com
To: solr-user@lucene.apache.org
Sent: Saturday, March 30, 2013 0:40:03
Subject: Re: Getting better snippets in highlighting component

It looks like a table of contents. The dots are followed by the page number, followed by the text from the next table of contents entry, and repeat. Even Google doesn't do anything special for this. For example, search for chapter 1 chapter 2 pdf:

  [PDF] 2013 Publication 505 - Internal Revenue Service
  www.irs.gov/pub/irs-pdf/p505.pdf
  File Format: PDF/Adobe Acrobat
  Mar 21, 2013 – Introduction . . . . . . . . . . . . . . . . . . 1. What's New for 2013 . . . . . . . . . . . . . 2. Reminders . . . . . . . . . . . . . . . . . . . 2. Chapter 1. Tax Withholding for ...

I'm sure somebody can come up with a clever heuristic to avoid this kind of thing. Maybe simply truncate any sequence of white space and only punctuation down to two or three characters or so.

-- Jack Krupansky

-Original Message-
From: Jorge Luis Betancourt Gonzalez
Sent: Friday, March 29, 2013 10:34 PM
To: solr-user@lucene.apache.org
Subject: Getting better snippets in highlighting component

Hi all,

I'm building a document search platform that mostly indexes PDF files. Some of these files have an index page, so when I query for normativos in my application (built with Symfony2 + PHP + Solarium) I get a few results like this:

  10 6.2 Elementos normativos generales 12 6.3 Elementos normativos técnicos ..32 ANEXOS A Formas verbales (normativo

This is a bit of a problem. Is there any way I can get rid of these dots? Is there any notion of relevance in the snippets that the highlighting component returns? In this particular case the snippet came from the index page of the PDF, which I doubt is the best snippet in the document for this query. Any thoughts on this? Is there a golden rule for cases like this?

Thanks a lot!
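Jack's heuristic (truncate any run of white space and punctuation down to a few characters) could be applied to the snippets on the application side, after Solr returns them. A minimal sketch, assuming the snippet is available as a plain string; the function name, the threshold of four characters and the " ... " replacement are illustrative choices, not anything Solr or Solarium provides:

```python
import re

# A run of 4 or more characters that are all non-word characters
# (white space, dots, other punctuation), e.g. TOC dot leaders.
# The threshold of 4 is an assumption chosen for illustration.
_LEADER_RUN = re.compile(r"\W{4,}")


def collapse_leaders(snippet: str, replacement: str = " ... ") -> str:
    """Collapse long white space/punctuation runs into a short marker."""
    return _LEADER_RUN.sub(replacement, snippet).strip()


if __name__ == "__main__":
    raw = "Elementos normativos técnicos ......... 32 ANEXOS"
    print(collapse_leaders(raw))
```

Short runs such as ", " or ". " are left alone, so ordinary sentences pass through unchanged. This only tidies the dots; it does not make the highlighter prefer a better passage than the table of contents.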
Question about email search
I'm using Solr 3.6.2 to index data crawled with Nutch. In my schema I have one field with all the content extracted from the page, which could include email addresses. This is the configuration in my schema:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

I'm trying to search a field of this type (text) with a value like @gmail.com, and I expect to get the documents that contain that text. Any advice?

Regards

-- 
It is only in the mysterious equation of love that any logical reasons can be found.
Good programmers often confuse halloween (31 OCT) with christmas (25 DEC)
Re: Question about email search
Sorry for the duplicated mail :-(. Any advice on a configuration for searching email addresses in a field that does not contain only email addresses, that is, where the addresses are embedded in larger text?

- Original message -
From: Ahmet Arslan iori...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Thursday, March 14, 2013 11:23:47
Subject: Re: Question about email search

Hi,

Since you have the word delimiter filter in your analysis chain, I am not sure whether e-mail addresses are recognised. You can check that on the analysis page of the Solr admin UI. If e-mail addresses are kept as one token, I would use a leading wildcard query:

  q=*@gmail.com

There was a similar question recently: http://search-lucene.com/m/XF2ejnM6Vi2

--- On Thu, 3/14/13, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:

From: Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu
Subject: Question about email search
To: solr-user@lucene.apache.org
Date: Thursday, March 14, 2013, 5:11 PM

I'm using Solr 3.6.2 to index data crawled with Nutch. In my schema I have one field with all the content extracted from the page, which could include email addresses. This is the configuration in my schema:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

I'm trying to search a field of this type (text) with a value like @gmail.com, and I expect to get the documents that contain that text. Any advice?

Regards
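When sending Ahmet's leading wildcard query from application code, the *, @ and : characters need to be URL-encoded. A minimal sketch of building such a request URL; the field name content is hypothetical (the thread only names the field type, text), and the base URL is assumed to be a default local install:

```python
from urllib.parse import urlencode


def build_wildcard_url(base_url: str, field: str, suffix: str, rows: int = 10) -> str:
    """Build a /select URL for a leading wildcard query such as *@gmail.com.

    This only helps if the analysis chain keeps each address as a single
    token, which is the condition Ahmet mentions.
    """
    params = {"q": f"{field}:*{suffix}", "rows": rows, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # "content" is a placeholder field name, not taken from the thread.
    print(build_wildcard_url("http://localhost:8983/solr", "content", "@gmail.com"))
```

Leading wildcards can be slow on large indexes, so it is worth checking the analysis page first to see what tokens an address actually produces.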
Re: Using suggester for smarter phrase autocomplete
Currently I'm using a separate core to query suggestions; I started from https://github.com/cominvent/autocomplete. I only use the suggester component for term suggestions based on a tokenized field in my schema (all of this in Solr 3.6). Perhaps, instead of using the suggester component, you could use an approach more like the one in the GitHub repo.

- Original message -
From: Eric Wilson wilson.eri...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, March 13, 2013 13:11:05
Subject: Re: Using suggester for smarter phrase autocomplete

I'm not concerned about stopwords, but rather about the situation where the first and second words are rarely used together, and so don't occur together in a phrase in the dictionary. Thanks.

On Wed, Mar 13, 2013 at 11:11 AM, Robert Muir rcm...@gmail.com wrote:

On Wed, Mar 13, 2013 at 11:07 AM, Eric Wilson wilson.eri...@gmail.com wrote:

I'm trying to use the suggester for auto-completion with Solr 4. I have followed the example configuration for phrase suggestions at the bottom of this wiki page: http://wiki.apache.org/solr/Suggester

This shows how to use a text file with the following text for phrase suggestions:

  # simple auto-suggest phrase dictionary for testing
  # note this uses tabs as separator!
  the first phrase	1.0
  the second phrase	2.0
  testing 1234	3.0
  foo	5.0
  the fifth phrase	2.0
  the final phrase	4.0

This seems to be working in the expected way. If I query for "the f" I receive the following suggestions:

  <str>the final phrase</str>
  <str>the fifth phrase</str>
  <str>the first phrase</str>

I would like to deal with the case where the user is interested in "the foo". When "the fo" is entered, there will be no suggestions. Is it possible to provide both the phrase matches and the matches for individual words, so that when the text entered by the user is no longer part of any actual phrase, there are still suggestions to be made for the final word?

Is it really the case that you want matches for individual words, or just to handle e.g. the stopwords case like 'the fo' -> foo? The latter can be done with AnalyzingSuggester (configure a StopFilter on the analyzer).
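The behaviour Eric asks for (phrase matches first, then matches for the final word only) can be illustrated outside Solr with the dictionary from his message. This is a sketch of the fallback logic only, written as plain Python over an in-memory dictionary; it is not how the Solr Suggester or AnalyzingSuggester is implemented, and the function name is made up for the example:

```python
# Phrase dictionary from the thread: phrase -> weight.
PHRASES = {
    "the first phrase": 1.0,
    "the second phrase": 2.0,
    "testing 1234": 3.0,
    "foo": 5.0,
    "the fifth phrase": 2.0,
    "the final phrase": 4.0,
}


def suggest(prefix, dictionary, limit=5):
    """Return entries starting with the prefix, heaviest first.

    If the whole input matches nothing and has several words,
    fall back to completing only the last word.
    """

    def matches(p):
        hits = [(weight, term) for term, weight in dictionary.items()
                if term.startswith(p)]
        hits.sort(key=lambda h: (-h[0], h[1]))
        return [term for _, term in hits[:limit]]

    prefix = prefix.lower()
    found = matches(prefix)
    words = prefix.split()
    if found or len(words) < 2:
        return found
    return matches(words[-1])


if __name__ == "__main__":
    print(suggest("the f", PHRASES))
    print(suggest("the fo", PHRASES))
```

With this dictionary, "the f" yields the three phrases in the same order Eric reports, and "the fo" falls back to foo. An application could get the same effect by issuing a second suggest request for the last word whenever the first request comes back empty.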
Re: Building a central index with Lucene + Solr
Agreed, PHP and Solr are an excellent combination. I'm using Solr 3.6 + PHP (Symfony2 + NelmioSolariumBundle + Solarium) and getting excellent results. Solarium on its own is a great PHP library; right now it lacks Solr 4 support, but for Solr 3.6 it's great.

- Original message -
From: David Quarterman da...@corexe.com
To: solr-user@lucene.apache.org
Sent: Tuesday, March 5, 2013 10:56:18
Subject: RE: Building a central index with Lucene + Solr

Hi Alvaro,

I agree with Otis & Alexandre (esp. Windows + PHP!). However, there are plenty of people using Solr & PHP out there very successfully. There's another good package at http://code.google.com/p/solr-php-client/ which is easy to implement and has some example usage.

Regards,
DQ

From: Álvaro Vargas Quezada [mailto:al...@outlook.com]
Sent: 05 March 2013 14:53
To: solr-user@lucene.apache.org
Subject: Building a central index with Lucene + Solr

Hi everyone!

I'm trying to develop a central index. I installed Solr and I reached the screen that I attach. The problem is that I don't know how to continue from this point. I want to develop an app in PHP that uses Solr, but I don't know how. Can anyone help me, maybe with a tutorial or something like that?

Thanks and greetz from Chile!
Custom update handler
Hi,

I'm trying to build a custom update handler to accomplish one specific task. In our app we build query suggestions from previous queries passed to our frontend app; instead of getting these queries from the Solr logs, we store them in a separate core. So far so good, but one particular requirement is that not every query typed by users in the search box should appear as a suggestion, only the most popular ones. For this we created a field in the schema called count and wrote code in our frontend to increase this value. To be honest, we don't like this. So we came up with the idea of writing a custom update handler that, before storing the query in the index, checks whether the query already exists and, if so, adds 1 to the counter. The thing is that right now we have a dedupe component set up to avoid storing very similar queries. Is there any way of accessing the dedupe component from the custom update handler? Is there any documentation I can check for anything similar to this?

Greetings
Indexing several parts of PDF file
Hi,

I'm working on a search engine for several PDF documents. One of the requirements is that we return not only the documents matching the search criteria but also the page that matches. Normally Tika only extracts the text content and does not make this distinction, but with some custom library this could be achieved. My question is how to structure the schema. From what I've seen, one approach could be to use dynamic fields:

  <dynamicField name="page_*" type="text" indexed="true" stored="true"/>

At query time I could then extract the page number from the field name. Is this the best approach? Is there any way of storing the page number in an attribute instead of using dynamic fields?

Thanks in advance!

Greetings
Re: Indexing several parts of PDF file
Thanks for the advice. The problem with this approach is that we are using Nutch as our crawler for the intranet, and right now indexing one crawled document as several Solr documents is not possible without changing the way Nutch works. Is there any other workaround for this? Thanks for the replies!

- Original message -
From: Upayavira u...@odoko.co.uk
To: solr-user@lucene.apache.org
Sent: Tuesday, February 5, 2013 9:05:58
Subject: Re: Indexing several parts of PDF file

This would involve you querying against every page in your document, which will be too many fields and will break quickly. The best way to do it is to index pages as documents. You can use field collapsing to group pages from the same document together.

Upayavira

On Tue, Feb 5, 2013, at 02:00 PM, Jorge Luis Betancourt Gonzalez wrote:

Hi,

I'm working on a search engine for several PDF documents. One of the requirements is that we return not only the documents matching the search criteria but also the page that matches. Normally Tika only extracts the text content and does not make this distinction, but with some custom library this could be achieved. My question is how to structure the schema. From what I've seen, one approach could be to use dynamic fields:

  <dynamicField name="page_*" type="text" indexed="true" stored="true"/>

At query time I could then extract the page number from the field name. Is this the best approach? Is there any way of storing the page number in an attribute instead of using dynamic fields?

Thanks in advance!

Greetings
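Upayavira's suggestion (one Solr document per page, grouped back together with field collapsing) might look roughly like this on the indexing side. A sketch under assumptions: the field names id, doc_id, page and text are hypothetical and would need matching definitions in the schema, and the per-page text is assumed to come from whatever custom extraction step is used. The group and group.field parameters are Solr's result grouping parameters.

```python
def pages_to_docs(doc_id, pages):
    """Turn the per-page text of one PDF into one Solr document per page.

    Field names here are placeholders for illustration.
    """
    return [
        {
            "id": f"{doc_id}_p{number}",  # unique key per page
            "doc_id": doc_id,             # shared by all pages of the PDF
            "page": number,               # page number stored as a plain field
            "text": text,
        }
        for number, text in enumerate(pages, start=1)
    ]


def grouped_query_params(query):
    """Query parameters that collapse page hits by their parent document."""
    return {"q": query, "group": "true", "group.field": "doc_id"}


if __name__ == "__main__":
    for doc in pages_to_docs("manual.pdf", ["intro text", "details text"]):
        print(doc)
    print(grouped_query_params("text:normativos"))
```

The page number then lives in an ordinary field rather than in a field name, which is what the original question asks about. As the reply above notes, getting Nutch to emit several documents per crawled file is a separate problem that this sketch does not address.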
Migrating from Solr 3.6.1 to Solr 4
Hi,

I'm currently working with Solr 3.6.1, but Solr 4 has great features such as the ones bundled with SolrCloud. The content of the index is not really the problem for the transition; the thing is that I have a large app written in PHP + Solarium that interacts with the index in Solr 3. As far as I know there is no support for Solr 4 in Solarium. So my question is: is it possible to use a Solr 3.6.1 frontend that gets its data from a Solr 4 behind the scenes, or is there any other workaround for this?

Greetings!

10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSITY OF INFORMATICS SCIENCES... CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci
Re: Migrating from Solr 3.6.1 to Solr 4
So, from my PHP app's point of view, if I want to use SolrCloud features, changes will be needed, right? One more thing: are the responses generated by Solr 4 in any way different from the ones generated by Solr 3? I ask because Solarium parses the JSON response from the server to provide high-level objects encapsulating the response and its content.

Greetings!

- Original message -
From: Upayavira u...@odoko.co.uk
To: solr-user@lucene.apache.org
Sent: Saturday, January 5, 2013 4:49:01
Subject: Re: Migrating from Solr 3.6.1 to Solr 4

Try pointing your app at 4.0. I converted an app recently. Here are the steps I took (as I recall):

* get the original solrconfig.xml for the release I'm using
* diff that and my solrconfig.xml
* apply those changes to a 4.0 solrconfig.xml
* try to start up Solr with this new solrconfig and an old schema and an old index
* fix each problem you find in the schema
  - some class names have changed
  - you may want to delete some field definitions that you're not using
  - you'll need to copy the version field from the 4.0 schema

I found my app was able to search/index without any difficulty via the XML/HTTP interface. Your mileage may vary, but for that particular app, that is what it took.

Note, 4.0 can work in a 3.x way (old style replication, etc). You don't need to use SolrCloud etc when using 4.0.

Upayavira

On Sat, Jan 5, 2013, at 08:20 AM, Jorge Luis Betancourt Gonzalez wrote:

Hi,

I'm currently working with Solr 3.6.1, but Solr 4 has great features such as the ones bundled with SolrCloud. The content of the index is not really the problem for the transition; the thing is that I have a large app written in PHP + Solarium that interacts with the index in Solr 3. As far as I know there is no support for Solr 4 in Solarium. So my question is: is it possible to use a Solr 3.6.1 frontend that gets its data from a Solr 4 behind the scenes, or is there any other workaround for this?

Greetings!
Re: Dedup component
Is this updatable fields functionality available in Solr 3.6.1? That is the version I'm using right now.

- Original message -
From: Upayavira u...@odoko.co.uk
To: solr-user@lucene.apache.org
Sent: Saturday, December 15, 2012 7:56:45
Subject: Re: Dedup component

Make the ID field out of the query text so you don't have to use the dedup component, then use the updatable fields functionality in Solr 4.0:

  $ curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d '
  [
   {"id"       : "book1",
    "copies_i" : { "inc" : 1},
    "cat"      : { "add" : "fantasy"},
    "ISBN_s"   : { "set" : "0-380-97365-0"},
    "remove_s" : { "set" : null } }
  ]'

  /* example stolen from Yonik's ApacheCon talk */

Upayavira

On Sat, Dec 15, 2012, at 01:34 AM, Jorge Luis Betancourt Gonzalez wrote:

Hi all,

I'm trying to build a query suggestion system using Solr (which is also used to index all the data in the app). I have a separate core dedicated only to this purpose (along with some others for images, etc.). In the main app, written in Symfony2 + Solarium Bundle, we store the queries in this core. To prevent the indexing of duplicated queries, I use the dedup component:

  <!-- Delete similar duplicated documents on index time, using some fuzzy text similarity techniques -->
  <updateRequestProcessorChain name="dedupe">
    <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <bool name="overwriteDupes">false</bool>
      <str name="signatureField">signature</str>
      <str name="fields">textsuggest,textng</str>
      <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

This configuration prevents very similar queries from being stored, but what I'm really trying to accomplish is to increment a count (popularity) field when the same query is sent to Solr. Any thoughts on this?

Greetings!
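Applied to the query-suggestion core, Upayavira's idea (use the query text as the ID, then send an inc update) might produce a request body like the one built below. A sketch under assumptions: the count field name and the normalization step (lowercasing, collapsing white space) are illustrative, and the thread does not say how Solr treats an inc on a document that does not exist yet, so that case should be checked before relying on it.

```python
import json


def popularity_update(query: str, count_field: str = "count") -> str:
    """Build the JSON body of a Solr 4 atomic update that adds 1 to a counter.

    The document ID is derived from the query text itself, so repeated
    queries address the same document without the dedup component.
    """
    normalized = " ".join(query.lower().split())
    return json.dumps([{"id": normalized, count_field: {"inc": 1}}])


if __name__ == "__main__":
    # The body would be POSTed to /solr/update with
    # Content-type: application/json, as in the curl example above.
    print(popularity_update("  Solr   Highlighting "))
```

Using the exact text as the ID only merges identical queries after normalization; it does not reproduce the fuzzy matching that TextProfileSignature gives the dedupe chain.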
Re: Dedup component
Is there any similar approach that I could use in Solr 3.6.1, or should I add this logic to my application?

- Original message -
From: Upayavira u...@odoko.co.uk
To: solr-user@lucene.apache.org
Sent: Saturday, December 15, 2012 12:37:11
Subject: Re: Dedup component

Nope, it is a Solr 4.0 thing. In order for it to work, you need to store every field, because what it does behind the scenes is retrieve the stored fields, rebuild the document, and then post the whole document back.

Upayavira

On Sat, Dec 15, 2012, at 04:52 PM, Jorge Luis Betancourt Gonzalez wrote:

Is this updatable fields functionality available in Solr 3.6.1? That is the version I'm using right now.
Re: Solr PHP client
Hi Guillaume,

I beg to differ. It's true that Solr's native support for several formats has been a big aid to developers using Solr from many programming languages. But building all the queries by hand is not wise and is in any case hard to maintain; it's easier to use an OO library to interact with Solr. For instance, right now I'm using Solarium to interact with Solr 3.6.1 within a Symfony2 app, and in this scenario Solarium handles all the interaction with the Solr server. In my code I work with classes, and underneath Solarium talks JSON with the Solr server. My point is that Solr's ability to speak a lot of standard formats is a huge plus, but having a library that handles the heavy lifting with the server keeps your code clean.

Greetings,

- Original message -
From: Guillaume Rossolini guillaume.rossol...@instantluxe.com
To: solr-user@lucene.apache.org
Sent: Friday, December 14, 2012 3:22:41
Subject: Re: Solr PHP client

Hi,

The various Solr PHP clients have been a great help in the past, and I do not mean to belittle their efforts. However, the Solr project has made many efforts to support several input and output data formats, including JSON and even serialized PHP, which are fairly easy to implement. Maybe I am mistaken, but I am not sure any PHP client (as an extension or as a library) would actually help much any more.

Regards,

-- 
I N S T A N T | L U X E - 40 Rue D'Aboukir - 75002 Paris - France

On Fri, Dec 14, 2012 at 8:23 AM, Romita Saha romita.s...@sg.panasonic.com wrote:

Hi,

Can anyone please guide me in using SolrPhpClient? The available documents are not clear, for example about where to place SolrPhpClient. I have downloaded SolrPhpClient and changed the following lines, specifying the path where the files are on my computer:

  require_once('/home/solr/SolrPhpClient/Apache/Solr/Document.php./Document.php');
  require_once('/home/solr/SolrPhpClient/Apache/Solr/Document.php./Response.php');

After this I am unable to proceed. What should I index now, and how? How should I start my Solr? Where do I place the conf files? I see there are a few HTML documents inside the folder SolrPhpClien/phpdocs. Could someone please help?

Thanks and regards,
Romita
Prevent indexing documents with some terms
Hi,

Is there any way I can prevent a document from being indexed? I have a separate core used only for query suggestions. These queries are stored directly from the frontend app, so I'm trying to prevent certain ill-intentioned queries from being stored in my index, while keeping the logic of what I consider ill-intentioned out of the frontend application. Stop words only prevent some words from being stored in the index; is there any way of preventing the whole document from being stored?

Greetings!
Re: PHP client
Any news on the Solarium project? It's the one I'm using with Solr 3.6!

- Original message -
From: Bill Au bill.w...@gmail.com
To: solr-user@lucene.apache.org, Arkadi Colson ark...@smartbit.be
Sent: Friday, December 7, 2012 13:40:20
Subject: Re: PHP client

I have not used the pecl Solr client. I have been using SolrPhpClient. I came across this patch for pecl when I was researching PHP clients for Solr 4.0. SolrPhpClient has the same problem with 4.0 that this patch addresses.

Bill

On Fri, Dec 7, 2012 at 11:00 AM, Arkadi Colson ark...@smartbit.be wrote:

Thanks for the info! Do you know if it's possible to use file uploads to Tika with this client?

On 12/03/2012 03:56 PM, Bill Au wrote:

https://bugs.php.net/bug.php?id=62332

There is a fork with patches applied.

On Mon, Dec 3, 2012 at 9:38 AM, Arkadi Colson ark...@smartbit.be wrote:

Hi,

Has anyone tested the pecl Solr client in combination with SolrCloud? It seems to be broken since 4.0.

Best regards,
Arkadi

-- 
Kind regards
Arkadi Colson
Smartbit bvba . Hoogstraat 13 . 3670 Meeuwen
T +32 11 64 08 80 . F +32 11 64 08 81
Re: News clustering
I'm trying to use it to search through news websites, but I'm interested in classification at index time. Is there any available solution for this?

Greetings!

On Dec 3, 2012, at 12:37 PM, Stanislaw Osinski stanis...@osinski.name wrote:

I mean measuring the similarity between the documents in each cluster, and also the difference between the documents in one cluster and those in another. I saw the sample code ClusteringQualityBencmark.java. However, I do not know how to make use of it for assessing my Solr clustering performance.

You'd need to write your own code for this. Here are the most common clustering quality measures you mentioned: http://en.wikipedia.org/wiki/Cluster_analysis#Evaluation_of_clustering_results

These are meant for the general case (numeric attributes); to apply them to texts, you'd need to use the vector representation of the documents. On a more general note, synthetic measures test only the document-cluster assignments, but none take the quality of labels into account (this is really hard to measure objectively).

Staszek
Re: Suggester with punctuation signs
Hi Upayavira,

I'm using the standard tokenizer right now and it's working fine, but I was wondering whether there is any way to strip these punctuation marks directly in the suggest requestHandler, so there is no need to index again. I've been doing some tests, and increasing the threshold has improved the accuracy of the suggestions. One more thing: the suggestions are mainly in Spanish, so is there any best-practice configuration for this, or will any standard configuration do the trick?

Thanks!

On Nov 26, 2012, at 6:18 PM, Upayavira u...@odoko.co.uk wrote:

You may want to change your tokenisation anyhow, as a search for 'universidad' will not match your term 'universidad,'. But you are on the right track: to improve suggestions, improve what is in your index.

Upayavira

On Mon, Nov 26, 2012, at 07:54 PM, Jorge Luis Betancourt Gonzalez wrote:

Hi,

I've configured my Solr setup to use the suggester component to get term suggestions from a PHP application. The thing is that I'm getting results like universidad, (note the punctuation sign). Is there any way I can get rid of this? Or do I need to create a separate field and strip all punctuation signs?

Greetings
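If reindexing is not an option for the moment, the punctuation could be stripped from the suggestions in the application after they come back from Solr. This is a client-side stopgap sketched under assumptions (the function name and the extra Spanish punctuation characters are illustrative); it does not change what is in the index, so Upayavira's point about fixing the tokenisation still stands.

```python
import string

# ASCII punctuation plus a few marks common in Spanish text.
_STRIP_CHARS = string.punctuation + string.whitespace + "¿¡«»"


def clean_suggestions(terms):
    """Strip leading/trailing punctuation and drop resulting duplicates,
    keeping the original order of the suggestions."""
    seen = set()
    cleaned = []
    for term in terms:
        word = term.strip(_STRIP_CHARS)
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    return cleaned


if __name__ == "__main__":
    print(clean_suggestions(["universidad,", "universidad", "¿universo?"]))
```

Because "universidad," and "universidad" collapse into one entry, the list shown to the user may be shorter than the number of suggestions requested from Solr.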
Suggester with punctuation signs
Hi,

I've configured my Solr setup to use the suggester component to get term suggestions from a PHP application. The thing is that I'm getting results like universidad, (note the punctuation sign). Is there any way I can get rid of this? Or do I need to create a separate field and strip all punctuation signs?

Greetings
Re: php client for Solr 4.0.0
I'm currently using Solarium with Solr 3.6; perhaps you can tweak Solarium as needed? I suppose pull requests adding Solr 4 support are welcome in Solarium.

Greetings!

On Nov 12, 2012, at 2:56 PM, Bill Au bill.w...@gmail.com wrote:

Anyone know of a PHP client that is compatible with Solr 4.0.0? I am using an old PHP client that tries to set the waitFlush parameter on a commit, so it is failing.

Bill
Re: is it possible to save the search query?
I think that Solr by itself doesn't store the queries (correct me if I'm wrong about this), but you can accomplish what you want by processing the Solr log (it's the only way, I think). From the Solr log you can get the queries, process them according to your needs, and then change the boost parameters in your app or Solr config.
On Nov 8, 2012, at 11:32 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, Aha, I think I understand. Yes, you could collect all doc IDs from each query and find the differences. There is nothing in Solr that can find those differences or that would store doc IDs of returned hits in the first place, so you would have to implement this yourself. Sematext's Search Analytics service may be of help here in the sense that all the data you need (queries, doc IDs, etc.) is collected, so it would be a matter of providing an API to get the data for off-line analysis. But this data collection+diffing is also something you could implement yourself. One thing to think about: what do you do when a query returns a large number of hits? Do you really want/need to get IDs for all of them, or only a page at a time? Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html
On Thu, Nov 8, 2012 at 1:01 AM, Romita Saha romita.s...@sg.panasonic.com wrote: Hi, The following is the example.
1st query: http://localhost:8983/solr/db/select/?defType=dismax&debugQuery=on&q=cashier2&qf=data^2 id&start=0&rows=11&fl=data,id
Next query: http://localhost:8983/solr/db/select/?defType=dismax&debugQuery=on&q=cashier2&qf=data id^2&start=0&rows=11&fl=data,id
In the 1st query the field 'data' is boosted by 2. However, maybe the user was not satisfied with the response. Thus in the next query he boosted the field 'id' by 2. I want to record both queries and compare the two, that is, find which changes made in the 2nd query are not present in the previous one. Thanks and regards, Romita Saha
From: Otis Gospodnetic otis.gospodne...@gmail.com To: solr-user@lucene.apache.org, Date: 11/08/2012 01:35 PM Subject: Re: is it possible to save the search query? Hi, Compare in what sense? An example will help. Otis -- Performance Monitoring - http://sematext.com/spm
On Nov 7, 2012 8:45 PM, Romita Saha romita.s...@sg.panasonic.com wrote: Hi All, Is it possible to record a search query in Solr and then compare it with the previous search query? Thanks and regards, Romita Saha
10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS... CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION http://www.uci.cu http://www.facebook.com/universidad.uci http://www.flickr.com/photos/universidad_uci
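For what it's worth, comparing two recorded queries can be done outside Solr with very little code. The following is only a rough sketch, not something Solr provides: the function names are made up for illustration, and the two URLs assume the separators in the example queries above were `&` (with `+` standing in for the space inside `qf`).

```python
from urllib.parse import urlsplit, parse_qs


def query_params(url):
    """Return the query-string parameters of a URL as a dict of value lists."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


def diff_params(old_url, new_url):
    """Map each parameter whose values differ to an (old, new) pair.

    A parameter missing from one of the two URLs shows up as None on that side.
    """
    old, new = query_params(old_url), query_params(new_url)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Assumed reconstruction of the two queries from this thread.
first = ("http://localhost:8983/solr/db/select/?defType=dismax&debugQuery=on"
         "&q=cashier2&qf=data^2+id&start=0&rows=11&fl=data,id")
second = ("http://localhost:8983/solr/db/select/?defType=dismax&debugQuery=on"
          "&q=cashier2&qf=data+id^2&start=0&rows=11&fl=data,id")

changes = diff_params(first, second)
print(changes)
```

Here only `qf` differs, so the diff contains a single entry showing the boost moving from 'data' to 'id'. The same function could be fed URLs reconstructed from the Solr log.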
Re: Storing queries in Solr
Thanks for the quick response. I'm trying to build a query suggester. I find it odd that, this being such a common need, Solr doesn't provide any built-in mechanism for query suggestions, but implementing the other components isn't so hard either. Greetings!
On Oct 8, 2012, at 3:38 AM, Upayavira wrote: Solr has a small query cache, but this does not hold queries for any length of time, so won't suit your purpose. The LucidWorks Search product has (I believe) a click tracking feature, but that is about boosting documents that are clicked on, not specific search terms. Parsing the Solr log, or pushing query terms to a different core/index would really be the only way to achieve what you're suggesting, as far as I am aware. Processing logs would be preferable anyhow, as you don't really want to be triggering an index write during each query (assuming you have more queries than updates to your main index), and also if this is for building a suggester index, then it is unlikely to need updating that regularly - every hour or every day should be more than sufficient. You could write a SearchComponent that logs queries in another format, should the existing log format not be sufficient for you. Upayavira
On Mon, Oct 8, 2012, at 01:24 AM, Jorge Luis Betancourt Gonzalez wrote: Hi! I was wondering if there is any built-in mechanism that allows me to store the queries made to a Solr server inside the index itself. I know that the suggester module exists, but as far as I know it only works for terms existing in the index, and not with queries. I remember reading about using some external program to parse the Solr log and push the queries or any other interesting data into the index. Is this the only way of accomplishing this? Greetings!
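As a rough illustration of the log-parsing route discussed above, the sketch below counts how often each query string appears in a batch of log lines. It assumes each request line carries a `params={...}` section holding the URL-encoded parameters; check that against your own log format before relying on it. The sample lines, core name, and function names are invented for the example.

```python
import re
from collections import Counter
from urllib.parse import parse_qs

# Assumption: request log lines contain "params={...}" with URL-encoded parameters.
# A query that itself contains "}" would be cut short by this simple pattern.
PARAMS_RE = re.compile(r"params=\{(.*?)\}")


def extract_queries(log_lines):
    """Yield the normalized q parameter of every log line that has one."""
    for line in log_lines:
        match = PARAMS_RE.search(line)
        if not match:
            continue
        for q in parse_qs(match.group(1)).get("q", []):
            yield q.strip().lower()


def top_queries(log_lines, n=10):
    """Return the n most frequent queries as (query, count) pairs."""
    return Counter(extract_queries(log_lines)).most_common(n)


# Made-up sample lines, only to exercise the code.
sample_log = [
    "INFO: [db] webapp=/solr path=/select params={q=java&rows=10} hits=761 status=0 QTime=3",
    "INFO: [db] webapp=/solr path=/select params={q=java+php&rows=10} hits=120 status=0 QTime=5",
    "INFO: [db] webapp=/solr path=/select params={q=Java&rows=10} hits=761 status=0 QTime=2",
    "INFO: [db] webapp=/solr path=/update params={commit=true} status=0 QTime=40",
]

print(top_queries(sample_log))
```

Run hourly or daily, as suggested above, the resulting counts could be pushed into a separate core that backs the suggester, so the main index is never written to at query time.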
Storing queries in Solr
Hi! I was wondering if there is any built-in mechanism that allows me to store the queries made to a Solr server inside the index itself. I know that the suggester module exists, but as far as I know it only works for terms existing in the index, and not with queries. I remember reading about using some external program to parse the Solr log and push the queries or any other interesting data into the index. Is this the only way of accomplishing this? Greetings!
Re: Question about OR operator
Thanks a lot for all the replies. Chris, it worked with this mm value: <str name="mm"> 10% </str> If this version of Solr is affected by the bug you pointed out, shouldn't it fail with this value as well? Greetings!
On Oct 4, 2012, at 8:48 PM, Jorge Luis Betancourt Gonzalez wrote: Hi Chris: I'm using Solr 3.6.1. Is the bug present in this version? Greetings!
On Oct 4, 2012, at 6:11 PM, Chris Hostetter wrote:
: GRAVE: java.lang.NumberFormatException: For input string:
: 100
:
: at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
: at java.lang.Integer.parseInt(Integer.java:470)
: at java.lang.Integer.<init>(Integer.java:636)
: at org.apache.solr.util.SolrPluginUtils.calculateMinShouldMatch(SolrPluginUtils.java:691)
What version of Solr are you using? That looks like a simple parsing bug that seems to have been fixed a while back (it's definitely not in the 4.0 branch). Can you try eliminating the whitespace from your XML configured value... <str name="mm">100</str> ...that should work around the problem. -Hoss
Question about OR operator
Hi: I'm having an issue with Solr 3.6.1 and I sense it's a lack of understanding on my part. I'm building a search engine, using Solr of course to store the inverted index; so far so good. When I search for a term, let's say java, I get 761 results; querying the index with the term php gives me 3194 results. So if I do a query for java php (without any quotes), I suppose that Solr will interpret this as an OR between the two terms, correct? So the results should be the JOIN between the two subsets of results? Can anyone explain why I get fewer results for the last query, java php without any quotes? Thanks in advance!!
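To make the expectation in the question concrete: a plain OR behaves like a set union, so it can never return fewer documents than either term alone. The tiny sketch below uses invented document IDs purely for illustration.

```python
# Invented document IDs, standing in for the hits of each single-term query.
java_hits = {1, 2, 3}
php_hits = {3, 4, 5, 6}

either = java_hits | php_hits  # OR: documents matching at least one term
both = java_hits & php_hits    # AND: documents matching every term

print(len(either), len(both))
```

If the two-term query returns fewer hits than one of the single-term queries, the result looks more like the AND case, which suggests the query parser is requiring more than one term to match.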
Re: Question about OR operator
Hi: Thanks for all the replies. Right now I have this in my mm parameter: <str name="mm"> 2<-1 5<-2 6<90% </str> I'm trying to get a straight OR between all the terms in my query. Should I set the mm parameter to 1? I ask because this gave an error. Greetings!
On Oct 4, 2012, at 11:06 AM, Jorge Luis Betancourt Gonzalez wrote: Hi: I'm having an issue with Solr 3.6.1 and I sense it's a lack of understanding on my part. I'm building a search engine, using Solr of course to store the inverted index; so far so good. When I search for a term, let's say java, I get 761 results; querying the index with the term php gives me 3194 results. So if I do a query for java php (without any quotes), I suppose that Solr will interpret this as an OR between the two terms, correct? So the results should be the JOIN between the two subsets of results? Can anyone explain why I get fewer results for the last query, java php without any quotes? Thanks in advance!!
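To see why this mm value gives fewer hits for java php, it may help to work out how many terms it requires. The sketch below is my own reading of the documented mm (minimum should match) rules, not Solr's actual code, and it assumes the value quoted above was originally the stock 2<-1 5<-2 6<90% with the less-than signs lost in transit. The function name is made up.

```python
def min_should_match(clause_count, spec):
    """Number of optional clauses that must match, for an mm spec string.

    Sketch of the documented rules: a plain integer is an absolute count,
    a negative value means "all but that many", a percentage is rounded
    toward zero, and "N<X" applies X only when there are more than N clauses.
    """
    spec = spec.strip()  # this sketch tolerates surrounding whitespace
    required = clause_count
    if "<" in spec:
        for part in spec.split():
            bound, sub_spec = part.split("<", 1)
            if clause_count <= int(bound):
                return required
            required = min_should_match(clause_count, sub_spec)
        return required
    if spec.endswith("%"):
        calc = int(clause_count * int(spec[:-1]) / 100)  # truncates toward zero
    else:
        calc = int(spec)
    required = clause_count + calc if calc < 0 else calc
    return max(0, min(clause_count, required))


stock = "2<-1 5<-2 6<90%"
for terms in (1, 2, 3, 6, 10):
    print(terms, "terms ->", min_should_match(terms, stock), "required")
print("mm=0, 2 terms ->", min_should_match(2, "0"), "required")
```

Under this reading, a two-term query such as java php requires both terms with the stock value, which behaves like AND and would explain the smaller result count. A value of 0 requires none of the optional terms beyond the usual at-least-one match, which is the plain OR behavior being asked for.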
Re: Question about OR operator
This is the error:
GRAVE: java.lang.NumberFormatException: For input string: 100
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:470)
    at java.lang.Integer.<init>(Integer.java:636)
    at org.apache.solr.util.SolrPluginUtils.calculateMinShouldMatch(SolrPluginUtils.java:691)
    at org.apache.solr.util.SolrPluginUtils.setMinShouldMatch(SolrPluginUtils.java:656)
    at org.apache.solr.search.DisMaxQParser.getUserQuery(DisMaxQParser.java:210)
    at org.apache.solr.search.DisMaxQParser.addMainQuery(DisMaxQParser.java:166)
    at org.apache.solr.search.DisMaxQParser.parse(DisMaxQParser.java:77)
    at org.apache.solr.search.QParser.getQuery(QParser.java:143)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:105)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:165)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:307)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
This is the parameter in my solrconfig.xml: <str name="mm"> 0 </str>
On Oct 4, 2012, at 1:46 PM, Otis Gospodnetic wrote: What's the error, Jorge? Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html
On Thu, Oct 4, 2012 at 1:36 PM, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Hi: Thanks for all the replies. Right now I have this in my mm parameter: <str name="mm"> 2<-1 5<-2 6<90% </str> I'm trying to get a straight OR between all the terms in my query. Should I set the mm parameter to 1? I ask because this gave an error. Greetings!
On Oct 4, 2012, at 11:06 AM, Jorge Luis Betancourt Gonzalez wrote: Hi: I'm having an issue with Solr 3.6.1 and I sense it's a lack of understanding on my part. I'm building a search engine, using Solr of course to store the inverted index; so far so good. When I search for a term, let's say java, I get 761 results; querying the index with the term php gives me 3194 results. So if I do a query for java php (without any quotes), I suppose that Solr will interpret this as an OR between the two terms, correct? So the results should be the JOIN between the two subsets of results? Can anyone explain why I get fewer results for the last query, java php without any quotes? Thanks in advance!!
Re: Question about OR operator
Thanks for the quick response. I got the same response. What I'm trying to accomplish is a straight OR between all the clauses or terms in my query; the value I should use is 0, right?
Re: Question about OR operator
Hi Chris: I'm using Solr 3.6.1. Is the bug present in this version? Greetings!
On Oct 4, 2012, at 6:11 PM, Chris Hostetter wrote:
: GRAVE: java.lang.NumberFormatException: For input string:
: 100
:
: at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
: at java.lang.Integer.parseInt(Integer.java:470)
: at java.lang.Integer.<init>(Integer.java:636)
: at org.apache.solr.util.SolrPluginUtils.calculateMinShouldMatch(SolrPluginUtils.java:691)
What version of Solr are you using? That looks like a simple parsing bug that seems to have been fixed a while back (it's definitely not in the 4.0 branch). Can you try eliminating the whitespace from your XML configured value... <str name="mm">100</str> ...that should work around the problem. -Hoss