RE: Structured fields and termVectors
Hello,

I think you can use a single field that is both stored and indexed. At index time, use a KeywordTokenizer and a filter that reduces the line to the path (without the M).

The value that is displayed (the stored field) is always the original value, that is, the value that came in. It is stored before the tokenizer and filters run. The indexed values (and the term vectors) are the terms produced after the tokenizer and filters, so the term vectors contain the reduced path.

I hope this helps.

Best regards,
M.Sc. Dipl.-Inf. (FH) Martin Rödig
SHI Elektronische Medien GmbH

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CURRENT - NEW - AVAILABLE NOW
Solr/Lucene training, 28 - 30 June in Zürich
Vienna 19 - 21 July 2011 | Munich 27 - 29 September and 15 - 17 November 2011
As the first certified training partner of Lucid Imagination in Germany, Austria and Switzerland, SHI now offers German-language Solr training courses.
More information: www.shi-gmbh.com/services/solr-training
Note: the number of places is limited!
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Postal address: Watzmannstr. 23, 86316 Friedberg
Visiting address: Curt-Frenzel-Str. 12, 86167 Augsburg
Internet: http://www.shi-gmbh.com
Registergericht Augsburg HRB 17382
Managing director: Peter Spiske
Tax number: 103/137/30412

-----Original Message-----
From: Jack Repenning [mailto:jrepenn...@collab.net]
Sent: Tuesday, 17 May 2011 03:30
To: solr-user@lucene.apache.org
Subject: Structured fields and termVectors

How does MoreLikeThis use termVectors?

My documents (full sample at the bottom) frequently include lines more or less like this:

   M /trunk/home/.Aquamacs/Preferences.el

I want to MoreLikeThis based on the full path, but not the M. But what I actually display as a search result should include M (should look pretty much like the sample, below).

If I define a field to include that whole line, I can certainly search in ways that skip the M, but how do I control the termVector and MoreLikeThis?

I think the answer is not to termVector the line as shown, but rather to index these lines twice, once whole (which is also copyFielded into the display text), and a second time with just the path (and termVectors=true). Which is OK, but since these lines will represent most of my data, double-indexing seems to double my storage, which is ... oh, well ... not entirely optimal.

So is there some way I can index the full line, once, with M and path, and tell the termVector to include the whole path and nothing but the path?

-==-
Jack Repenning
Technologist
Codesion Business Unit
CollabNet, Inc.

r3580 | jack | 2011-04-26 13:55:46 -0700 (Tue, 26 Apr 2011) | 1 line
Changed paths:
   M /trunk/home/.Aquamacs
   M /trunk/home/.Aquamacs/Preferences.el
   M /trunk/www/wynton-start-page.html

simplify the hijack of Aquamacs prefs storage, aufl
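The approach suggested in the reply at the top of this thread could be sketched as a schema.xml fragment along the following lines. This is only an illustration under stated assumptions: the names path_only and changed_path and the regular expression are invented here, and it assumes each "M /trunk/..." line is sent as one value of a multi-valued field. KeywordTokenizerFactory and PatternReplaceFilterFactory are standard Solr analysis classes.

```xml
<!-- Hypothetical field type: keeps the whole line as a single token, then
     strips the leading change marker (one capital letter plus whitespace). -->
<fieldType name="path_only" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^[A-Z]\s+" replacement="" replace="first"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field: stored="true" keeps the original line (with the M)
     for display; indexed terms and term vectors hold only the path. -->
<field name="changed_path" type="path_only"
       indexed="true" stored="true" termVectors="true" multiValued="true"/>
```

With a keyword tokenizer the whole path is one term, so MoreLikeThis would only consider documents sharing identical paths as similar. Whether that is the desired granularity, or whether the path should be split further, is a separate design decision.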
RE: search by url in Solr?
Hello,

if you want to search in more than one field, you can use the DisMax or eDisMax query parser. There you can specify more than one field to search.

Example:

    <requestHandler name="standard" class="solr.SearchHandler" default="true">
      <lst name="defaults">
        <str name="defType">dismax</str>
        <str name="echoParams">explicit</str>
        <str name="fl">*,score</str>
        <str name="qf">text^1.0 title^0.05 author^0.2 shi_quelle^0.4 shi_year adrp_keywords^0.5 shi_path^2.0</str>
        ...

qf lists the query fields in which the parser will search. The numbers after the ^ are boost factors. DisMax has many more nice features; have a look at http://wiki.apache.org/solr/DisMaxQParserPlugin

Best regards,
M.Sc. Dipl.-Inf. (FH) Martin Rödig
SHI Elektronische Medien GmbH

-----Original Message-----
From: Anurag [mailto:anurag.it.jo...@gmail.com]
Sent: Wednesday, 11 May 2011 07:05
To: solr-user@lucene.apache.org
Subject: Re: search by url in Solr?

Thanks, it worked! Can I specify <defaultSearchField>url,content</defaultSearchField> to include two default fields?

On Wed, May 11, 2011 at 3:02 AM, Rakudten [via Lucene] <ml-node+2924686-576776982-146...@n3.nabble.com> wrote:

> Hello.
>
> One option is to specify a default search field in your schema.xml. If your query doesn't include a specific field, the query parser will use the default one to run the query.
>
> You should include in your schema.xml something like this:
>
>     <!-- field for the QueryParser to use when an explicit fieldname is absent -->
>     <defaultSearchField>url</defaultSearchField>

--
Kumar Anurag
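Applied to the two fields named in the question, the DisMax suggestion from this thread might look like the fragment below. It is a sketch rather than a tested configuration: the handler name urlsearch and the boost values are arbitrary illustrations, and only the field names url and content come from the thread. defaultSearchField itself takes a single field name, which is why the query fields are listed in qf instead.

```xml
<!-- Hypothetical handler; name and boosts are illustrative only. -->
<requestHandler name="urlsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- Search both fields; a match in url counts more than one in content. -->
    <str name="qf">url^2.0 content^1.0</str>
  </lst>
</requestHandler>
```

A plain query without a field prefix sent to such a handler would then be matched against both fields.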
RE: stopwords not working in multicore setup
Hi,

you must encode the umlaut in the URL. In your case the query must be q=title:f%FCr; then it should work.

From: Christopher Bottaro [mailto:cjbott...@onespot.com]
Sent: Friday, 25 March 2011 18:48
To: solr-user@lucene.apache.org
Cc: Martin Rödig
Subject: Re: stopwords not working in multicore setup

Ahh, thank you for the hints Martin... German stopwords without umlauts work correctly.

So I'm trying to figure out where the UTF-8 chars are getting messed up. Using the Solr admin web UI, I did a search for title:für and the xml (or json) output in the browser shows the query with the proper encoding, but the Solr logs show this:

INFO: [page_30d_de] webapp=/solr path=/select params={explainOther=&fl=*,score&indent=on&start=0&q=title:f?r&hl.fl=&qt=standard&wt=xml&fq=&version=2.2&rows=10} hits=76 status=0 QTime=2

Notice the title:f?r. How do I fix that? I'm using Jetty btw...

Thanks for the help.

On Fri, Mar 25, 2011 at 3:05 AM, Martin Rödig <r...@shi-gmbh.com> wrote:

> I have some questions about your config:
>
> Is stopwords-de.txt in the same directory as schema.xml?
> Is the title field of type text?
> Do you have the same problem with German stopwords without an umlaut (ü, ö, ä), such as the word denn?
>
> One possible problem is that stopwords-de.txt is not saved as UTF-8, so the filter cannot read the umlaut ü in the file.
>
> Best regards,
> M.Sc. Dipl.-Inf. (FH) Martin Rödig
> SHI Elektronische Medien GmbH
>
> -----Original Message-----
> From: Christopher Bottaro [mailto:cjbott...@onespot.com]
> Sent: Friday, 25 March 2011 05:37
> To: solr-user@lucene.apache.org
> Subject: stopwords not working in multicore setup
>
> Hello,
>
> I'm running a Solr server with 5 cores. Three are for English content and two are for German content. The default stopwords setup works fine for the English cores, but the German stopwords aren't working.
>
> The German stopwords file is stopwords-de.txt and resides in the same directory as stopwords.txt. The German cores use a different schema (named schema.page.de.xml) which has the following text field definition: http://pastie.org/1711866
>
> The stopwords-de.txt file looks like this: http://pastie.org/1711869
>
> The query I'm doing is this: q = title:für
>
> And it's returning documents with für in the title. Title is a text field which should use the stopwords-de.txt, as seen in the aforementioned pastie.
>
> Any ideas? Thanks for the help.
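For readers without access to the pasties, a German text field type that applies the stopword file could look roughly like the sketch below. The actual definition from the thread is not reproduced here, so the type name text_de and the choice of tokenizer and filters are assumptions; only the file name stopwords-de.txt is taken from the messages.

```xml
<!-- Hypothetical field type; the real one lives in the linked pastie. -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <!-- The stop filter is applied at both index and query time so that
       a stopword is removed from the query as well as from the documents. -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords-de.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords-de.txt"/>
  </analyzer>
</fieldType>
```

The words attribute is resolved relative to the core's conf directory, which is why the questions in this thread about the file's location and its UTF-8 encoding matter.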
RE: Newbie wants to index XML content.
You can use the DIH (DataImportHandler) to split up and index that XML.

http://wiki.apache.org/solr/DataImportHandler

Best regards,
M.Sc. Dipl.-Inf. (FH) Martin Rödig
SHI Elektronische Medien GmbH

-----Original Message-----
From: Marcelo Iturbe [mailto:marc...@santiago.cl]
Sent: Thursday, 24 March 2011 21:55
To: solr-user@lucene.apache.org
Subject: Newbie wants to index XML content.

Hello,

I've been reading up on how to index XML content but have a few questions.

How is data in element attributes handled or defined?
How are nested elements handled?

In the following XML structure, I want to index the content of what is between the <entry> tags. In one XML document, there can be up to 100 <entry> tags. So the <entry> tag would be equivalent to the <doc> tag...

Can I somehow index this XML as is, or will I have to parse it, creating the <doc> tag and placing all the elements on the same level?

Thanks for your help.

    <?xml version="1.0" encoding="utf-8"?>
    <root>
      <source>manual</source>
      <author>
        <name>MC Anon User</name>
        <email>mca...@mcdomain.com</email>
      </author>
      <entry>
        <name>
          <fullname>John Smith</fullname>
        </name>
        <email>jsmit...@gmail.com</email>
      </entry>
      <entry>
        <name>
          <fullname>First Last</fullname>
          <firstname>First</firstname>
          <lastname>Last</lastname>
        </name>
        <organization>
          <name>MC S.A.</name>
          <tittle>CIO</tittle>
        </organization>
        <email type="work" primary="true">fi...@mcdomain.com</email>
        <email>flas...@yahoo.com</email>
        <phoneNumber type="work" primary="true">+5629460600</phoneNumber>
        <im carrier="gtalk" primary="true">fi...@mcdomain.com</im>
        <im carrier="skype">First.Last</im>
        <postalAddress>111 Bude St, Toronto</postalAddress>
        <custom name="blog">http://blog.mcdomain.com/</custom>
      </entry>
    </root>

regards
Marcelo
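A DataImportHandler configuration for the sample XML in this thread might look something like the following sketch. The file path and the Solr field names in the column attributes are hypothetical and would have to match fields defined in schema.xml; the element paths are taken from the sample. XPathEntityProcessor supports only a subset of XPath, so the attribute predicate on phoneNumber should be checked against the DIH documentation for the Solr version in use.

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- forEach makes every <entry> element one Solr document. -->
    <entity name="entry"
            processor="XPathEntityProcessor"
            url="/path/to/contacts.xml"
            forEach="/root/entry">
      <!-- Nested elements are addressed by their full path. -->
      <field column="fullname"     xpath="/root/entry/name/fullname"/>
      <field column="organization" xpath="/root/entry/organization/name"/>
      <!-- An entry can hold several <email> elements, so the target
           field would need to be multiValued. -->
      <field column="email"        xpath="/root/entry/email"/>
      <!-- Selecting an element by attribute value. -->
      <field column="work_phone"   xpath="/root/entry/phoneNumber[@type='work']"/>
    </entity>
  </document>
</dataConfig>
```

With this layout the XML does not have to be rewritten into <doc> elements first; the flattening happens in the field mappings.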
RE: stopwords not working in multicore setup
I have some questions about your config:

Is stopwords-de.txt in the same directory as schema.xml?
Is the title field of type text?
Do you have the same problem with German stopwords without an umlaut (ü, ö, ä), such as the word denn?

One possible problem is that stopwords-de.txt is not saved as UTF-8, so the filter cannot read the umlaut ü in the file.

Best regards,
M.Sc. Dipl.-Inf. (FH) Martin Rödig
SHI Elektronische Medien GmbH

-----Original Message-----
From: Christopher Bottaro [mailto:cjbott...@onespot.com]
Sent: Friday, 25 March 2011 05:37
To: solr-user@lucene.apache.org
Subject: stopwords not working in multicore setup

Hello,

I'm running a Solr server with 5 cores. Three are for English content and two are for German content. The default stopwords setup works fine for the English cores, but the German stopwords aren't working.

The German stopwords file is stopwords-de.txt and resides in the same directory as stopwords.txt. The German cores use a different schema (named schema.page.de.xml) which has the following text field definition: http://pastie.org/1711866

The stopwords-de.txt file looks like this: http://pastie.org/1711869

The query I'm doing is this: q = title:für

And it's returning documents with für in the title. Title is a text field which should use the stopwords-de.txt, as seen in the aforementioned pastie.

Any ideas? Thanks for the help.
Display analyzed values in hitlist
Hi,

I want to display author names in the hit list. For the author metadata field I created a new field type, type_author, which includes a synonym list. In the synonym list, all possible names of a person are reduced to one name.

Example:

d...@hh.de, dietmar, brock, dietmar brock, db => Dietmar Brock

So far everything works as expected, and the facet on the author field displays only Dietmar Brock. But the hit list still displays the original values, such as db or dietmar...

I know that Dietmar Brock exists only as a term in the index, and that with stored=true it is the original incoming string that gets stored, but I need a way to display Dietmar Brock in the hit list.

Is there a way to apply regular expressions and synonym lists in the request handler, or to store the analyzed value? What is the best way to do this? Do I have to write my own request handler?

Thanks
Martin