Re: Structured fields and termVectors

2011-05-17 Thread Martin Rödig
Hello,

I think you can use a single field that is both stored and indexed. At index 
time you can use a KeywordTokenizer and a filter to reduce the line to the path 
(without the M). The value that is displayed (the stored field) is always the 
original value, that is, the value that comes in. It is stored before the 
tokenizer and filters run. The indexed values (and term vectors) are the terms 
produced after the tokenizer and filters, so the term vectors contain the 
reduced path.
I hope this helps.
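
As a rough illustration of the setup described above, a schema.xml fragment 
might look like the following. The type name path_line, the field name 
changed_path, and the regular expression are assumptions made for this sketch 
and do not come from the thread. The pattern presumes that every line starts 
with a single-letter action code followed by whitespace.

```xml
<!-- Sketch only: path_line and changed_path are hypothetical names. -->
<fieldType name="path_line" class="solr.TextField">
  <analyzer>
    <!-- Emit the whole incoming line as one token. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Drop leading whitespace, the action letter (e.g. M), and the
         whitespace after it, leaving only the path as the indexed term. -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^\s*[A-Z]\s+" replacement="" replace="first"/>
  </analyzer>
</fieldType>

<!-- stored="true" keeps the original line (with the M) for display;
     termVectors="true" records the filtered terms (path only). -->
<field name="changed_path" type="path_line" indexed="true" stored="true"
       multiValued="true" termVectors="true"/>
```

Because the keyword tokenizer emits each line as a single term, a given path 
will usually occur only once per document. MoreLikeThis would then probably 
need mlt.fl=changed_path together with a lowered mlt.mintf (for example 1) 
before it considers such terms.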

Kind regards
M.Sc. Dipl.-Inf. (FH) Martin Rödig
 
SHI Elektronische Medien GmbH
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
CURRENT - NEW - AVAILABLE NOW 
Solr/Lucene training, 28 - 30 June in Zürich
Vienna 19 - 21 July 2011 | Munich 27 - 29 September and 15 - 17 November 2011

As the first certified training partner of Lucid Imagination in 
Germany, Austria and Switzerland, SHI now offers 
German-language Solr training courses.
More information: www.shi-gmbh.com/services/solr-training
Note: the number of places is limited!
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Postal address: Watzmannstr. 23, 86316 Friedberg
Visiting address: Curt-Frenzel-Str. 12, 86167 Augsburg

Internet: http://www.shi-gmbh.com
Commercial register: Augsburg, HRB 17382
Managing director: Peter Spiske
Tax number: 103/137/30412

-----Original Message-----
From: Jack Repenning [mailto:jrepenn...@collab.net] 
Sent: Tuesday, 17 May 2011 03:30
To: solr-user@lucene.apache.org
Subject: Structured fields and termVectors

How does MoreLikeThis use termVectors?

My documents (full sample at the bottom) frequently include lines more or less 
like this

   M /trunk/home/.Aquamacs/Preferences.el

I want to MoreLikeThis based on the full path, but not the M. But what I 
actually display as a search result should include M (should look pretty much 
like the sample, below).

If I define a field to include that whole line, I can certainly search in ways 
that skip the M, but how do I control the termVector and MoreLikeThis?  I 
think the answer is not to termVector the line as shown, but rather to index 
these lines twice, once whole (which is also copyFielded into the display 
text), and a second time with just the path (and termVectors=true). Which is 
OK, but since these lines will represent most of my data, double-indexing seems 
to double my storage, which is ... oh, well ... not entirely optimal.

So is there some way I can index the full line, once, with M and path, and 
tell the termVector to include the whole path and nothing but the path?



-==-
Jack Repenning
Technologist
Codesion Business Unit
CollabNet, Inc.
8000 Marina Boulevard, Suite 600
Brisbane, California 94005
office: +1 650.228.2562
twitter: http://twitter.com/jrep




r3580 | jack | 2011-04-26 13:55:46 -0700 (Tue, 26 Apr 2011) | 1 line
Changed paths:
   M /trunk/home/.Aquamacs
   M /trunk/home/.Aquamacs/Preferences.el
   M /trunk/www/wynton-start-page.html

simplify the hijack of Aquamacs prefs storage, aufl




Re: search by url in Solr?

2011-05-11 Thread Martin Rödig
Hello,

If you want more than one search field, you can use the DisMax or eDisMax 
query parser. There you can set more than one field to search.
Example:
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <str name="fl">*,score</str>
    <str name="qf">text^1.0 title^0.05 author^0.2 shi_quelle^0.4 
      shi_year adrp_keywords^0.5 shi_path^2.0</str>
    ...

qf lists the query fields in which the parser will search.
The numbers after the ^ are boost factors.

DisMax has many more nice features; have a look at 
http://wiki.apache.org/solr/DisMaxQParserPlugin
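
On the question in the quoted message below: defaultSearchField takes a single 
field name, so a comma-separated list is not expected to work there. If DisMax 
is not wanted, one common alternative is to copy both fields into one 
catch-all field and make that the default. The following schema.xml fragment 
is only a sketch. The catch-all field name all_text and the field type text 
are assumptions; url and content are the field names from the thread.

```xml
<!-- Hypothetical catch-all field; assumes a "text" field type is defined. -->
<field name="all_text" type="text" indexed="true" stored="false"
       multiValued="true"/>

<!-- Copy the raw values of both source fields into the catch-all field. -->
<copyField source="url" dest="all_text"/>
<copyField source="content" dest="all_text"/>

<!-- Queries without an explicit field name now search both sources. -->
<defaultSearchField>all_text</defaultSearchField>
```

Unlike qf in DisMax, this approach cannot weight url and content differently, 
and documents have to be reindexed after the schema change.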


-----Original Message-----
From: Anurag [mailto:anurag.it.jo...@gmail.com] 
Sent: Wednesday, 11 May 2011 07:05
To: solr-user@lucene.apache.org
Subject: Re: search by url in Solr?

Thanks, it worked!
Can I specify
 <defaultSearchField>url,content</defaultSearchField>
to include two default fields?

On Wed, May 11, 2011 at 3:02 AM, Rakudten [via Lucene] 
<ml-node+2924686-576776982-146...@n3.nabble.com> wrote:

 Hello.

 One option is to specify a default search field in your schema.xml. If 
 your query doesn't include a specific field, the query parser will use the 
 default one to run the query. You should include something like this in 
 your schema.xml:

 <!-- field for the QueryParser to use when an explicit fieldname 
 is absent -->
 <defaultSearchField>url</defaultSearchField>


--
Kumar Anurag




Re: stopwords not working in multicore setup

2011-03-28 Thread Martin Rödig
Hi,

You must encode the umlaut in the URL. In your case it must be q=title:f%FCr; 
then it should work.




From: Christopher Bottaro [mailto:cjbott...@onespot.com]
Sent: Friday, 25 March 2011 18:48
To: solr-user@lucene.apache.org
Cc: Martin Rödig
Subject: Re: stopwords not working in multicore setup

Ahh, thank you for the hints Martin... German stopwords without Umlaut work 
correctly.

So I'm trying to figure out where the UTF-8 chars are getting messed up.  Using 
the Solr admin web UI, I did a search for title:für and the xml (or json) 
output in the browser shows the query with the proper encoding, but the Solr 
logs show this:

INFO: [page_30d_de] webapp=/solr path=/select 
params={explainOther=&fl=*,score&indent=on&start=0&q=title:f?r&hl.fl=&qt=standard&wt=xml&fq=&version=2.2&rows=10}
 hits=76 status=0 QTime=2

Notice the title:f?r.  How do I fix that?  I'm using Jetty btw...

Thanks for the help.

On Fri, Mar 25, 2011 at 3:05 AM, Martin Rödig <r...@shi-gmbh.com> wrote:
I have some questions about your config:

Is stopwords-de.txt in the same directory as schema.xml?
Is the title field of type text?
Do you have the same problem with German stopwords without an umlaut (ü, ö, ä), 
such as the word denn?

One possible problem is that stopwords-de.txt is not saved as UTF-8, so the 
filter cannot read the umlaut ü in the file.
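
For orientation, the checks above refer to a field type roughly like the 
following schema.xml fragment. This is only a sketch: the actual definition 
was posted to a pastie that is not reproduced here, so the tokenizer and the 
other filters are assumptions. Only the file name stopwords-de.txt and the 
field name title come from the thread.

```xml
<!-- Sketch of a German text type; not the poster's actual definition. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <!-- One analyzer without a type attribute applies at index and query time. -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- The words file is looked up in the core's conf directory and
         should be saved as UTF-8 so that entries such as für are read
         correctly. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords-de.txt"/>
  </analyzer>
</fieldType>

<field name="title" type="text" indexed="true" stored="true"/>
```

After changing the stopword file or the analyzer, existing documents need to 
be reindexed before the index reflects the new stopwords.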



-----Original Message-----
From: Christopher Bottaro [mailto:cjbott...@onespot.com]
Sent: Friday, 25 March 2011 05:37
To: solr-user@lucene.apache.org
Subject: stopwords not working in multicore setup

Hello,

I'm running a Solr server with 5 cores.  Three are for English content and two 
are for German content.  The default stopwords setup works fine for the English 
cores, but the German stopwords aren't working.

The German stopwords file is stopwords-de.txt and resides in the same directory 
as stopwords.txt.  The German cores use a different schema (named
schema.page.de.xml) which has the following text field definition:
http://pastie.org/1711866

The stopwords-de.txt file looks like this:  http://pastie.org/1711869

The query I'm doing is this:  q = title:für

And it's returning documents with für in the title.  Title is a text field 
which should use the stopwords-de.txt, as seen in the aforementioned pastie.

Any ideas?  Thanks for the help.



Re: Newbie wants to index XML content.

2011-03-25 Thread Martin Rödig
You can use the DataImportHandler (DIH) to split up and index that XML:
 http://wiki.apache.org/solr/DataImportHandler
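
As a sketch of how this could look for the XML quoted below, a DIH 
data-config.xml might use the XPathEntityProcessor with forEach set to the 
entry element, so that each entry becomes one Solr document. The file path 
and the column names are assumptions for illustration. The column names would 
have to match fields defined in schema.xml, with email declared multiValued 
because an entry can contain several.

```xml
<dataConfig>
  <!-- Reads the XML from the local file system. -->
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- url is a hypothetical path; forEach creates one document per entry. -->
    <entity name="entry"
            processor="XPathEntityProcessor"
            url="/path/to/contacts.xml"
            forEach="/root/entry">
      <!-- Nested elements are addressed by their full path. -->
      <field column="fullname"     xpath="/root/entry/name/fullname"/>
      <field column="organization" xpath="/root/entry/organization/name"/>
      <!-- Matches every email element of the entry (multi-valued). -->
      <field column="email"        xpath="/root/entry/email"/>
      <!-- Attribute predicates select by attribute value. -->
      <field column="work_email"   xpath="/root/entry/email[@type='work']"/>
      <field column="phone"        xpath="/root/entry/phoneNumber"/>
    </entity>
  </document>
</dataConfig>
```

The handler itself also has to be registered in solrconfig.xml with a config 
parameter pointing at this file. Note that the XPathEntityProcessor supports 
only a subset of XPath, so more complex expressions may not work.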



-----Original Message-----
From: Marcelo Iturbe [mailto:marc...@santiago.cl] 
Sent: Thursday, 24 March 2011 21:55
To: solr-user@lucene.apache.org
Subject: Newbie wants to index XML content.

Hello,
I've been reading up on how to index XML content but have a few questions.

How is data in element attributes handled or defined? How are nested elements 
handled?

In the following XML structure, I want to index the content of what is between 
the <entry> tags.
In one XML document, there can be up to 100 <entry> tags.
So the <entry> tag would be equivalent to the <doc> tag...

Can I somehow index this XML as is or will I have to parse it, creating the 
<doc> tag and placing all the elements on the same level?

Thanks for your help.

<?xml version="1.0" encoding="utf-8"?>
<root>
  <source>manual</source>
  <author>
    <name>MC Anon User</name>
    <email>mca...@mcdomain.com</email>
  </author>

  <entry>
    <name>
      <fullname>John Smith</fullname>
    </name>
    <email>jsmit...@gmail.com</email>
  </entry>

  <entry>
    <name>
      <fullname>First Last</fullname>
      <firstname>First</firstname>
      <lastname>Last</lastname>
    </name>
    <organization>
      <name>MC S.A.</name>
      <tittle>CIO</tittle>
    </organization>
    <email type="work" primary="true">fi...@mcdomain.com</email>
    <email>flas...@yahoo.com</email>
    <phoneNumber type="work" primary="true">+5629460600</phoneNumber>
    <im carrier="gtalk" primary="true">fi...@mcdomain.com</im>
    <im carrier="skype">First.Last</im>
    <postalAddress>111 Bude St, Toronto</postalAddress>
    <custom name="blog">http://blog.mcdomain.com/</custom>
  </entry>
</root>

regards
Marcelo


Re: stopwords not working in multicore setup

2011-03-25 Thread Martin Rödig
I have some questions about your config: 

Is stopwords-de.txt in the same directory as schema.xml?
Is the title field of type text?
Do you have the same problem with German stopwords without an umlaut (ü, ö, ä), 
such as the word denn?

One possible problem is that stopwords-de.txt is not saved as UTF-8, so the 
filter cannot read the umlaut ü in the file.





Display analyzed values in hitlist

2011-01-26 Thread Martin Rödig
Hi,
I want to display author names in the hit list. For the author metadata field 
I created a new field type, type_author, which includes a synonym list. In the 
synonym list, all possible names of a person are reduced to one name.
Example:
d...@hh.de, dietmar, brock, dietmar brock, db => Dietmar Brock
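
A field type along these lines might be defined as in the following schema.xml 
fragment. This is only a sketch: the synonyms file name and the choice of 
tokenizer are assumptions, and only the names type_author and author come from 
this message.

```xml
<!-- Sketch only: author_synonyms.txt is a hypothetical file name. -->
<fieldType name="type_author" class="solr.TextField">
  <analyzer>
    <!-- Treat the whole author value as a single token. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- tokenizerFactory makes the synonyms file be parsed the same way,
         so multi-word entries such as "dietmar brock" stay one token. -->
    <filter class="solr.SynonymFilterFactory"
            synonyms="author_synonyms.txt"
            ignoreCase="true"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- The stored value is the raw input; only the indexed term is mapped. -->
<field name="author" type="type_author" indexed="true" stored="true"/>
```

With this kind of setup the facet is built from the indexed terms, while the 
stored value returned in the hit list is the untouched input.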

So far everything works as expected, and the facet on the author field 
displays only Dietmar Brock. But the hit list still displays the original 
values, like db or dietmar...
I know that only Dietmar Brock is indexed and that the original incoming 
string can be stored with stored=true, but I need a way to display 
Dietmar Brock in the hit list.

Is there a way to include regexps and synonym lists in the request handler, 
or to store the analyzed value? What is the best solution?
Must I write my own request handler?

Thanks
Martin

