RE: Re[2]: NOT SOLVED searches for single char tokens instead of from 3 uppwards

2014-03-13 Thread Andreas Owen
I have gotten nearly everything to work. There are two queries where I don't get 
back what I want.

"avaloq frage 1" - only returns results if I set minGramSize=1 while indexing
"yh_cug" - the query parser doesn't remove the "_" but the indexer does (WDF), 
so there is no match

Is there a way to also query the whole term "avaloq frage 1" without tokenizing 
it?

Fieldtype:

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_de.txt" format="snowball"
            enablePositionIncrements="true"/> <!-- remove common words -->
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
    <!-- remove noun/adjective inflections like plural endings -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhiteSpaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_de.txt" format="snowball"
            enablePositionIncrements="true"/> <!-- remove common words -->
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
  </analyzer>
</fieldType>
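
On the whole-term question above, one possible approach (a sketch under assumptions, not a tested fix) is a second field that keeps the entire value as a single lowercased token via KeywordTokenizerFactory, filled through a copyField. The names text_exact and title_exact are hypothetical, and title is only assumed as the source because it shows up among the queried fields in the debug output later in this thread. Such a field would match only when the full field value equals the query, and the query would probably have to be sent as a quoted phrase so the parser does not split it on whitespace first.

```xml
<!-- Sketch only: text_exact and title_exact are hypothetical names -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- emits the whole input as one token, so "avaloq frage 1" is not split -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- assumed source field; adjust to whichever field holds the short terms -->
<field name="title_exact" type="text_exact" indexed="true" stored="false"/>
<copyField source="title" dest="title_exact"/>
```

The extra field could then be added to qf with its own boost, so exact matches rank above the n-gram matches.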



RE: Re[2]: NOT SOLVED searches for single char tokens instead of from 3 uppwards

2014-03-12 Thread Andreas Owen
Hi Jack,

Do you know how I can use local parameters in my solrconfig? The params are 
visible in the debugQuery output, but Solr doesn't parse them.

<lst name="invariants">
  <str name="fq">{!q.op=OR} (*:* -organisations:["" TO *] -roles:["" TO 
*]) (+organisations:($org) +roles:($r)) (-organisations:["" TO *] +roles:($r)) 
(+organisations:($org) -roles:["" TO *])</str>
</lst>
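
One thing that may explain this: parameter dereferencing ($name) is documented for values inside local params, such as v=$org, and a $org in the plain body of the query string is likely treated as literal text. The sketch below shows the pattern for a single clause only, using nested queries. It assumes the request supplies org and r as parameters, and it has not been tested against this setup.

```xml
<!-- Sketch only: shows the dereferencing pattern for one clause,
     not the full access-control filter from the message above -->
<lst name="invariants">
  <str name="fq">{!lucene q.op=OR} (+_query_:"{!lucene df=organisations v=$org}" +_query_:"{!lucene df=roles v=$r}")</str>
</lst>
```

The remaining clauses of the original filter would need the same treatment wherever $org or $r appears.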


-Original Message-
From: Andreas Owen [mailto:a...@conx.ch] 
Sent: Mittwoch, 12. März 2014 14:44
To: solr-user@lucene.apache.org
Subject: Re[2]: NOT SOLVED searches for single char tokens instead of from 3 
uppwards

Yes, that is exactly what happened in the analyzer. The term I searched for was 
listed on both sides (index and query).

here's the rest:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- in this example, we will only use synonyms at query time
  <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
          ignoreCase="true" expand="false"/>
  -->
  <!-- Case insensitive stop word removal.
       enablePositionIncrements=true ensures that a 'gap' is left to
       allow for accurate phrase queries.
  -->
  <filter class="solr.StopFilterFactory"
          ignoreCase="true"
          words="stopwords.txt"
          enablePositionIncrements="true"
  />
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1" catenateWords="1" catenateNumbers="1"
          catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

-Original Message-
 From: Jack Krupansky j...@basetechnology.com
 To: solr-user@lucene.apache.org
 Date: 12/03/2014 13:25
 Subject: Re: NOT SOLVED searches for single char tokens instead of 
 from 3 uppwards
 
 You didn't show the new index analyzer - it's tricky to assure that 
 index and query are compatible, but the Admin UI Analysis page can help.
 
 Generally, using pure defaults for WDF is not what you want, 
 especially for query time. Usually there needs to be a slight 
 asymmetry between index and query for WDF - index generates more terms than 
 query.
 
 -- Jack Krupansky
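
 A minimal sketch of the asymmetry described above, assuming the common pattern of catenating only at index time. The flag values are illustrative rather than a recommendation for this particular schema, and the at-under-alpha.txt types file is taken from the thread without knowing its contents.

```xml
<!-- Sketch only: the index side generates more terms (catenation on),
     the query side generates fewer (catenation off) -->
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="1" catenateNumbers="1" catenateAll="0"
          splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="0" catenateNumbers="0" catenateAll="0"
          splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

 Using the same tokenizer and the same types file on both sides keeps the split points identical, so the Analysis page should show the query terms as a subset of the indexed terms.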
 
 -Original Message-
 From: Andreas Owen
 Sent: Wednesday, March 12, 2014 6:20 AM
 To: solr-user@lucene.apache.org
 Subject: RE: NOT SOLVED searches for single char tokens instead of 
 from 3 uppwards
 
 I now have the following:
 
 <analyzer type="query">
   <tokenizer class="solr.WhiteSpaceTokenizerFactory"/>
   <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="lang/stopwords_de.txt" format="snowball"
           enablePositionIncrements="true"/> <!-- remove common words -->
   <filter class="solr.GermanNormalizationFilterFactory"/>
   <filter class="solr.SnowballPorterFilterFactory" language="German"/>
 </analyzer>
 
 The Analysis page in the admin GUI shows me that WDF doesn't cut the 
 underscore anymore, but the query still returns 0 results.
 
 Output:
 
 <lst name="debug">
   <str name="rawquerystring">yh_cug</str>
   <str name="querystring">yh_cug</str>
   <str name="parsedquery">(+DisjunctionMaxQuery((tags:yh_cug^10.0 |
 links:yh_cug^5.0 | thema:yh_cug^15.0 | plain_text:yh_cug^10.0 |
 url:yh_cug^5.0 | h_*:yh_cug^14.0 | inhaltstyp:yh_cug^6.0 |
 breadcrumb:yh_cug^6.0 | contentmanager:yh_cug^5.0 | title:yh_cug^20.0 |
 editorschoice:yh_cug^200.0 | doctype:yh_cug^10.0))
 ((expiration:[1394619501862 TO *]
 (+MatchAllDocsQuery(*:*) -expiration:*))^6.0) 
 FunctionQuery((div(int(clicks),max(int(displays),const(1^8.0))/no_coord</str>
   <str name="parsedquery_toString">+(tags:yh_cug^10.0 | 
 links:yh_cug^5.0 |
 thema:yh_cug^15.0 | plain_text:yh_cug^10.0 | url:yh_cug^5.0 |
 h_*:yh_cug^14.0 | inhaltstyp:yh_cug^6.0 | breadcrumb:yh_cug^6.0 |
 contentmanager:yh_cug^5.0 | title:yh_cug^20.0 | 
 editorschoice:yh_cug^200.0 |
 doctype:yh_cug^10.0) ((expiration:[1394619501862 TO *]
 (+*:* -expiration:*))^6.0)
 (div(int(clicks),max(int(displays),const(1^8.0</str>
   <lst name="explain"/>
   <arr name="expandedSynonyms">
     <str>yh_cug</str>
   </arr>
   <lst name="reasonForNotExpandingSynonyms">
     <str name="name">DidntFindAnySynonyms</str>
     <str name="explanation">No synonyms found for this query.  Check 
 your synonyms file.</str>
   </lst>
   <lst name="mainQueryParser">
     <str name="QParser">ExtendedDismaxQParser</str>
     <null name="altquerystring"/>
     <arr name="boost_queries">
       <str>(expiration:[NOW TO *] OR (*:* -expiration:*))^6</str>
     </arr>
     <arr name="parsed_boost_queries">
       <str>(expiration:[1394619501862 TO *]
 (+MatchAllDocsQuery(*:*) -expiration:*))^6.0</str>
     </arr>
     <arr name="boostfuncs">
       <str>div(clicks,max(displays,1))^8</str>
     </arr>
   </lst>
   <lst name="synonymQueryParser">
     <str name="QParser">ExtendedDismaxQParser</str>
     <null name="altquerystring"/>
     <arr name="boostfuncs">