most stable way to get facet pivoting

2011-04-15 Thread Nikolas Tautenhahn
Hi,

I want to evaluate (and probably use in production) facet pivoting.
What is the best approach to get an as-stable-as-can-be version of Solr
that supports facet pivoting? I was hoping to see this in Solr 3.1, but
apparently it is only in the dev versions/nightlies...

Is it possible to patch this feature into Solr 3.1 stable?
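
For evaluating a nightly, a pivot request is an ordinary select request with `facet.pivot` set to a comma-separated list of fields. The sketch below builds one in Python. It assumes a local example server on port 8983 and two hypothetical fields, `category` and `manufacturer`:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical field names; replace them with fields from your own schema.
PIVOT_FIELDS = ["category", "manufacturer"]

params = [
    ("q", "*:*"),
    ("rows", "0"),                            # only the facet counts matter
    ("facet", "true"),
    ("facet.pivot", ",".join(PIVOT_FIELDS)),  # field order = nesting order
]
# Assumes the example server on localhost:8983.
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)

# urlencode percent-encodes ',' and ':'; decoding restores the field list.
print(parse_qs(urlsplit(url).query)["facet.pivot"])  # ['category,manufacturer']
```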

best regards,
Nik

-- 
Nikolas Tautenhahn

nikolas.tautenh...@livinglogic.de

http://www.livinglogic.de

LivingLogic AG
Markgrafenallee 44
95448 Bayreuth
Amtsgericht Bayreuth ++ HRB 3274
Aufsichtsratsvorsitzender: Achim Lindner
Vorstand: Philipp Ambrosch, Alois Kastner-Maresch (Vors.)


most stable way to get facet pivoting

2011-04-14 Thread Nikolas Tautenhahn
Hi,

I want to evaluate (and probably use in production) facet pivoting.
What is the best approach to get an as-stable-as-can-be version of Solr
that supports facet pivoting? I was hoping to see this in Solr 3.1, but
apparently it is only in the dev versions/nightlies...

Is it possible to patch this feature into Solr 3.1 stable?

best regards,
Nik



Re: Proper Escaping of Ampersands

2010-08-24 Thread Nikolas Tautenhahn
Hi Chris,

On 23.08.2010 21:37, Chris Hostetter wrote:
> : The document is indexed correctly: a search for "at s" found it and all
> : fields looked great ("at&s" and not, for example, "at&amp;s").
> :
> : As my stopword list does not contain "at" or "&" or "amp;", I don't
> : quite understand why my result is found when I disable the
> : stopword list. My stopword list can be found here:
> :
> : http://pastebin.com/RfLuBHqd
> :
> : Do you happen to see anything in it that would affect a string like "at&s"?
>
> "s" is in your stopwords file, which may be part of the problem (but i
> didn't look hard at your query string to verify that)
>
> : The analysis page in the admin panel shows me these steps for the index
> : analyzer:
>   ...
> : (StopFilter) 1: at&s, at; 2: s, ats => 1: at&s, at; 2: ats
> :
> : So, according to this, it should be found even with my stopwords enabled...
>
> Strange, based on the stopwords file you posted the "s" should definitely
> be getting removed at index time -- it would also get removed at query
> time, but because you have it *before* WDF at query time that wouldn't
> affect this query (even though it did affect the index)
>
> There was a bug with analysis.jsp and stopwords recently, but that
> shouldn't have affected 1.4 (you are definitely using 1.4, correct?)
>
> https://issues.apache.org/jira/browse/SOLR-2051

I am using Solr 1.4 (actually LucidWorks Solr) in production and tried
1.4.1 for testing; unfortunately I can't tell for sure whether I tried
analysis.jsp in both...

I moved the stopword filter before the WordDelimiterFilter. Thanks for
your hints, Chris and Yonik!

best regards,
Nikolas Tautenhahn


Re: Proper Escaping of Ampersands

2010-08-23 Thread Nikolas Tautenhahn
Hi Yonik,

I got it working, but I think the stopword filter is not behaving as
expected: the document could be found when I disabled the stopword
filter (details later in this mail...).

On 20.08.2010 16:57, Yonik Seeley wrote:
> On Thu, Aug 19, 2010 at 11:33 AM, Nikolas Tautenhahn
> <nik_s...@livinglogic.de> wrote:
>> But when I search for q=at%26s (=at&s), I get nothing.
>
> That's the correct encoding if you're typing it directly into a
> browser address box.
> http://localhost:8983/solr/select?defType=dismax&qf=text&q=at%26s&debugQuery=true
>
> But you should be able to verify that solr is getting the correct
> query string by checking out params in the response (in the example
> server, by default they are echoed back).  And adding debugQuery=true
> to the request should show you exactly what query is being generated.
>
> But the real issue likely lies with your fieldType definition.  Can
> you show that?

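Building the request programmatically avoids hand-encoding mistakes: `urlencode` percent-encodes the `&` inside the query value as `%26`, while the `&` characters between parameters stay literal. The sketch below uses the parameters from Yonik's example URL. Host, port and the `qf` field come from the Solr example setup, and the helper name `dismax_debug_url` is hypothetical:

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # example server; adjust


def dismax_debug_url(user_query, qf):
    """Build a dismax select URL with debugQuery enabled, so the response
    includes the parsed query (hypothetical helper)."""
    params = {
        "defType": "dismax",
        "qf": qf,
        "q": user_query,       # raw user input; its '&' becomes %26
        "debugQuery": "true",
    }
    return SOLR_SELECT + "?" + urlencode(params)


url = dismax_debug_url("at&s", "text")
print(url)
# http://localhost:8983/solr/select?defType=dismax&qf=text&q=at%26s&debugQuery=true
```
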
Since I normally query multiple fields, I changed my request URL to
http://127.0.0.1:8983/solr/select?q=at%26s&fl=titel&qt=dismax&qf=titel&debugQuery=true&fl=*&qt=dismax&qf=titel&debugQuery=true
to narrow it down, and got this response (cut down to what I think is
the relevant part):

  <str name="rawquerystring">at&s</str>
  <str name="querystring">at&s</str>
  <str name="parsedquery">+DisjunctionMaxQuery((titel:"(at&s at) s")~0.1) ()</str>
  <str name="parsedquery_toString">+(titel:"(at&s at) s")~0.1 ()</str>
  <lst name="explain"/>
  <str name="QParser">DisMaxQParser</str>

This is from my local debugging instance, using the standard dismax
config (from the Solr examples directory).

The titel field is configured like this:

  <field name="titel" type="textgen" indexed="true" stored="true"/>

and textgen is configured like this:

  <fieldType name="textgen" class="solr.TextField"
             positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="false"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="1" catenateWords="1" catenateNumbers="1"
              catenateAll="0" splitOnCaseChange="0" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="1" catenateWords="0" catenateNumbers="0"
              catenateAll="0" splitOnCaseChange="0" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

The document is indexed correctly: a search for "at s" found it and all
fields looked great ("at&s" and not, for example, "at&amp;s").

As my stopword list does not contain "at" or "&" or "amp;", I don't
quite understand why my result is found when I disable the
stopword list. My stopword list can be found here:

http://pastebin.com/RfLuBHqd

Do you happen to see anything in it that would affect a string like "at&s"?

The analysis page in the admin panel shows me these steps for the index
analyzer:

(HTMLStripStandardTokenizer) at&s => at&s
(SynonymFilter) at&s => at&s
(WordDelimiterFilter) at&s => term position 1: at&s, at; term position 2: s, ats
(LowerCaseFilter) 1: at&s, at; 2: s, ats => 1: at&s, at; 2: s, ats
(StopFilter) 1: at&s, at; 2: s, ats => 1: at&s, at; 2: ats

So, according to this, it should be found even with my stopwords enabled...
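
The listing above can be reproduced with a toy model of the two analyzer chains, and the model also shows why the query fails. The query side produces the phrase `"(at&s at) s"`, but after the index-side stop filter there is no `s` left at position 2. The model is a simplification written for this illustration, not Lucene's implementation. Its hypothetical `word_delimiter` splits on non-alphanumerics, keeps the original token, and places the catenated word at the position of the last part, mirroring the analysis page. The stop word set holds only the entry that matters here. The model also covers the alternative of running the stop filter before WordDelimiterFilter at index time:

```python
import re

STOPWORDS = {"s"}  # only the relevant entry from the posted stopword list


def word_delimiter(tokens, catenate):
    """Toy WordDelimiterFilter: preserveOriginal=1, split on non-alphanumerics,
    optionally catenate the parts at the position of the last part."""
    out, shift = [], 0
    for pos, term in tokens:
        pos += shift
        parts = re.findall(r"[A-Za-z0-9]+", term)
        if len(parts) < 2:
            out.append((pos, term))
            continue
        out.append((pos, term))  # preserved original
        out.extend((pos + i, part) for i, part in enumerate(parts))
        if catenate:
            out.append((pos + len(parts) - 1, "".join(parts)))
        shift += len(parts) - 1
    return out


def lowercase(tokens):
    return [(pos, term.lower()) for pos, term in tokens]


def stop(tokens):
    # ignoreCase=true; enablePositionIncrements=true keeps the positions
    return [(pos, term) for pos, term in tokens if term.lower() not in STOPWORDS]


def phrase_matches(index_tokens, query_tokens):
    """Multi-term phrase: each query position needs one of its terms at the
    same relative position in the index."""
    wanted, indexed = {}, {}
    for pos, term in query_tokens:
        wanted.setdefault(pos, set()).add(term)
    for pos, term in index_tokens:
        indexed.setdefault(pos, set()).add(term)
    first = min(wanted)
    return any(
        all(wanted[q] & indexed.get(start + q - first, set()) for q in wanted)
        for start in indexed
    )


source = [(1, "at&s")]
query = lowercase(word_delimiter(stop(source), catenate=False))
index_as_posted = stop(lowercase(word_delimiter(source, catenate=True)))
index_stop_first = lowercase(word_delimiter(stop(source), catenate=True))

print(query)            # [(1, 'at&s'), (1, 'at'), (2, 's')]
print(index_as_posted)  # [(1, 'at&s'), (1, 'at'), (2, 'ats')]
print(phrase_matches(index_as_posted, query))   # False
print(phrase_matches(index_stop_first, query))  # True
```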


best regards and thanks for your response,
Nikolas Tautenhahn


Re: Tokenising on Each Letter

2010-08-23 Thread Nikolas Tautenhahn
Hi Scottie,

> Could you elaborate about N gram for me, based on my schema?

Just a quick reply:

  <fieldType name="textNGram" class="solr.TextField"
             positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- in this example, we will only use synonyms at query time
      <filter class="solr.SynonymFilterFactory"
              synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
      -->
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="0" catenateWords="1" catenateNumbers="0"
              catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0"
              preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" side="front"
              minGramSize="2" maxGramSize="30"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="0" catenateWords="0" catenateNumbers="0"
              catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0"
              preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

This produces front edge n-grams (prefixes) of 2 up to 30 characters for
each token; for details, see
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory

Be sure to adjust those sizes (minGramSize/maxGramSize) so that
maxGramSize is big enough to keep the whole original serial number/model
number and minGramSize is not so small that you fill your index with
useless information.
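
The effect of the two size parameters can be checked with a few lines of Python. The function below is a simplified stand-in for what `EdgeNGramFilterFactory` with `side="front"` emits for a single token: the prefixes between `minGramSize` and `maxGramSize` characters long. It is not Lucene's code, and `sx4500` is a made-up model number. The query analyzer has no n-gram filter, so a query for the full number only matches if the full number is one of the indexed grams:

```python
def front_edge_ngrams(token, min_gram=2, max_gram=30):
    """Prefixes of `token` with lengths from min_gram to max_gram
    (simplified model of side="front" edge n-grams)."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


serial = "sx4500"  # hypothetical model number

print(front_edge_ngrams(serial))              # ['sx', 'sx4', 'sx45', 'sx450', 'sx4500']
print(front_edge_ngrams(serial, max_gram=4))  # ['sx', 'sx4', 'sx45']: full number lost
print(front_edge_ngrams(serial, min_gram=1))  # also emits the single letter 's'
```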

Best regards,
Nikolas Tautenhahn




Re: Proper Escaping of Ampersands

2010-08-20 Thread Nikolas Tautenhahn
Hi all,

Just some further information:
https://issues.apache.org/jira/browse/SOLR-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

seems to describe the same problem, but searching the archives yielded
nothing I could use.

Any hints on this?

best regards,
Nikolas Tautenhahn

On 19.08.2010 17:33, Nikolas Tautenhahn wrote:
> Hi,
>
> I have a problem with company names such as "AT&S".
> A job sends data to the Solr 1.4 index (also tested with 1.4.1) via
> Python as XML; everything is escaped properly ("&" becomes "&amp;").
>
> When I search for "at s" (q=%22at%20s%22) using the dismax handler, I
> find the document for this company and get all names back (the company
> is still called "at&s" and not something like "at&amp;s").
>
> But when I search for q=at%26s (=at&s), I get nothing.
> I also tried q=at%5C%26s (=at\&s) and q=at%5C%5C%26s, blindly following
> any clues about escaping with backslashes...
>
>
> So, my question is: how do I (correctly) search for "at&s"?
>
>
> When I use the analysis page in the admin panel, select my field name,
> and enter "AT&S" both as the field value (index) and as the field value
> (query), it shows me that the query matches, so I assume Solr doesn't
> get the correct query string...
>
> If necessary, I can supply the schema.xml definitions of the fields in
> use, but as the analysis page showed the match, I don't think this
> would be very useful...
>
> best regards,
> Nikolas Tautenhahn
 




Proper Escaping of Ampersands

2010-08-19 Thread Nikolas Tautenhahn
Hi,

I have a problem with company names such as "AT&S".
A job sends data to the Solr 1.4 index (also tested with 1.4.1) via
Python as XML; everything is escaped properly ("&" becomes "&amp;").
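
For reference, here is a minimal sketch of such an indexing job in Python. It assumes the standard `<add><doc><field>` update format and uses the `titel` field from the schema posted in this thread. The actual job is not shown, so `add_doc_xml` is a hypothetical helper. `ElementTree` escapes `&` to `&amp;` on serialization, and parsing the message gives back the plain value `AT&S`, which is what ends up stored:

```python
import xml.etree.ElementTree as ET


def add_doc_xml(fields):
    """Serialize one document as a Solr XML <add> message (hypothetical helper)."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        ET.SubElement(doc, "field", name=name).text = value  # escaped on output
    return ET.tostring(add, encoding="unicode")


payload = add_doc_xml({"titel": "AT&S"})
print(payload)  # <add><doc><field name="titel">AT&amp;S</field></doc></add>

# Parsing the message back yields the original, unescaped value.
print(ET.fromstring(payload).find("doc/field").text)  # AT&S
```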

When I search for "at s" (q=%22at%20s%22) using the dismax handler, I
find the document for this company and get all names back (the company
is still called "at&s" and not something like "at&amp;s").

But when I search for q=at%26s (=at&s), I get nothing.
I also tried q=at%5C%26s (=at\&s) and q=at%5C%5C%26s, blindly following
any clues about escaping with backslashes...
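
Decoding the three attempts with standard percent-decoding shows what actually reaches Solr. It also shows what happens when the `&` is left unencoded: the `&` is read as a parameter separator, and the `q` parameter is cut off after `at`. A quick check with Python's `urllib.parse`:

```python
from urllib.parse import parse_qs, unquote

# The three encodings tried above, percent-decoded.
for encoded in ["at%26s", "at%5C%26s", "at%5C%5C%26s"]:
    print(encoded, "->", unquote(encoded))
# at%26s -> at&s
# at%5C%26s -> at\&s
# at%5C%5C%26s -> at\\&s

# An unencoded '&' splits the query string, so q is just "at".
print(parse_qs("q=at&s"))    # {'q': ['at']}
print(parse_qs("q=at%26s"))  # {'q': ['at&s']}
```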


So, my question is: how do I (correctly) search for "at&s"?


When I use the analysis page in the admin panel, select my field name,
and enter "AT&S" both as the field value (index) and as the field value
(query), it shows me that the query matches, so I assume Solr doesn't
get the correct query string...

If necessary, I can supply the schema.xml definitions of the fields in
use, but as the analysis page showed the match, I don't think this
would be very useful...

best regards,
Nikolas Tautenhahn