from:"PeterKerk"

Thanks, will look into all that :-)



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

RE: Solr search engine configuration

Cool, will do some more digging around in the analysis GUI first.

One last thing then on this comment of yours:
"Does the decompounder support emitting the compound word as well? If so,
enable it. It should help scoring compounds higher via IDF as they are less
common."

So I checked the Javadoc:
https://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilterFactory.html
To be sure I also checked the Javadoc for the alternative:
https://lucene.apache.org/core/6_5_0/analyzers-common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html
but there is nothing there on emitting either.

Where can I see whether DictionaryCompoundWordTokenFilterFactory supports
emitting the compound word, and how do I enable it?

Thanks again! :-)




RE: Solr search engine configuration

Stay with the Javadoc, where the examples are good, or the
reference guide:
https://lucene.apache.org/core/6_5_0/analyzers-common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html
https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#filter-descriptions

PVK COMMENT 1: 
This seems to be for Solr 6.5+? I'm using 4.3.1, and an upgrade is not on the
radar any time soon. Will using DictionaryCompoundWordTokenFilterFactory, as I'm
doing now, severely degrade my result quality compared to
HyphenationCompoundWordTokenFilterFactory?


Almost, zaken -> zaak is already KP output, no need to input what the
stemmer will do for you. 

PVK COMMENT 2: 
How do you know zaken -> zaak is already KP output? Is there a list
somewhere?

PVK COMMENT 3: 
I now have:

(The fieldType XML was stripped by the mailing-list archive.)

I tested in admin UI (and yes, I restart Solr and reindex every time I make
a change):  

http://localhost:8983/solr/tt-search-global/select?q=title_search_global%3A(dieren+zaak)=id%2Ctitle=xml=true
returns:
"hi there dieren zaak something else"
"hi there dier something else"

http://localhost:8983/solr/tt-search-global/select?q=title_search_global%3A(dierenzaak)=id%2Ctitle=xml=true=edismax=title_search_global=true=true
returns
"hi there dierenzaak something else"

So I added "dieren" to compounds_nl.txt

Now on "title_search_global:(dieren zaak)" it returns:

hi there dieren zaak something else
115_3699638


hi there dier something else
115_3699637


hi there dierenzaak something else
115_3699639


So it's starting to look good! :-)

What I want to know: how can I make Solr treat "dierenzaak" as more
important than just "dier" in the above results?

Also I'm still not 100% sure what my addition of "dieren" to
compounds_nl.txt actually does...I assume
DictionaryCompoundWordTokenFilterFactory just looks for that exact string
and if it finds it, considers that a separate word? Correct?
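
A rough model of dictionary-based decompounding may help with the question above. This is a simplified sketch, not the Lucene implementation: the function name `decompound`, the size limits, and the brute-force substring scan are assumptions for illustration. As far as the Lucene Javadoc suggests, the original token is kept and dictionary entries found inside it are added as extra tokens.

```python
def decompound(token, dictionary, min_subword=2, max_subword=15):
    """Return the original token followed by every dictionary word found inside it."""
    tokens = [token]  # the compound itself is kept (assumed behaviour)
    for start in range(len(token)):
        longest_end = min(len(token), start + max_subword)
        for end in range(start + min_subword, longest_end + 1):
            part = token[start:end]
            if part in dictionary:
                tokens.append(part)
    return tokens


# Hypothetical contents of compounds_nl.txt.
compounds_nl = {"dieren", "zaak"}
print(decompound("dierenzaak", compounds_nl))
print(decompound("dier", compounds_nl))
```

Under this model, adding "dieren" to the dictionary lets "dierenzaak" also produce the token "dieren", which the stemmer can then reduce further.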

Thanks again!




RE: Solr search engine configuration

Markus,

Thanks again. Ok, 1 by 1:

StemmerOverride wants \t separated fields, that is probably the cause of the
AIooBE you get. Regarding schema definitions, each factory JavaDoc [1] has a
proper example listed. I recommend putting a decompounder before a stemmer,
and have an accent (or ICU) folder as one of the last filters.

PVK COMMENT:
I looked for decompounders and found a few links (by the way, a lot of the
pages they link to don't work):

https://earlydance.org/news/9189-apachesolr-issues-german-and-other-germanic-languages

http://lucene.apache.org/core/2_4_0/api/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilter.html
https://wiki.apache.org/solr/LanguageAnalysis#Decompounding

https://wiki.apache.org/solr/DictionaryCompoundWordTokenFilterFactory

my stemdict_nl.txt now contains (words separated by a single tab):
aachen aach
aachener aachener
aalmoezen aalmoes
beveel bevool
dierenzaken dierenzaak

The problem before was indeed, as @Shawn indicated, that I had entries in
there containing a space, like so:
dieren zaken dierenzaak
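
A quick way to check the dictionary file before reindexing is sketched below. It assumes each entry must be exactly `source<TAB>stem`, and that blank lines and `#` comments are ignored; the function name `find_bad_lines` is made up for this example. A line without a tab splits into a single field, which would be consistent with the `ArrayIndexOutOfBoundsException: 1` seen on import.

```python
def find_bad_lines(lines):
    """Return (line_number, text) pairs that are not exactly 'source<TAB>stem'."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are skipped
        parts = line.split("\t")
        if len(parts) != 2 or not parts[0] or not parts[1]:
            bad.append((number, line))
    return bad


# One good entry, one with the tab missing, one separated by spaces.
sample = ["aachen\taach\n", "aachenaach\n", "aachener  aachener\n"]
print(find_bad_lines(sample))
```

To check a real file, pass `open("stemdict_nl.txt", encoding="utf-8")` instead of the sample list.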

About the diff: it looks like KP output, and it has the same issues with
whether a word needs double or single vowels in the root. It also shows
issues with strong verbs/nouns (beveel/bevool). Having this list is like
having KP configured, so you should drop it and list only exceptions to KP
rules in the dict file. This is not easy, so I recommend sticking to your
domain's vocabulary.

PVK COMMENT:
That's what I did above, right?

Also, unless you have a very specific need for it, drop the StopFilter.
Nobody these days should want a StopFilter unless they can justify it. We
use them too, but only for very specific reasons and never for text search.
You might also want a WordDelimiterFilter as your first filter; look
it up, you probably want it.

PVK COMMENT:
But without a StopFilter, won't stopwords be included in searches? I thought
that Google, for example, excluded these words in their algorithms?

This is what I have now:

Now for both this query
http://localhost:8983/solr/tt-search-global/select?q=title_search_global%3A(dieren+zaak)=id%2Ctitle=xml=true=edismax=true=true

and this one:
http://localhost:8983/solr/tt-search-global/select?q=title_search_global%3A(dieren+zaak)=id%2Ctitle=xml=true=edismax=title_search_global=true=true

This result is found:
"Hi there dieren zaak something else"

And these are NOT:
"Hi there dier something else"
"Hi there dierenzaak something else"
"Hi there dierzaak something else"

What else do you recommend I try?


RE: Solr search engine configuration

2018-03-12 Thread PeterKerk

@Erick: thank you for clarifying!

@Markus:
I feel like I'm not (or at least should not be :-)) the first person to run
into these challenges.

"You can solve this by adding manual rules to StemmerOverrideFilter, but due
to the compound nature of words, you would need to add it for all the mills"

After Googling I found this:
https://stackoverflow.com/questions/22451774/word-does-not-get-analysed-properly-using-stemmeroverridefilterfactory-and-snowb
and added http://snowball.tartarus.org/algorithms/kraaij_pohlmann/diffs.txt
as stemdict_nl.txt

My new fieldType definition now is:

(The fieldType XML was stripped by the mailing-list archive.)

I trimmed stemdict_nl.txt for testing to just this:

aachenaach
aachener  aachener

But on full-import it throws a http 500 error:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1  at
org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilterFactory.inform(StemmerOverrideFilterFactory.java:66)

Is my stemdict_nl.txt format incorrect?

And do you have examples of the HyphenationCompoundWordTokenFilter or
AccentFoldingFilter? I can't find any.

I use Solr 4.3.1 btw, not sure if that matters.





Re: Solr search engine configuration

2018-03-11 Thread PeterKerk

Sorry for this lengthy post, but I wanted to be complete.

The only occurrence of edismax in solrconfig.xml is this one:


   

  edismax
  explicit
  10
 
  double_score
  false
  *:*



I don't have a requestHandler named "/select".


Also, removing the gramming definitely helped! :-)

I tried to simplify my setup first and then expand, so what I have now is
this:

(The fieldType and field XML was stripped by the mailing-list archive.)

In my database I have these 4 values for "title" that populate
"title_search_global"   

"Hi there dier something else"
"Hi there dieren zaak something else"
"Hi there dierenzaak something else"
"Hi there dierzaak something else"

ps. "dier" is singular of plural "dieren". 

Using this query:
http://localhost:8983/solr/search-global/select?q=title_search_global%3A(dieren+zaak)=(lang%3A%22nl%22+OR+lang%3A%22all%22)=id%2Ctitle=xml=true=edismax=title_search_global=true=true=true

These results are found:
"Hi there dier something else"
"Hi there dieren zaak something else"

And these are NOT:
"Hi there dierenzaak something else"
"Hi there dierzaak something else"

I'd expect it should be fairly easy (although I don't know how) to also
include result "dierenzaak", by compounding the 2 query values. And yes you
are correct: in Dutch "dieren zaak" would mean the same as "dierenzaak". Not
sure what logic would also include "dierzaak"

Regarding your question: yes, I do consider "dieren zaak something else" an
exact match of "dieren zaak".
So I also checked the usage of pf parameters with edismax (based on these
links:
https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html,
http://blog.thedigitalgroup.com/vijaym/understanding-phrasequery-and-slop-in-solr/)
And also for dismax:
https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Theqs_QueryPhraseSlop_Parameter

But I can't find any examples of how to actually use these parameters.
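
A minimal sketch of how such a request could be assembled is shown below. `defType`, `qf`, `pf` and `ps` are standard edismax request parameters; the helper name `build_edismax_url`, the boost of 10 and the slop of 0 are assumptions picked only for illustration. With edismax the `q` parameter usually carries the plain user input, and the fields to search go in `qf`.

```python
from urllib.parse import urlencode


def build_edismax_url(base_url, user_query):
    """Build a select URL that searches and phrase-boosts title_search_global."""
    params = {
        "q": user_query,                  # plain user input, no field prefix
        "defType": "edismax",             # select the extended dismax parser
        "qf": "title_search_global",      # fields searched for each term
        "pf": "title_search_global^10",   # phrase boost; the value 10 is an assumption
        "ps": "0",                        # phrase slop for pf; 0 means adjacent terms
        "fl": "id,title,score",
        "wt": "xml",
        "debugQuery": "true",
    }
    return base_url + "?" + urlencode(params)


print(build_edismax_url("http://localhost:8983/solr/search-global/select", "dieren zaak"))
```

Documents containing the two terms next to each other then get the extra `pf` score on top of the per-term matches from `qf`.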


The search results, including debug info is here:




0
7

title_search_global:(dieren zaak)
edismax
true
true
title_search_global
id,title
(lang:"nl" OR lang:"all")
xml
true
true




dieren zaak
115_3699638


dier
115_3699637



title_search_global:(dieren zaak)
title_search_global:(dieren zaak)

(+(title_search_global:dier title_search_global:zaak))/no_coord


+(title_search_global:dier title_search_global:zaak)



5.489122 = (MATCH) sum of: 2.4387078 = (MATCH)
weight(title_search_global:dier in 51) [DefaultSimilarity], result of:
2.4387078 = score(doc=51,freq=1.0 = termFreq=1.0 ), product of: 0.66654336 =
queryWeight, product of: 5.8539815 = idf(docFreq=3, maxDocs=513) 0.113861546
= queryNorm 3.6587384 = fieldWeight in 51, product of: 1.0 = tf(freq=1.0),
with freq of: 1.0 = termFreq=1.0 5.8539815 = idf(docFreq=3, maxDocs=513)
0.625 = fieldNorm(doc=51) 3.050414 = (MATCH) weight(title_search_global:zaak
in 51) [DefaultSimilarity], result of: 3.050414 = score(doc=51,freq=1.0 =
termFreq=1.0 ), product of: 0.7454662 = queryWeight, product of: 6.5471287 =
idf(docFreq=1, maxDocs=513) 0.113861546 = queryNorm 4.091955 = fieldWeight
in 51, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
6.5471287 = idf(docFreq=1, maxDocs=513) 0.625 = fieldNorm(doc=51)


1.9509662 = (MATCH) product of: 3.9019325 = (MATCH) sum of: 3.9019325 =
(MATCH) weight(title_search_global:dier in 50) [DefaultSimilarity], result
of: 3.9019325 = score(doc=50,freq=1.0 = termFreq=1.0 ), product of:
0.66654336 = queryWeight, product of: 5.8539815 = idf(docFreq=3,
maxDocs=513) 0.113861546 = queryNorm 5.8539815 = fieldWeight in 50, product
of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 5.8539815 =
idf(docFreq=3, maxDocs=513) 1.0 = fieldNorm(doc=50) 0.5 = coord(1/2)


0.9754831 = (MATCH) product of: 1.9509662 = (MATCH) sum of: 1.9509662 =
(MATCH) weight(title_search_global:dier in 132) [DefaultSimilarity], result
of: 1.9509662 = score(doc=132,freq=1.0 = termFreq=1.0 ), product of:
0.66654336 = queryWeight, product of: 5.8539815 = idf(docFreq=3,
maxDocs=513) 0.113861546 = queryNorm 2.9269907 = fieldWeight in 132, product
of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 5.8539815 =
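
The numbers in the explain output above can be reproduced by hand, which helps when reading it. The sketch below assumes the classic Lucene DefaultSimilarity formulas (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq)); the function names are made up for this example, and queryNorm and fieldNorm are copied from the debug output rather than derived.

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency as used by DefaultSimilarity (assumed formula)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(doc_freq, max_docs, query_norm, field_norm, term_freq=1.0):
    """Score of one term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = math.sqrt(term_freq) * term_idf * field_norm
    return query_weight * field_weight


QUERY_NORM = 0.113861546  # taken from the explain output

# Document 51 matches both terms, with fieldNorm 0.625.
dier = term_score(doc_freq=3, max_docs=513, query_norm=QUERY_NORM, field_norm=0.625)
zaak = term_score(doc_freq=1, max_docs=513, query_norm=QUERY_NORM, field_norm=0.625)
print(round(dier, 4), round(zaak, 4), round(dier + zaak, 4))

# Document 50 matches only "dier" (fieldNorm 1.0), so coord(1/2) halves its score.
print(round(term_score(3, 513, QUERY_NORM, 1.0) * 0.5, 4))
```

The rarer term "zaak" (docFreq=1) contributes more than "dier" (docFreq=3), which is the IDF effect mentioned earlier in the thread.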

Re: Solr search engine configuration

2018-03-11 Thread PeterKerk

Thanks! That provides me with some more insight, I altered the search query
to "dieren zaak" to see how queries consisting of more than 1 word are
handled.
I see that words are tokenized into groups of 3, I think because of my
NGramFilterFactory with minGramSize of 3.
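
The effect of the n-gram filter can be imitated in a few lines. This is only a sketch: the function name `ngrams` is made up, and the maximum gram size of 6 is an assumption based on the longest token in the debug output. It shows why words such as "manieren" and "versieren" match a search for "dieren": they share several grams with it.

```python
def ngrams(term, min_gram=3, max_gram=6):
    """Return all substrings of term with length min_gram..max_gram."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(term) - size + 1):
            grams.append(term[start:start + size])
    return grams


print(ngrams("dieren"))

query_grams = set(ngrams("dieren"))
for word in ("huisdieren", "manieren", "versieren"):
    shared = sorted(query_grams & set(ngrams(word)))
    print(word, shared)
```

Any document that shares a single gram with the query is a hit, so gramming on the query side trades a lot of precision for recall.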



(title_search_global:(dieren zaak) OR description_search_global:(dieren
zaak))


(title_search_global:(dieren zaak) OR description_search_global:(dieren
zaak))


(+(((title_search_global:die title_search_global:ier
title_search_global:ere title_search_global:ren title_search_global:dier
title_search_global:iere title_search_global:eren title_search_global:diere
title_search_global:ieren title_search_global:dieren)
(title_search_global:zaa title_search_global:aak title_search_global:zaak))
(((description_search_global:dier description_search_global:diere
description_search_global:dieren)/no_coord)
description_search_global:zaak)))/no_coord


+(((title_search_global:die title_search_global:ier 
title_search_global:ere
title_search_global:ren title_search_global:dier title_search_global:iere
title_search_global:eren title_search_global:diere title_search_global:ieren
title_search_global:dieren) (title_search_global:zaa title_search_global:aak
title_search_global:zaak)) ((description_search_global:dier
description_search_global:diere description_search_global:dieren)
description_search_global:zaak))

ExtendedDismaxQParser





(lang:"nl" OR lang:"all")


lang:nl lang:all




I tried the query with and without the defType=edismax parameter, but I'm
getting the EXACT same results. Does that point to a configuration error?

I'm not sure how to progress from here. Can you see if your presumption that
I'm mixing two different parsers is correct? My schema.xml is here:
http://www.telefonievergelijken.nl/schema.xml


Related: do you know of any sample schema.xml config that would be usable
for a search engine? It seems like such an obvious thing to have floating
around out there. I feel that would go a long way.



Not sure if it matters but my requirements are:

Exact match "dieren zaak" boost result with 1000 
Exact match "dierenzaak" boost result with 900 
Exact match "dieren" or "zaak" boost result with 600 

Partial match "huisdierenzaak" or "huisdieren zaak" boost result with 500 
Stem match "dier" boost result with 100 
Stem partial match "huisdier" boost result with 70 
Other partial matches "die" boost result with 10 





Solr search engine configuration

2018-03-10 Thread PeterKerk

Since Google onsite search will reach end of life on April 1, 2018, I'm
trying to set up my own onsite search engine that indexes my site's content
and makes it searchable.

My data config successfully loads data from my database (products,
companies, blogs) into the fields.

I then try to search in both the title and the description fields with
weights. Now for example when users search on "dieren" (this means "animals"
in Dutch):

=(title_search_global:(dieren) OR
description_search_global:(dieren))=title_search_global+title_exactmatch^1000+description_search_global+description_exactmatch^100

I get results with "dieren", "huisdieren", but I also get undesired results
with "manieren" and "versieren".

What I want is to find text using the following logic (all case
insensitive):


Exact match "dieren" boost result with 1000
Partial match "huisdieren" boost result with 500
Stem match "dier" boost result with 100
Stem partial match "huisdier" boost result with 70
Other partial matches "die" boost result with 10

My current schema.xml is here: http://www.telefonievergelijken.nl/schema.xml
I tried the solr admin tool for tokenization, but I can't figure out how to
get to the above logic.
I also Googled for an example Solr schema.xml configuration for building
your own search engines and I'm really surprised there's nothing out there.




Link entities in Solr Data config with multiple datasource

2017-10-27 Thread PeterKerk

I started here: https://wiki.apache.org/solr/DataImportHandler#multipleds

I have a WordPress database with articles. I keep statistics (like views) on
those articles (and a range of other objects not in WordPress) in a separate
MS SQL Server database.

Statistics in the MS SQL database for articles are stored with value 110 for
column `objecttype`. The `objectid` column in the sqldb matches the `id`
column from the WordPress database. 

What I want is when I get the details for an article, I want an additional
field `viewcounter` that is populated with the number of views from the SQL
DB in a certain timeframe (so a custom query on the MS SQL database).

How can I relate these two entities in such a way that it populates entity
`article` with data from entity `stats`?


I now have this data config:

(The data-config XML was stripped by the mailing-list archive.)


Re: geolocation search ignores distance parameter

2015-11-22 Thread PeterKerk

@Erik: thanks, overlooked that...added fq= before geofilt and now it works :)
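
For reference, a sketch of the corrected request is shown below. `fq={!geofilt}`, `sfield`, `pt` and `d` are the standard spatial filter parameters; the helper name `build_geofilt_url` is made up, and the values are the ones from the original question. Without the `fq=` name in front of `{!geofilt}`, the filter is not applied at all.

```python
from urllib.parse import urlencode


def build_geofilt_url(base_url, lat, lng, radius_km):
    """Build a select URL that filters on distance and sorts nearest first."""
    params = [
        ("q", "*:*"),
        ("fq", "{!geofilt}"),        # the filter must be passed as a filter query
        ("sfield", "geolocation"),   # spatial field from the schema
        ("pt", f"{lat},{lng}"),      # centre point
        ("d", str(radius_km)),       # radius in kilometres
        ("sort", "geodist() asc"),
        ("fl", "id,_dist_:geodist(),lat,lng"),
        ("start", "0"),
        ("rows", "12"),
    ]
    return base_url + "?" + urlencode(params)


print(build_geofilt_url("http://localhost:8983/solr/locs/select", 51.98, 5.9, 20))
```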



--
View this message in context: 
http://lucene.472066.n3.nabble.com/geolocation-search-ignores-distance-parameter-tp4241564p4241571.html
Sent from the Solr - User mailing list archive at Nabble.com.

geolocation search ignores distance parameter

2015-11-22 Thread PeterKerk

Why is the result below returned even though I'm filtering on a radius of 20
around the geocoordinates defined in the pt parameter in the query string?
As you can see, the _dist_ value in this result is far larger than 20.

http://localhost:8983/solr/locs/select/?indent=on=true{!geofilt}=51.98,5.9=geolocation=20=geodist()%20asc=*:*=0=12=id,_dist_:geodist(),lat,lng



4.20579929967
1803
51.5320753
127.50432946951436

   

schema.xml definitions

  






I tried adding this to the query string: =_dist_:10

but then I get the error: undefined field _dist_




--
View this message in context: 
http://lucene.472066.n3.nabble.com/geolocation-search-ignores-distance-parameter-tp4241564.html

Re: Use faceted search to drill down in hierarchical structure and omit node data outside current selection

2015-07-31 Thread PeterKerk

Hey Alessandro,

Can you help me? :)

Thank you!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Use-faceted-search-to-drill-down-in-hierarchical-structure-and-omit-node-data-outside-current-selectn-tp4219384p4220080.html

Re: Use faceted search to drill down in hierarchical structure and omit node data outside current selection

2015-07-29 Thread PeterKerk

Ok, I managed to get this as output via SQL for a single product:

ProductId  categorystring
2481445 cake  caketoppers  funny
2481445 caketoppers  funny

Before I start diving into the tokenization in Solr, this is what you meant
as the correct input of the data right? I should be able to support drilling
down in categories using your suggested solution?

Just want to make sure I'm on the right track here :)

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Use-faceted-search-to-drill-down-in-hierarchical-structure-and-omit-node-data-outside-current-selectn-tp4219384p4219773.html

Re: Use faceted search to drill down in hierarchical structure and omit node data outside current selection

2015-07-29 Thread PeterKerk

Hi Alessandro!

I'm having a hard time figuring out how to use the
PathHierarchyTokenizerFactory. I was reading here:
https://lucene.apache.org/core/4_4_0/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizerFactory.html

And ended up with this:


fieldType name=descendent_path class=solr.TextField
   analyzer type=index
 tokenizer class=solr.PathHierarchyTokenizerFactory 
delimiter= /
   /analyzer
   analyzer type=query
 tokenizer class=solr.KeywordTokenizerFactory /
   /analyzer
 /fieldType

I tried with these field definitions:
 
field name=categorystring_nl type=string indexed=true 
stored=true
multiValued=true/
field name=categorystring_tokenized type=descendent_path
indexed=true stored=true multiValued=true/

And these querystring parameters in the request:

1.  
facet.field=categorystring_nl -- this returns a facet with count based on
full categorystring, e.g. bruidstaarttaarttoppersgrappig, so I can't use
that for the count on the highest category level (in this case
bruidstaart):

lst name=categorystring_nl
int name=feestartikelenballonnen15/int
int name=bruidstaarttaarttoppersgrappig6/int
int name=taarttoppersgrappig6/int
int name=accessoirestiaras3/int
/lst



2. 
facet.field=categorystring_tokenized, this now returns:

lst name=categorystring_tokenized
int name=feestartikelen15/int
int name=feestartikelenballonnen15/int
int name=bruidstaart6/int
int name=bruidstaarttaarttoppers6/int
int name=bruidstaarttaarttoppersgrappig6/int
int name=taarttoppers6/int
int name=taarttoppersgrappig6/int
int name=accessoires3/int
int name=accessoirestiaras3/int
/lst


I'm now wondering: is this the data you expected me to end up with? Right
now I still don't see how I can easily extract the hierarchy from this data,
except by looping through the facets and counting the number of delimiter
occurrences in the name attribute to determine the actual level in the hierarchy.
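
The looping approach described above could look roughly like this on the client side. It is a sketch under assumptions: the archive stripped the real delimiter, so ">" is used here as a hypothetical one, and the function name `build_facet_tree` is made up. Each facet name is split on the delimiter and placed in a nested dictionary, so the level is simply the position of the part in the path.

```python
def build_facet_tree(facet_counts, delimiter=">"):
    """Turn flat path facets into nested {name: {"count": n, "children": {...}}}."""
    tree = {}
    for path, count in facet_counts.items():
        node = tree
        parts = path.split(delimiter)
        for depth, name in enumerate(parts):
            entry = node.setdefault(name, {"count": 0, "children": {}})
            if depth == len(parts) - 1:
                entry["count"] = count  # the count belongs to the full path
            node = entry["children"]
    return tree


# Facet values modelled on the response above, with the assumed ">" delimiter.
facets = {
    "bruidstaart": 6,
    "bruidstaart>taarttoppers": 6,
    "bruidstaart>taarttoppers>grappig": 6,
    "taarttoppers": 6,
    "taarttoppers>grappig": 6,
}
tree = build_facet_tree(facets)
print(sorted(tree))
print(tree["bruidstaart"]["children"]["taarttoppers"]["count"])
```

The top-level keys of the tree are the level 0 categories, and each node's children give the next drill-down step.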

Can you advise?

Thanks again!








--
View this message in context: 
http://lucene.472066.n3.nabble.com/Use-faceted-search-to-drill-down-in-hierarchical-structure-and-omit-node-data-outside-current-selectn-tp4219384p4219832.html

Re: Use faceted search to drill down in hierarchical structure and omit node data outside current selection

2015-07-29 Thread PeterKerk

Hi Charlie,

Your solution seems to remove faceting capabilities...so that's not what I'm
looking for :) Thanks though!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Use-faceted-search-to-drill-down-in-hierarchical-structure-and-omit-node-data-outside-current-selectn-tp4219384p4219833.html

Re: Use faceted search to drill down in hierarchical structure and omit node data outside current selection

2015-07-28 Thread PeterKerk

Oh, and one more thing: I was Googling on this and found
http://www.springyweb.com/2012/01/hierarchical-faceting-with-elastic.html, so
apparently your solution is similar to this: Hierarchical Faceting With Elastic
Search?
So does your solution allow items to be in multiple categories? E.g. a
product may be in:

Man 
Man  top 
Man  top  shirt 
Man  top  shirt sleeveless shirt 

AND also fall under:

Clothing 
Clothing  shirt 
Clothing  shirt sleeveless shirt 

Thanks again! 

From: Alessandro Benedetti [via Lucene] 
Sent: Tuesday, July 28, 2015 10:26
To: PeterKerk 
Subject: Re: Use faceted search to drill down in hierarchical structure and 
omit node data outside current selection

The fact is that you are trying to model a hierarchical facet on documents
that actually index the content as a simple field.

What I would suggest, for example, is to use a PathHierarchyTokenizer for your
field with a proper separator.
This will produce these tokens in the index : 

input : Man  top  shirt  sleeveless shirt 
Tokenized : 

Man 
Man  top 
Man  top  shirt 
Man  top  shirt sleeveless shirt 

At this point your counting will be exactly what you would like; you need
only to parse it on the Search API side and model the hierarchical facets as
nested elements.
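
What the tokenizer emits for one category path can be imitated as follows. This is a sketch only: the separator was stripped from the archived text, so ">" is an assumed one, the function name `path_hierarchy_tokens` is made up, and unlike the real tokenizer this version trims whitespace around each part.

```python
def path_hierarchy_tokens(path, delimiter=">"):
    """Return every ancestor prefix of a delimited category path."""
    parts = [part.strip() for part in path.split(delimiter)]
    return [delimiter.join(parts[:end]) for end in range(1, len(parts) + 1)]


print(path_hierarchy_tokens("Man > top > shirt > sleeveless shirt"))
```

Because every ancestor becomes its own token, a facet on the field counts a product once at each level of its path.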

Cheers 




Use faceted search to drill down in hierarchical structure and omit node data outside current selection

2015-07-27 Thread PeterKerk

I have the following structure for my products, where a product may fall into
multiple categories. In my case, a caketopper, which would be under
cake/caketoppers as well as caketoppers (don't focus on the logic behind
the category structure in this example).

Category structure:

cake
caketoppers
funny

caketoppers
funny

What I want is that when the user has chosen a category on level 0 (the main
category selection), in this case 'caketoppers', I don't want to return the
attributes/values that same product has because it's also in a different
category.
I tried the following queries, but it keeps returning all data:

f.slug_nl_0.facet.prefix=(caketoppers)&fq=slug_nl_0:(caketoppers)
f.slug_nl_0.facet.prefix=caketoppers&fq=slug_nl_0:(caketoppers)

I keep getting this result (cleaned for better readability):

<result name="response" numFound="6" start="0">
  <doc>
    <arr name="slug_nl_0">
      <str>caketoppers</str>
      <str>cake</str>
    </arr>
  </doc>
</result>
<lst name="facet_counts">
  <lst name="facet_fields">
    <lst name="slug_nl_0">
      <int name="cake">6</int>
      <int name="caketoppers">6</int>
    </lst>
  </lst>
</lst>

But my desired result would be:

<result name="response" numFound="6" start="0">
  <doc>
    <arr name="slug_nl_0">
      <str>caketoppers</str>
    </arr>
  </doc>
</result>
<lst name="facet_counts">
  <lst name="facet_fields">
    <lst name="slug_nl_0">
      <int name="caketoppers">6</int>
    </lst>
  </lst>
</lst>



field definition of 'slug_nl_0' in schema.xml:  
<field name="slug_nl_0" type="text" indexed="true" stored="true" multiValued="true"/>


I also tried with a more simple query but I'm getting the exact same
results:  

facet.prefix=caketoppers&fq=slug_nl_0:caketoppers

I then was reading into grouping:
http://wiki.apache.org/solr/FieldCollapsing

So I tried adding that in my queries, but I get errors:

`fq=slug_nl_0:taarttoppers&group=true&group.facet=true&group.field=slug_nl_0`

error: can not use FieldCache on multivalued field: slug_nl_0

`fq=slug_nl_0:taarttoppers&group=true&group.field=slug_nl_0`

error: can not use FieldCache on multivalued field: slug_nl_0

`fq=slug_nl_0:taarttoppers&group.facet=true&group.field=slug_nl_0`

error: Specify the group.field as parameter or local parameter

And then I noticed this at the bottom of the page:

 Known Limitations Support for grouping on a multi-valued field has not
 yet been implemented.

On that same Solr FieldCollapsing example page they refer to Best Buy as an
example. Now I wonder how that was implemented without support for
multivalued fields.

What can I do?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Use-faceted-search-to-drill-down-in-hierarchical-structure-and-omit-node-data-outside-current-selectn-tp4219384.html

delta import on changes in entity within a document

2015-03-26 Thread PeterKerk

I have the following data-config:

<document name="locations">
  <entity pk="id" name="location"
      query="select * from locations WHERE isapproved='true'"
      deltaImportQuery="select * from locations WHERE updatedate &lt; getdate()
        AND isapproved='true' AND id='${dataimporter.delta.id}'"
      deltaQuery="select id from locations where isapproved='true' AND
        updatedate &gt; '${dataimporter.last_index_time}'">

    <entity name="offerdetails" query="SELECT title as
        offer_title,ISNULL(img,'') as offer_thumb,id as offer_id
        ,startdate as offer_startdate
        ,enddate as offer_enddate
        ,description as offer_description
        ,updatedate as offer_updatedate
        FROM offers WHERE objectid=${location.id}">
    </entity>
  </entity>
</document>


Now, when a row in the [locations] table is updated, my delta import
(/dataimport?command=delta-import) works perfectly.
But when an offer is updated in the [offers] table, the delta-import command
does not see it. Is there a way to delta-import only the updated offers
for the respective location when an offer is updated, without:
a. having to fully import ALL locations
or
b. having to update this single location and then do a regular delta-import?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/delta-import-on-changes-in-entity-within-a-document-tp4195615.html

Filter Solr multivalued fields to be able to add pagination

2015-01-20 Thread PeterKerk

I have the Solr XML response below using this query:
http://localhost:8983/solr/tt/select/?indent=off&facet=false&wt=xml&fl=title,overallscore,service,reviewdate&q=*:*&fq=id:315&start=0&rows=4&sort=reviewdate%20desc

I want to add paging on the multivalued fields, but the above query throws
the error `can not sort on multivalued field: reviewdate`

How can I add paging (or, put differently, select only a subset of the response
based on a filter on a multivalued field)? In my case, with a page size of 4
and a total of 18 results, that would give 5 pages.
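
One possible workaround is to page on the client, since `start` and `rows` apply to documents and not to the values inside one document. The sketch below assumes the multivalued arrays are parallel (the n-th date belongs to the n-th score); the function name `page_reviews` is made up for this example.

```python
import math
from datetime import datetime


def parse_solr_date(value):
    """Parse a Solr UTC timestamp such as 2014-08-14T15:23:36.29Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


def page_reviews(dates, scores, page, page_size=4):
    """Pair up parallel arrays, sort newest first and return one page."""
    reviews = sorted(zip(dates, scores),
                     key=lambda review: parse_solr_date(review[0]),
                     reverse=True)
    total_pages = math.ceil(len(reviews) / page_size)
    start = (page - 1) * page_size
    return reviews[start:start + page_size], total_pages


# The first five values of reviewdate and overallscore from the response below.
dates = ["2014-11-26T17:18:50.367Z", "2014-10-10T16:54:07.397Z",
         "2014-08-18T14:21:17.807Z", "2014-08-17T00:20:41.877Z",
         "2014-08-14T15:30:44.963Z"]
scores = [8, 10, 10, 10, 8]
print(page_reviews(dates, scores, page=2))
```

An alternative that keeps paging inside Solr would be to index each review as its own document, so that `sort`, `start` and `rows` work as usual.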



<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="facet">true</str>
      <str name="fl">reviewtitle,overallscore,service,reviewdate,rating</str>
      <str name="facet.mincount">1</str>
      <str name="indent">on</str>
      <str name="q">*:*</str>
      <str name="fq">id:315</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <float name="rating">8.78</float>
      <arr name="service">
        <int>8</int>
        <int>10</int>
        <int>10</int>
        <int>10</int>
        <int>5</int>
        <int>8</int>
        <int>9</int>
        <int>10</int>
        <int>10</int>
        <int>10</int>
        <int>10</int>
        <int>9</int>
        <int>9</int>
        <int>9</int>
        <int>9</int>
        <int>6</int>
        <int>1</int>
        <int>10</int>
      </arr>
      <arr name="overallscore">
        <int>8</int>
        <int>10</int>
        <int>10</int>
        <int>10</int>
        <int>8</int>
        <int>8</int>
        <int>9</int>
        <int>10</int>
        <int>9</int>
        <int>10</int>
        <int>10</int>
        <int>9</int>
        <int>10</int>
        <int>9</int>
        <int>9</int>
        <int>8</int>
        <int>1</int>
        <int>10</int>
      </arr>
      <arr name="reviewdate">
        <date>2014-11-26T17:18:50.367Z</date>
        <date>2014-10-10T16:54:07.397Z</date>
        <date>2014-08-18T14:21:17.807Z</date>
        <date>2014-08-17T00:20:41.877Z</date>
        <date>2014-08-14T15:30:44.963Z</date>
        <date>2014-08-14T15:23:36.29Z</date>
        <date>2014-08-13T16:25:38.327Z</date>
        <date>2014-08-13T13:54:47.847Z</date>
        <date>2014-08-13T13:20:20.753Z</date>
        <date>2014-06-16T23:29:37.093Z</date>
        <date>2012-11-23T21:54:07.897Z</date>
        <date>2012-11-21T17:40:01.11Z</date>
        <date>2012-11-17T01:58:53.15Z</date>
        <date>2012-11-14T02:17:30.677Z</date>
        <date>2012-11-13T23:22:14.613Z</date>
        <date>2012-11-13T19:09:25.563Z</date>
        <date>2012-08-01T18:09:33.243Z</date>
        <date>2012-07-09T20:37:39.837Z</date>
      </arr>
    </doc>
  </result>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields"/>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
  </lst>
</response>
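
For illustration only, here is a minimal client-side sketch of the paging asked about above. It assumes the application fetches the single document and slices the parallel multivalued arrays itself; the function names are hypothetical and this is not a Solr feature. The values are copied from the response above.

```python
from math import ceil


def page_count(doc, field, page_size=4):
    """Number of pages needed to show every value of one multivalued field."""
    return ceil(len(doc[field]) / page_size)


def page_multivalued(doc, fields, page, page_size=4):
    """Return the slice of each parallel multivalued field for a 1-based page."""
    start = (page - 1) * page_size
    return {name: doc[name][start:start + page_size] for name in fields}


# Values taken from the "service" and "overallscore" arrays in the response.
doc = {
    "service": [8, 10, 10, 10, 5, 8, 9, 10, 10, 10, 10, 9, 9, 9, 9, 6, 1, 10],
    "overallscore": [8, 10, 10, 10, 8, 8, 9, 10, 9, 10, 10, 9, 10, 9, 9, 8, 1, 10],
}

if __name__ == "__main__":
    print(page_count(doc, "service"))
    print(page_multivalued(doc, ["service", "overallscore"], page=2))
```

A commonly used alternative is to index every review as its own document, so that the regular start, rows and sort parameters apply; whether that fits this schema is not something the post settles.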



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Filter-Solr-multivalued-fields-to-be-able-to-add-pagination-tp4180653.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Flexible search field analyser/tokenizer configuration

2014-10-04 Thread PeterKerk

In English, I think this part:
(title_search_global:(Ballonnenboog) OR
title_search_global:"Ballonnenboog"^100)
looks for a match on Ballonnenboog in the title and gives a boost if it
occurs exactly like this.

The second part does the same for the description_search field, combined
with an OR operator (so I would think it would not eliminate any matches):

(description_search:(Ballonnenboog) OR
description_search:"Ballonnenboog"^100)

And finally this part:

title_search_global^10.0+description_search^0.3

Gives a higher boost to the occurrence of the query in title_search_global
field than description_search field.

But something must be wrong with my analysis :)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Flexible-search-field-analyser-tokenizer-configuration-tp4161624p4162660.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Flexible search field analyser/tokenizer configuration

2014-10-04 Thread PeterKerk

Thanks, removing the fq parameters helped :)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Flexible-search-field-analyser-tokenizer-configuration-tp4161624p4162667.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Flexible search field analyser/tokenizer configuration

2014-10-03 Thread PeterKerk

Ok, that field now totally works, thanks again!

I've removed the wildcard to benefit from ranking and boosting and am now
trying to combine this field with another, but I have some difficulties
figuring out the right query.

I want to search for occurrences of the keyword in the title field
(title_search_global) of a document OR in the description field
(description_search). A match in the title field should get the largest
boost, and a match in the description_search field only a minor one.

Here's what I have now on query Ballonnenboog

http://localhost:8983/solr/tt-shop/select?q=(title_search_global%3A(Ballonnenboog)+OR+title_search_global%3A%22Ballonnenboog%22%5E100)+OR+description_search%3A(Ballonnenboog)&fq=title_search_global%5E10.0%2Bdescription_search%5E0.3&fl=id%2Ctitle&wt=xml&indent=true

But it returns 0 results, even though there are results that have
Ballonnenboog in the title_search_global field.

What am I missing?
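
A possible direction, sketched under assumptions: fq is a filter query that restricts the result set and does not influence scoring, so field weights placed there act as a filter rather than as boosts. With the edismax query parser the weights are normally passed in qf instead. The helper name `build_select_url` is hypothetical, and the core URL and field names are simply taken from the request above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url, term):
    """Build a select URL that weights the title field above the description."""
    params = {
        "q": term,
        "defType": "edismax",
        # Per-field boosts go in qf, separated by spaces (not in fq).
        "qf": "title_search_global^10.0 description_search^0.3",
        "fl": "id,title",
        "wt": "xml",
        "indent": "true",
    }
    return base_url + "?" + urlencode(params)


url = build_select_url("http://localhost:8983/solr/tt-shop/select", "Ballonnenboog")

if __name__ == "__main__":
    print(url)
    # Decode the query string again to check what Solr would receive.
    print(parse_qs(urlsplit(url).query))
```

Whether these exact weights give the desired ranking would still have to be checked against the real index.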



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Flexible-search-field-analyser-tokenizer-configuration-tp4161624p4162638.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Flexible search field analyser/tokenizer configuration

Ok, I missed the Query tab where I can do the actual site search :)

I've also used your links, but even with those I fail to grasp why the
following is happening:

This is my query:
http://localhost:8983/solr/bm/select?q=*%3A*&fq=The+Royal+Garden&rows=50&fl=id%2Ctitle&wt=xml&indent=true


And below the result.
Notice how results that merely have "the" in their title are also returned.
Words like "the", "a" and "in" are words I generally wish to ignore IF the
rest of the title does not match.
And with my query "The Royal Garden" there is a result that is an exact
match on all 3 words, but it is listed all the way at the bottom.
How can I:

a) make sure that items that only share the words I want to ignore ("the",
"a", etc.) are not returned
b) make sure that the exact match is at the top of the results, followed
only then by the partial matches, so that the 1st result would be "The Royal
Garden" and the 2nd result would be "Royal"

Thanks!

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
    <str name="fl">id,title</str>
    <str name="indent">true</str>
    <str name="q">*:*</str>
    <str name="_">1412188632532</str>
    <str name="wt">xml</str>
    <str name="fq">The Royal Garden</str>
    <str name="rows">60</str>
  </lst>
</lst>
<result name="response" numFound="9" start="0">
  <doc>
    <str name="id">1579</str>
    <str name="title">Royal</str></doc>
  <doc>
    <str name="id">1603</str>
    <str name="title">The Blue Lagoon</str></doc>
  <doc>
    <str name="id">1629</str>
    <str name="title">The Nightingale DJ Light Sound Vision</str></doc>
  <doc>
    <str name="id">1648</str>
    <str name="title">The Swingmasters</str></doc>
  <doc>
    <str name="id">2431</str>
    <str name="title">The Cover Band</str></doc>
  <doc>
    <str name="id">2457</str>
    <str name="title">The Teahouse Company</str></doc>
  <doc>
    <str name="id">2493</str>
    <str name="title">The Task - Ultimate Party Band</str></doc>
  <doc>
    <str name="id">2499</str>
    <str name="title">The Royal Garden</str></doc>
  <doc>
    <str name="id">2500</str>
    <str name="title">The Wall</str></doc>
</result>
</response>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Flexible-search-field-analyser-tokenizer-configuration-tp4161624p4162174.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Flexible search field analyser/tokenizer configuration

Hi Erick,

Thanks for clarifying some of this :)

That triggers a few more questions:

1. I have no df setting in my solrconfig.xml file at all, nor do I see a
<requestHandler name="/select"> anywhere. How would this typically
look?
2. My site is in 2 languages, Dutch and English, so I have now added the Dutch
stopwords to my field definition, as shown below. However, I also want to
exclude English stopwords. Does that mean I need a separate field definition
for each language, or can I add stopwords for multiple languages to the same
field definition?

<fieldType name="searchtext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20" side="front"/>
  </analyzer>
</fieldType>

3. fq:the AND Royal AND Garden does indeed work, but how would I go about
making sure that in that query
a. "the" is ignored
b. "The Royal Garden" is returned as the 1st result since it's an exact
match, and "Royal" as the 2nd result since it's a partial match (on
non-stopwords)? I guess that would be via the ranking you mention, but where
do I configure that for my use case? I have seen weights on results using
the ^ operator, e.g. qf=title_search^20.0+province^15+city_search^10.0, but
I doubt that is the way to go here.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Flexible-search-field-analyser-tokenizer-configuration-tp4161624p4162200.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Flexible search field analyser/tokenizer configuration

You were right, I had an old configuration :)
But with your new suggestions it now works! Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Flexible-search-field-analyser-tokenizer-configuration-tp4161624p4162249.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Flexible search field analyser/tokenizer configuration

Sorry, one final thing.

In my current application I search like this: 
q=title:searchquery*&defType=lucene

I was checking here: http://wiki.apache.org/solr/SolrQuerySyntax

With my new query, could I just remove the defType=lucene parameter and
the wildcard? Or am I overlooking something?

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Flexible-search-field-analyser-tokenizer-configuration-tp4161624p4162250.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Flexible search field analyser/tokenizer configuration

2014-09-29 Thread PeterKerk

Hi Ahmet,

Am I correct that this is only available in Solr 4.8?
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.TruncateTokenFilterFactory


Also, do I need to add your lines to both the index and query analyzers, making
my definition like so:

<fieldType name="searchtext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TruncateTokenFilterFactory" prefixLength="3"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TruncateTokenFilterFactory" prefixLength="3"/>
  </analyzer>
</fieldType>

Your solution seems much easier to set up than what Alexandre proposed.
For my understanding, what is the difference?

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Flexible-search-field-analyser-tokenizer-configuration-tp4161624p4161778.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Flexible search field analyser/tokenizer configuration

2014-09-29 Thread PeterKerk

Ah, thanks! Sounds indeed like EdgeNGramFilterFactory is what I need.
I actually upgraded to Solr 4.10.1 (from 4.3.1) while I was at it.

I now have this:

<fieldType name="searchtext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20" side="front"/>
  </analyzer>
</fieldType>

I then check the output like so:

http://localhost:8983/solr/#/bm/analysis?analysis.fieldvalue=wall&analysis.query=the%20royal%20garden&analysis.fieldtype=searchtext&verbose_output=0


For Index and Query on "The Royal Garden" I get:

WT      The   Royal   Garden
LCF     the   royal   garden
ENGTF   th   the   ro   roy   roya   royal   ga   gar   gard   garde   garden
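
To make the three analysis rows above easier to follow, here is a small stand-alone simulation of the same chain (split on whitespace, lowercase, front edge n-grams of length 2 to 20). It is only an approximation written for illustration, not Solr's implementation, and the function names are made up.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Front edge n-grams of one token, shortest first."""
    longest = min(len(token), max_gram)
    return [token[:size] for size in range(min_gram, longest + 1)]


def analyze(text, min_gram=2, max_gram=20):
    """Whitespace tokenization, then lowercasing, then edge n-grams."""
    grams = []
    for token in text.split():
        grams.extend(edge_ngrams(token.lower(), min_gram, max_gram))
    return grams


if __name__ == "__main__":
    print(analyze("The Royal Garden"))
```

Because the same chain also runs on the query side here, a query is expanded into prefixes too, which is worth keeping in mind when judging why short fragments match so many titles.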


Now I'm not experienced with this interface, but can I test my actual search
queries via this interface? So which and how many documents are returned
when a site visitor would actually search on The Royal Garden?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Flexible-search-field-analyser-tokenizer-configuration-tp4161624p4161849.html
Sent from the Solr - User mailing list archive at Nabble.com.

JSP support not configured in cygwin with Apache Solr

2014-09-28 Thread PeterKerk

I'm starting Cygwin with Apache solr-4.3.1 like so:

@echo off

C:
chdir C:\cygwin\bin

rem bash --login -i

bash -c "cd /cygdrive/c/solr-4.3.1/example/;java -Dsolr.solr.home=./example-DIH/solr/ -jar -Xms200m -Xmx1200m start.jar -OPTIONS=jsp"

But when I go to http://localhost:8983/solr/db/admin/analysis.jsp

I get a 500 error `Problem accessing /solr/db/admin/analysis.jsp. Reason:
JSP support not configured`

Why?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/JSP-support-not-configured-in-cygwin-with-Apache-Solr-tp4161613.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: JSP support not configured in cygwin with Apache Solr

2014-09-28 Thread PeterKerk

It was an old bookmark. I did not notice the extra pages under the core
selection... found it now, thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/JSP-support-not-configured-in-cygwin-with-Apache-Solr-tp4161613p4161619.html
Sent from the Solr - User mailing list archive at Nabble.com.

Flexible search field analyser/tokenizer configuration

2014-09-28 Thread PeterKerk

I have a site which lists companies.

I'm looking to improve my search, but I want to know which available
analysers and tokenizers I should use for which scenario, and if it's at all
possible.

I want users to be able to search on the company title, for example for a
company called "The Royal Garden".

The logic for this search should be as follows: "The Royal Garden" should
be found on these queries:
the royal garden
royal garden
the roy
The royal
RoYAl
garden

So case-insensitive, matching on parts of words.

However, a query "the royal" should not return companies like:
the wall
the room
the restaurant

So words like "the", but also "a", should be ignored if they are the only
match in the search query.
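
The requirements above can be written down as a small executable specification. This is only a sketch of the desired behaviour, under the assumptions that every non-stop-word in the query must be a prefix of some word in the title and that the stop-word list is just "the" and "a"; it says nothing about how Solr would implement it.

```python
STOP_WORDS = {"the", "a"}  # hypothetical, minimal stop-word list


def content_tokens(text):
    """Lowercased whitespace tokens with stop words removed."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def matches(query, title):
    """True when every non-stop-word query token prefixes some title token."""
    query_tokens = content_tokens(query)
    title_tokens = content_tokens(title)
    if not query_tokens:
        # A query made of stop words only should not match anything.
        return False
    return all(
        any(word.startswith(tok) for word in title_tokens)
        for tok in query_tokens
    )


if __name__ == "__main__":
    for query in ["the royal garden", "royal garden", "the roy", "RoYAl", "garden"]:
        print(query, matches(query, "The Royal Garden"))
    print("the royal", matches("the royal", "The Wall"))
```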

I now have this:

<fieldType name="searchtext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


<field name="title_search" type="searchtext" indexed="true" stored="true"/>

I'm testing on http://localhost:8983/solr/#/bm/analysis but I'm stuck.

Also, I would think my scenario is pretty common and lots of users have
already configured their Solr search to be flexible and powerful...any good
search configurations would be welcome!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Flexible-search-field-analyser-tokenizer-configuration-tp4161624.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Searchquery on field that contains space

2014-01-10 Thread PeterKerk

@iorixxx: thanks, your 2nd solution worked.

The first one didn't (does not matter now), I got this:

<field name="title" type="prefix_full" indexed="true" stored="true"/>
<field name="title_search" type="prefix_full" indexed="true" stored="true"/>

With the first solution all queries work as expected, however with this:

q=title_search:new%20yk*

"new york" is still returned.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searchquery-on-field-that-contains-space-tp4110166p4110658.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Searchquery on field that contains space

@Ahmet:

Thanks, but I also need to be able to search via wildcard and just found
that a "-" might be producing unwanted results. E.g. when using this
query:

http://localhost:8983/solr/tt-cities/select/?indent=off&facet=false&fl=id,title,provincetitle_nl&q=title_search:nij*&defType=lucene&start=0&rows=15

I also get a result for "Halle-Nijman", so it seems the wildcard is not
working, as "Halle-Nijman" does not start with "nij" (or "Nij").
I also tried:
q=title_search:(nij*)
q=title_search:(nij)*

How can I fix this?


@Erick:

When I'm on the analysis page I get the error:

This Functionality requires the /analysis/field Handler to be registered
and active!

So I added this line to my solr config (based on this post:
http://stackoverflow.com/questions/12627734/configure-field-analysis-handler-solr-4)

<requestHandler name="/analysis/field" class="solr.FieldAnalysisRequestHandler" />

But still the same error occurs.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searchquery-on-field-that-contains-space-tp4110166p4110485.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Searchquery on field that contains space

Basically, a user starts typing the first letters of a city and I want to
return city names that start with those letters, case-insensitively and
without splitting the city name into separate words (whether the separator
is whitespace or a "-").
Although the user's search is case-insensitive, I want to return the
values with their original casing: a search on "new york" would return
"New York", which is how it's stored in my MS-SQL DB.

I've been testing my code via the admin/analysis page.

I believe I don't want the WhitespaceTokenizerFactory on my field definition,
since that splits the city names. I want the following behavior:

query on:

"new*" returns "New york" or "newbee", but does not return values like
"greater new hampshire"
"york*" does NOT return "new york"

"nij*" returns "Nijmegen", but not "Halle-Nijman"
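
The behaviour listed above amounts to a case-insensitive prefix test on the whole, unsplit city name, returning the stored value with its original casing. A minimal sketch of that rule follows; it is a plain simulation with hypothetical names, not a Solr configuration.

```python
def suggest(prefix, cities):
    """Return the stored city names whose full value starts with the prefix."""
    needle = prefix.lower()
    return [city for city in cities if city.lower().startswith(needle)]


CITIES = ["New york", "newbee", "greater new hampshire", "Nijmegen", "Halle-Nijman"]

if __name__ == "__main__":
    print(suggest("new", CITIES))
    print(suggest("york", CITIES))
    print(suggest("nij", CITIES))
```

Treating the whole value as one token is what makes "york" and "nij" miss the names that merely contain them, which is the opposite of what a whitespace tokenizer gives.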

Here's what I have come up so far:

<field name="title" type="text_lower_exact" indexed="true" stored="true"/>
<field name="title_search" type="text_lower_exact" indexed="true" stored="true"/>


<fieldType name="text_lower_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


But when I leave out the WhitespaceTokenizerFactory I get:  Plugin init
failure for [schema.xml] fieldType text_lower_exact: analyzer without
class or tokenizer,trace=org.apache.solr.common.SolrException: SolrCore
'tt-cities' is not available due to init failure



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searchquery-on-field-that-contains-space-tp4110166p4110495.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Searchquery on field that contains space

Hi Ahmet,

Thanks. Also for that link, although it's too advanced for my usecase.

I see that by using KeywordTokenizerFactory it almost works now, but when I
search on:

"new y", no results are found,

but when I search on "new", I do get "New York".

So the space in the search query is still causing problems. What could
cause that?
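
One thing that may be worth checking, offered as an assumption rather than a diagnosis: the standard Lucene query parser treats unescaped whitespace as a separator between clauses, so a prefix containing a space needs the space escaped to stay a single term. The sketch below builds such a query string; the helper names are hypothetical and the escape list is a hand-written approximation of the parser's special characters.

```python
# Characters that are escaped with a backslash in this sketch.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_term(text):
    """Backslash-escape query syntax characters and whitespace."""
    escaped = []
    for char in text:
        if char in SPECIAL_CHARS or char.isspace():
            escaped.append("\\")
        escaped.append(char)
    return "".join(escaped)


def prefix_query(field, user_input):
    """Build a single-term prefix query such as field:new\\ y*."""
    return f"{field}:{escape_term(user_input)}*"


if __name__ == "__main__":
    print(prefix_query("title_search", "new y"))
```

The resulting string still has to be URL-encoded before it is placed in the q parameter.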

Thanks again!

ps. are you guys (like you, Erick, Maurice etc.) also active on
StackOverflow? At least you'll get the credit for good support :)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searchquery-on-field-that-contains-space-tp4110166p4110515.html
Sent from the Solr - User mailing list archive at Nabble.com.

Return only distinct combinations of 2 field values

I'm searching on cities and returning city and province. Some cities exist in
different provinces, which is OK.
However, I have some duplicates, meaning the same city occurs twice in the
same province. In that case I only want to return 1 result.
I therefore need a distinct, unique city+province combination.

How can I make sure that only unique city+province combinations are returned
by my query?
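
As a fallback that does not depend on any Solr feature, the duplicates can be dropped in the client after the response is parsed. The sketch below assumes each result is a dict with the title and provincetitle_nl fields from the query; the sample rows are invented for illustration.

```python
def distinct_city_province(docs):
    """Keep the first document for every (title, provincetitle_nl) pair."""
    seen = set()
    unique = []
    for doc in docs:
        key = (doc["title"], doc["provincetitle_nl"])
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


# Invented sample rows, not taken from the real index.
docs = [
    {"id": 1, "title": "Bergen", "provincetitle_nl": "Limburg"},
    {"id": 2, "title": "Bergen", "provincetitle_nl": "Noord-Holland"},
    {"id": 3, "title": "Bergen", "provincetitle_nl": "Limburg"},
]

if __name__ == "__main__":
    print(distinct_city_province(docs))
```

Note that deduplicating after the fact makes start/rows paging uneven, since a page can shrink once duplicates are removed. Solr's result grouping on a single field holding the combined city+province value might avoid that, but that is not verified here.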

http://localhost:8983/solr/tt-cities/select/?indent=off&facet=false&fl=id,title,provincetitle_nl&q=*:*&defType=lucene&start=0&rows=15

The respective fields are title and provincetitle_nl. Below my schema.xml

<fieldType name="text_lower_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title" type="text_lower_exact" indexed="true" stored="true"/>
<field name="provincetitle_nl" type="string" indexed="true" stored="true"/>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Return-only-distinct-combinations-of-2-field-values-tp4110521.html
Sent from the Solr - User mailing list archive at Nabble.com.

Searchquery on field that contains space

2014-01-08 Thread PeterKerk

My query for finding a city name does not show the closest matching value, but
instead gives priority to the first word in the search query.

I believe it has something to do with the whitespace tokenization, but I
don't know which fields to change to what type.


Here's what happens when I search on "new york":

http://localhost:8983/solr/tt-cities/select/?indent=off&facet=false&fl=id,title&q=title_search:*new%20york*&defType=lucene&start=0&rows=10

<result name="response" numFound="810" start="0">
<doc>
<str name="title">New Golden Beach</str>
</doc>
<doc>
<str name="title">New Auckland</str>
</doc>
<doc>
<str name="title">New Waverly</str>
</doc>
<doc>
<str name="title">New Market Village Mobile Home Park</str>
</doc>
<doc>
<str name="title">New Centerville</str>
</doc>
<doc>
<str name="title">New Meadows</str>
</doc>
<doc>
<str name="title">New Plymouth</str>
</doc>
<doc>
<str name="title">New Hope Mobile Home Park</str>
</doc>
<doc>
<str name="title">New Light</str>
</doc>
<doc>
<str name="title">New Vienna</str>
</doc>
</result>


My schema.xml

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>


<field name="title" type="text_ws" indexed="true" stored="true"/>
<field name="title_search" type="string" indexed="true" stored="true"/>

<copyField source="title" dest="title_search"/>

I also tried:

<field name="title_search" type="text" indexed="true" stored="true"/>

And:
<field name="title" type="string" indexed="true" stored="true"/>
<field name="title_search" type="string" indexed="true" stored="true"/>



What to do?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searchquery-on-field-that-contains-space-tp4110166.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Empty facets on Solr with MySQL

Hi Andrea, 

I would say the JDBC driver must be working because when I leave out the
required=true from the cat_name field, 4 documents are imported. Since my
entire DB currently holds only 4 records, there's no need for a LIMIT clause
I guess?


Andrea Gazzarini-4 wrote
> In the solr console set to DEBUG / FINEST the level of DIH classes

How do I do that?

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Empty-facets-on-Solr-with-MySQL-tp4109170p4109290.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Empty facets on Solr with MySQL

Hi Andrea,

You were right, I do see errors when setting the required=true
attribute...what can it be?


Logging console homepage:

13:31:54
WARN
SolrWriter
Error creating document : SolrInputDocument[comment_status=open,
post_content=algemeen kpn artikeltje ook over vodafone,
guid=http://www.telefonievergelijken.nl/wordpress/?p=20, post_excerpt=,
id=20, post_author=1, post_modified=2014-01-02 14:24:28.0,
post_name=kpn-en-vodafone, post_title=KPN en Vodafone,
imgurl=http://www.talkman.nl/wordpress/wp-content/uploads/2013/11/taj.png,
post_date=2013-12-13 14:12:17.0]

Console errors:

36588 [Thread-15] WARN  org.apache.solr.handler.dataimport.SolrWriter  – Error creating document : SolrInputDocument[comment_status=open, post_content=algemeen kpn artikeltje ook over vodafone, guid=http://www.telefonievergelijken.nl/wordpress/?p=20, post_excerpt=, id=20, post_author=1, post_modified=2014-01-02 14:24:28.0, post_name=kpn-en-vodafone, post_title=KPN en Vodafone, imgurl=http://www.talkman.nl/wordpress/wp-content/uploads/2013/11/taj.png, post_date=2013-12-13 14:12:17.0]
org.apache.solr.common.SolrException: [doc=20] missing required field: cat_name
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:328)
        at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:73)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:208)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:545)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:398)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
        at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:70)
        at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:235)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:500)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
36596 [Thread-15] INFO  org.apache.solr.handler.dataimport.DocBuilder  – Import completed successfully
36596 [Thread-15] INFO  org.apache.solr.update.UpdateHandler  – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
36601 [Thread-15] INFO  org.apache.solr.core.SolrCore  – SolrDeletionPolicy.onCommit: commits:num=2
        commit{dir=NRTCachingDirectory(org.apache.lucene.store.SimpleFSDirectory@C:\Dropbox\Databases\solr-4.3.1\example\example-DIH\solr\tv-wordpress\data\index lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@1836cd1; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_4o,generation=168,filenames=[segments_4o]
        commit{dir=NRTCachingDirectory(org.apache.lucene.store.SimpleFSDirectory@C:\Dropbox\Databases\solr-4.3.1\example\example-DIH\solr\tv-wordpress\data\index lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@1836cd1; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_4p,generation=169,filenames=[segments_4p]
36602 [Thread-15] INFO  org.apache.solr.core.SolrCore  – newest commit = 169[segments_4p]
36603 [Thread-15] INFO  org.apache.solr.search.SolrIndexSearcher  – Opening Searcher@54dfcf main
36604 [Thread-15] INFO  org.apache.solr.update.UpdateHandler  – end_commit_flush
36604 [searcherExecutor-79-thread-1] INFO  org.apache.solr.core.SolrCore  – QuerySenderListener sending requests to Searcher@54dfcf main{StandardDirectoryReader(segments_4p:1330871938621:nrt)}
36608 [searcherExecutor-79-thread-1] INFO  org.apache.solr.core.SolrCore  – [tv-wordpress] webapp=null path=null params={start=0&event=newSearcher&q=solr&distrib=false&rows=10} hits=0 status=0 QTime=4
36609 [searcherExecutor-79-thread-1] INFO  org.apache.solr.core.SolrCore  – [tv-wordpress] webapp=null path=null params={start=0&event=newSearcher&q=rocks&distrib=false&rows=10} hits=0 status=0 QTime=0
36610 [searcherExecutor-79-thread-1] INFO  org.apache.solr.core.SolrCore  – [tv-wordpress] webapp=null path=null params={event=newSearcher&q=static+newSearcher+warming+query+from+solrconfig.xml&distrib=false} hits=0 status=0

Re: Empty facets on Solr with MySQL

Hi Andrea,

Here you go:

**data-config.xml** 
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/wordp" user="***" password="***" />
  <document name="articles">
    <entity pk="ID" name="article" query="SELECT p.*, (
        SELECT guid FROM wp_posts WHERE id = m.meta_value ) AS imgurl
        FROM wp_posts p
        LEFT JOIN  wp_postmeta m ON(p.id = m.post_id AND m.meta_key = '_thumbnail_id' )
        WHERE p.post_type =  'post'
        AND p.post_status =  'publish';">

      <entity name="post_categories" query="select
          wt.name as cat_name,wt.slug,wtr.term_taxonomy_id,wtt.term_id,wtt.taxonomy
          from
          wp_term_relationships wtr
          INNER JOIN wp_term_taxonomy wtt ON
          wtt.term_taxonomy_id=wtr.term_taxonomy_id AND wtt.taxonomy='category'
          INNER JOIN wp_terms wt ON wt.term_id=wtt.term_taxonomy_id
          where wtr.object_id='${article.id}';">
      </entity>
    </entity>
  </document>
</dataConfig>


**schema.xml** 

<field name="cat_name" type="text" indexed="true" stored="true" multiValued="true" />
<field name="cat_name_raw" type="string" indexed="true" stored="true" multiValued="true" />
<copyField source="cat_name" dest="cat_name_raw"/>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Empty-facets-on-Solr-with-MySQL-tp4109170p4109353.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Empty facets on Solr with MySQL

But when I execute the query directly on MySQL I do get a cat_name column in
there:

select wt.name as
cat_name,wt.slug,wtr.term_taxonomy_id,wtt.term_id,wtt.taxonomy from 
wp_term_relationships wtr
INNER JOIN wp_term_taxonomy wtt ON wtt.term_taxonomy_id=wtr.term_taxonomy_id
AND wtt.taxonomy='category'
INNER JOIN wp_terms wt ON wt.term_id=wtt.term_taxonomy_id
where wtr.object_id=18

I see no reason why my configuration in my data-config.xml would not execute
successfully:

<entity name="post_categories" query="select wt.name as
cat_name,wt.slug,wtr.term_taxonomy_id,wtt.term_id,wtt.taxonomy from 
wp_term_relationships wtr
INNER JOIN wp_term_taxonomy wtt ON wtt.term_taxonomy_id=wtr.term_taxonomy_id
AND wtt.taxonomy='category'
INNER JOIN wp_terms wt ON wt.term_id=wtt.term_taxonomy_id
where wtr.object_id='${article.id}';">
</entity>


I have no transformers on my resultset (I checked my querystring, schema.xml
and data-config.xml, since I'm not even sure where it would have to be
defined).



Andrea Gazzarini-4 wrote
> You can debug the resultset in a main class by doing
> rs.getString("cat_name")

What do you mean by 'in a main class'? Where can I define that? (ps. I'm
working with ASP.NET if that matters)

Thanks again! :)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Empty-facets-on-Solr-with-MySQL-tp4109170p4109388.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Empty facets on Solr with MySQL

Hi Andrea,

I think you helped me to get closer, but not quite there yet.

When I replace wtr.object_id='${article.id}'; with wtr.object_id=18 
the cat_name field holds a value, which I checked via the schema browser of
Solr dashboard!
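
One possible explanation, offered purely as a guess: if the `${article.id}` placeholder is looked up case-sensitively while the parent row exposes the column as `ID` (the parent entity declares pk=ID), the placeholder would resolve to an empty string. That would be consistent with the `where wtr.object_id=;` shown in the error further down. The resolver below is a made-up stand-in written for illustration, not the DataImportHandler's own code.

```python
import re

PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def resolve(template, rows):
    """Replace ${entity.column} using a case-sensitive lookup; missing -> ''."""
    def lookup(match):
        entity, column = match.group(1), match.group(2)
        value = rows.get(entity, {}).get(column)
        return "" if value is None else str(value)
    return PLACEHOLDER.sub(lookup, template)


# Hypothetical parent row: the key is "ID", not "id".
rows = {"article": {"ID": 18}}

if __name__ == "__main__":
    print(resolve("where wtr.object_id='${article.id}'", rows))
    print(resolve("where wtr.object_id=${article.id}", rows))
    print(resolve("where wtr.object_id='${article.ID}'", rows))
```

Under this assumption the quoted form silently matches no rows (so cat_name stays empty), the unquoted form produces invalid SQL, and only the placeholder that uses the column's exact case yields a usable id.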

I then checked my main query SELECT p.*, ( SELECT guid FROM wp_posts WHERE
id = m.meta_value ) AS imgurl 
FROM wp_posts p
LEFT JOIN  wp_postmeta m ON(p.id = m.post_id AND m.meta_key = 
'_thumbnail_id' )
WHERE p.post_type =  'post'
AND p.post_status =  'publish';

which returns 4 results. For each of these results I checked whether the
direct query on the database returns a cat_name and it does. So, no null
values there.


When I remove the quotes around the ID like so

<entity name="post_categories" query="select wt.name as
cat_name,wt.slug,wtr.term_taxonomy_id,wtt.term_id,wtt.taxonomy from 
wp_term_relationships wtr
INNER JOIN wp_term_taxonomy wtt ON wtt.term_taxonomy_id=wtr.term_taxonomy_id
AND wtt.taxonomy='category'
INNER JOIN wp_terms wt ON wt.term_id=wtt.term_taxonomy_id
where wtr.object_id=${article.id};">

I get the errors:

Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select wt.name as cat_name,wt.slug,wtr.term_taxonomy_id,wtt.term_id,wtt.taxonomy from  wp_term_relationships wtr INNER JOIN wp_term_taxonomy wtt ON wtt.term_taxonomy_id=wtr.term_taxonomy_id AND wtt.taxonomy='category' INNER JOIN wp_terms wt ON wt.term_id=wtt.term_taxonomy_id where wtr.object_id=; Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
    ... 5 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
    at com.mysql.jdbc.Util.getInstance(Util.java:386)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1054)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4237)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4169)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2617)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2778)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2819)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2768)
    at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:894)
    at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:732)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)
    ... 13 more



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Empty-facets-on-Solr-with-MySQL-tp4109170p4109398.html

Re: Empty facets on Solr with MySQL

No need, you solved it!
It was the casing of the id name: it had to be uppercase.

By the way, the ; is still there in the query, but everything still works.

Thanks!
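
For anyone landing on this thread later, here is a minimal Python sketch of why the casing mattered. It is not DIH's actual resolver; it only assumes, based on the error above (`where wtr.object_id=;`), that placeholder lookup is case-sensitive and that an unknown name resolves to an empty string. The `resolve` helper and the sample row are hypothetical.

```python
import re

def resolve(template: str, scope: dict) -> str:
    """Replace ${entity.column} placeholders; unknown names become ''."""
    def lookup(match):
        entity, column = match.group(1), match.group(2)
        # Case-sensitive lookup: 'id' and 'ID' are different keys.
        return str(scope.get(entity, {}).get(column, ""))
    return re.sub(r"\$\{(\w+)\.(\w+)\}", lookup, template)

# Hypothetical parent row: the column is exposed as 'ID' (uppercase).
row = {"article": {"ID": 18}}

broken = resolve("where wtr.object_id=${article.id}", row)
fixed = resolve("where wtr.object_id=${article.ID}", row)
```

With the lowercase name the WHERE clause ends in a bare `=`, which matches the MySQL syntax error reported earlier; with the uppercase name the id is filled in.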



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Empty-facets-on-Solr-with-MySQL-tp4109170p4109425.html

Empty facets on Solr with MySQL

I've set up Solr with MySQL.
My data import is successful:
http://localhost:8983/solr/wordpress/dataimport?command=full-import

However, when trying to get the cat_name facets, all facets are empty: 
http://localhost:8983/solr/wordpress/select/?indent=on&facet=true&sort=post_modified%20desc&q=*:*&start=0&rows=10&fl=id,post_title&facet.field=cat_name&facet.mincount=1

**data-config.xml**
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://localhost:3306/wordp" user="***" password="***" />
  <document name="articles">
    <entity pk="ID" name="article" query="SELECT p.*, ( 
        SELECT guid FROM
        wp_posts WHERE id = m.meta_value ) AS imgurl 
        FROM wp_posts p
        LEFT JOIN  wp_postmeta m ON(p.id = m.post_id AND m.meta_key = 
        '_thumbnail_id' )
        WHERE p.post_type =  'post'
        AND p.post_status =  'publish';">

      <entity name="post_categories" query="select 
          wt.name as
          cat_name,wt.slug,wtr.term_taxonomy_id,wtt.term_id,wtt.taxonomy from 
          wp_term_relationships wtr
          INNER JOIN wp_term_taxonomy wtt ON
          wtt.term_taxonomy_id=wtr.term_taxonomy_id AND wtt.taxonomy='category'
          INNER JOIN wp_terms wt ON wt.term_id=wtt.term_taxonomy_id
          where wtr.object_id='${article.id}';">
      </entity>
    </entity>
  </document>
</dataConfig>


**schema.xml**

<field name="cat_name" type="text" indexed="true" stored="true"
    multiValued="true" />
<field name="cat_name_raw" type="string" indexed="true" stored="true"
    multiValued="true" />
<copyField source="cat_name" dest="cat_name_raw"/>

What am I missing?






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Empty-facets-on-Solr-with-MySQL-tp4109170.html

Re: Empty facets on Solr with MySQL

Hi Ahmet,

I tried this URL:

http://localhost:8983/solr/wordpress/select/?indent=on&facet=true&sort=post_modified%20desc&q=*:*&start=0&rows=10&fl=id,post_title,cat_name*&facet.field=cat_name_raw&facet.mincount=1

and this URL:

http://localhost:8983/solr/wordpress/select/?indent=on&facet=true&sort=post_modified%20desc&q=*:*&start=0&rows=10&fl=id,post_title&facet.field=cat_name_raw&facet.mincount=1

But still I see empty facets. What more can I test?
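
Query strings this long are easy to get wrong by hand (one dropped `&` merges two parameters into one). As a rough sketch only, here is how the same request could be assembled in Python so every parameter is separated and encoded consistently. The core name `wordpress` and the field names come from this thread; `build_select_url` is a hypothetical helper, not part of any Solr client.

```python
from urllib.parse import urlencode

def build_select_url(base: str, params: dict) -> str:
    """Join parameters with '&' and percent-encode values where needed."""
    # Keep '*', ':' and ',' readable, as in the hand-written URLs.
    return base + "?" + urlencode(params, safe="*:,")

url = build_select_url(
    "http://localhost:8983/solr/wordpress/select/",
    {
        "indent": "on",
        "facet": "true",
        "sort": "post_modified desc",
        "q": "*:*",
        "start": 0,
        "rows": 10,
        "fl": "id,post_title",
        "facet.field": "cat_name_raw",
        "facet.mincount": 1,
    },
)
```

Printing `url` and pasting it into the browser gives the same request as the second URL above, with the space in the sort clause encoded as `+`.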



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Empty-facets-on-Solr-with-MySQL-tp4109170p4109176.html

Re: Empty facets on Solr with MySQL

I get "Sorry, no Term Info available :("



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Empty-facets-on-Solr-with-MySQL-tp4109170p4109186.html

Re: Empty facets on Solr with MySQL

Hi Andrea,

I changed it to: <field name="cat_name" required="true" type="text"
indexed="true" stored="true" multiValued="true" />

When I run full-import 0 documents are indexed, but no errors in the
console.
When I run my query via MySQL Workbench the statement executes correctly.

How else can I debug the index process?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Empty-facets-on-Solr-with-MySQL-tp4109170p4109199.html

Re: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query

2013-12-30 Thread PeterKerk

I ran the query in debug mode:
http://localhost:8983/solr/tv-wordpress/dataimport?command=full-import&debug=true

Here's the output, what can you tell from this?

22432 [qtp33142123-13] INFO  org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection for entity article with URL: jdbc:mysql@localhost:3306/wptalkman
22435 [qtp33142123-13] INFO  org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for getConnection(): 3
22436 [qtp33142123-13] ERROR org.apache.solr.handler.dataimport.DocBuilder  - Exception while processing: article document : SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT * FROM wptalkman.wp_posts WHERE post_status='publish' Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
    at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:365)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NullPointerException
at

Re: how to debug dataimporthandler

2013-12-30 Thread PeterKerk

Tried your steps, but failed. Could you perhaps have a look at my post here:
http://lucene.472066.n3.nabble.com/org-apache-solr-handler-dataimport-DataImportHandlerException-Unable-to-execute-query-td4108227.html



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-debug-dataimporthandler-tp2611506p4108676.html

Re: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query

2013-12-30 Thread PeterKerk

This is all I see in the XML response:

This XML file does not appear to have any style information associated with
it. The document tree is shown below.
<response>
<script/>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">39</int>
</lst>
<lst name="initArgs">
  <lst name="defaults">
    <str name="config">wordpress-data-config.xml</str>
  </lst>
</lst>
<str name="command">full-import</str>
<str name="mode">debug</str>
<arr name="documents"/>
<lst name="verbose-output"/>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages">
  <str name="Time Elapsed">0:0:0.57</str>
  <str name="Total Requests made to DataSource">1</str>
  <str name="Total Rows Fetched">0</str>
  <str name="Total Documents Processed">0</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2013-12-30 12:21:49</str>
  <str name="">Indexing failed. Rolled back all changes.</str>
  <str name="Rolledback">2013-12-30 12:21:49</str>
</lst>
<str name="WARNING">
This response format is experimental. It is likely to change in the future.
</str>
</response>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/org-apache-solr-handler-dataimport-DataImportHandlerException-Unable-to-execute-query-tp4108227p4108682.html

Disable caching on sorting to improve performance

2013-12-27 Thread PeterKerk

I'm getting a lot of Java heap memory errors. I've been reading up on
Solr performance (and in the meantime configuring the Sematext tools to try
to drill down to the cause).

I already increased the memory available to Solr:
bash -c "cd /cygdrive/c/Databases/solr-4.3.1/example/;java
-Dsolr.solr.home=./example-DIH/solr/ -jar -Xmx200m -Xmx1200m start.jar" 

And now I read:
Factors that affect memory usage:
http://stackoverflow.com/questions/1546898/how-to-reduce-solr-memory-usage
I see that sorting affects memory usage. I have the feeling that is the case
for me, because since I implemented sorting the memory errors are going
through the roof. 

I read here
http://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters how to
do it on filter queries, but I was wondering how I can disable the caching
on the sort parameter in below statement where it now has
`sort=clickcount%20desc,prijs%20desc,updatedate%20desc`


searchquery.Append("&fl=id,artikelnummer,titel,friendlyurl,pricerange,lang,currency,createdate")
searchquery.Append("&facet.field=pricerange")
searchquery.Append("&facet.mincount=1") 
searchquery.Append("&facet.sort=index") 
searchquery.Append("&omitHeader=true")
searchquery.Append("&sort=clickcount%20desc,prijs%20desc,updatedate%20desc") 
  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Disable-caching-on-sorting-to-improve-performance-tp4108356.html

RE: Disable caching on sorting to improve performance

2013-12-27 Thread PeterKerk

Thanks and good call, that has been there for quite some time! 
I've changed it to: -Xms200m -Xmx1500m 
I'll look into the effect of this first.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Disable-caching-on-sorting-to-improve-performance-tp4108356p4108362.html

org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query

2013-12-26 Thread PeterKerk

I'm trying to set up Solr with a WordPress database running on MySQL.

But on trying a full import:
`http://localhost:8983/solr/tv-wordpress/dataimport?command=full-import`


The error is:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query


**data-config.xml**

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql@localhost:3306/wptalkman" user="root" password="" />
  <document name="articles">
    <entity pk="id" name="article" query="SELECT * FROM 
        wp_posts WHERE
        post_status='publish';">
      <field name="id" column="ID" />
      <field name="post_title" column="post_title" />
      <field name="post_author" column="post_author" />
    </entity>
  </document>
</dataConfig>


I also tried including the database name in the SQL statement: 

SELECT * FROM wptalkman.wp_posts WHERE post_status='publish';

and changed the connection URL to `jdbc:mysql@localhost:3306`

But I'm still unable to execute the query. 
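
One detail worth checking in the config above: MySQL Connector/J URLs normally take the form `jdbc:mysql://host:port/database`, whereas the URL here uses `@` in place of `://`. Nothing in this post confirms that as the cause, so the following Python snippet is only a sanity check of the URL shape; `parse_mysql_jdbc_url` and its regular expression are hypothetical and cover only the simple single-host case.

```python
import re

# Simplified pattern for jdbc:mysql://host[:port]/database[?options]
MYSQL_JDBC_URL = re.compile(
    r"^jdbc:mysql://(?P<host>[^/:]+)(?::(?P<port>\d+))?/(?P<database>[^?]*)"
)

def parse_mysql_jdbc_url(url: str):
    """Return host, port and database, or None if the shape is unexpected."""
    match = MYSQL_JDBC_URL.match(url)
    return match.groupdict() if match else None

as_posted = parse_mysql_jdbc_url("jdbc:mysql@localhost:3306/wptalkman")
expected_form = parse_mysql_jdbc_url("jdbc:mysql://localhost:3306/wptalkman")
```

The URL as posted does not parse, while the `://` form yields host `localhost`, port `3306` and database `wptalkman`.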


**console output**  

194278 [Thread-22] INFO  org.apache.solr.update.UpdateHandler  - start rollback{}
194279 [Thread-22] INFO  org.apache.solr.update.DefaultSolrCoreState  - Creating new IndexWriter...
194279 [Thread-22] INFO  org.apache.solr.update.DefaultSolrCoreState  - Waiting until IndexWriter is unused... core=tv-wordpress
194280 [Thread-22] INFO  org.apache.solr.update.DefaultSolrCoreState  - Rollback old IndexWriter... core=tv-wordpress
194282 [Thread-22] INFO  org.apache.solr.core.SolrCore  - SolrDeletionPolicy.onInit: commits:num=1
    commit{dir=NRTCachingDirectory(org.apache.lucene.store.SimpleFSDirectory@C:\Dropbox\Databases\solr-4.3.1\example\example-DIH\solr\tv-wordpress\data\index lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@10ff234; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_3l,generation=129,filenames=[_3o.nvd, _3o_Lucene41_0.tim, _3o.fnm, _3o.nvm, _3o_Lucene41_0.tip, _3o.fdt, _3o_Lucene41_0.pos, segments_3l, _3o.fdx, _3o_Lucene41_0.doc, _3o.si]
194283 [Thread-22] INFO  org.apache.solr.core.SolrCore  - newest commit = 129[_3o.nvd, _3o_Lucene41_0.tim, _3o.fnm, _3o.nvm, _3o_Lucene41_0.tip, _3o.fdt, _3o_Lucene41_0.pos, segments_3l, _3o.fdx, _3o_Lucene41_0.doc, _3o.si]
194283 [Thread-22] INFO  org.apache.solr.update.DefaultSolrCoreState  - New IndexWriter is ready to be used.
194283 [Thread-22] INFO  org.apache.solr.update.UpdateHandler  - end_rollback
194669 [qtp32398134-13] INFO  org.apache.solr.handler.dataimport.DataImporter  - Loading DIH Configuration: wordpress-data-config.xml
194672 [qtp32398134-13] INFO  org.apache.solr.handler.dataimport.DataImporter  - Data Configuration loaded successfully
194676 [Thread-23] INFO  org.apache.solr.handler.dataimport.DataImporter  - Starting Full Import
194676 [qtp32398134-13] INFO  org.apache.solr.core.SolrCore  - [tv-wordpress] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=8
194680 [Thread-23] INFO  org.apache.solr.handler.dataimport.SimplePropertiesWriter  - Read dataimport.properties
194681 [Thread-23] INFO  org.apache.solr.core.SolrCore  - [tv-wordpress] REMOVING ALL DOCUMENTS FROM INDEX
194686 [Thread-23] INFO  org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection for entity article with URL: jdbc:mysql@localhost:3306/wptalkman
194686 [Thread-23] INFO  org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for getConnection(): 0
194687 [Thread-23] ERROR org.apache.solr.handler.dataimport.DocBuilder  - Exception while processing: article document : SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select * from wp_posts WHERE post_status='publish' Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilde

Re: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query

2013-12-26 Thread PeterKerk

Solr 4.3.1

When I run the statement in MySQL Workbench or console the statement
executes successfully and returns 2 results.

FYI: I placed the mysql-connector-java-5.1.27-bin.jar in the \lib folder.

Also: it should not throw this error even when 0 results are returned right?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/org-apache-solr-handler-dataimport-DataImportHandlerException-Unable-to-execute-query-tp4108227p4108233.html

Re: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query

2013-12-26 Thread PeterKerk

Shalin Shekhar Mangar wrote
 Can you try using the debug mode and paste its response?

Ok, thanks. How do I enable and use the debug mode?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/org-apache-solr-handler-dataimport-DataImportHandlerException-Unable-to-execute-query-tp4108227p4108248.html

Displaying actual field values and searching lowercase ignoring spaces

2013-12-09 Thread PeterKerk

Values of the field [street] in my DB may be "Castle Road".

However, I want to be able to find these values using lowercase including
dashes, so "castle-road" would be a match.

When I use fieldtype text_lower_space, which holds a
solr.WhitespaceTokenizerFactory, the value is split into 2 values, "Castle"
and "Road". 

When I use type string of fieldtype solr.StrField, I cannot search in
lowercase and still find values which hold uppercase characters, such as
"Castle Road".

I need to be able to find values (regardless of their casing) using a
lowercase query.

I will be using the [street] field to display facets, so the text displayed
to the user should be the exact value including casing from field [street],
however, when I search on the field, "castle-road" should return a match.

original value    found on
Castle Road       castle-road
Oak-tree lane     oak-tree-lane
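
To make the intended mapping concrete, here it is as a small Python function: keep the whole value as one unit, lowercase it, and turn runs of whitespace into a single dash. This only illustrates the desired behaviour; `street_key` is a hypothetical name. In Solr terms it roughly corresponds to a keyword tokenizer followed by lowercasing and a whitespace-to-dash pattern replacement, which is an assumption on my side and not something confirmed in this thread.

```python
import re

def street_key(value: str) -> str:
    """Lowercase the whole value and replace whitespace runs with '-'."""
    return re.sub(r"\s+", "-", value.strip().lower())

# The two rows from the table above.
castle = street_key("Castle Road")
oak = street_key("Oak-tree lane")
```

The original value would still be what is displayed in the facet; only the searchable form is normalised this way.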


The problem now is that I don't know which tokenizer I need to use, both for
index and query.


<fieldType name="text_lower_space" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">

    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">

    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="0"
        catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>

    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Displaying-actual-field-values-and-searching-lowercase-ignoring-spaces-tp4105723.html

Re: get min-max prices as facets

Thanks!
I know how to fire a range query. However, what I want is to provide the
visitor with a range filter. In this range filter the minimum and maximum
value are already set to the lowest and highest price of the current
resultset.

e.g.
I sell cars. My cheapest car is 1,000 and the most expensive is 100,000
A user comes on the site and all cars are shown, so the price range filter
he sees is set to start at 1,000 and end at 100,000

Then the user filters on brand Audi.

The cheapest Audi is 40,000 and the most expensive 80,000
In this case the price range filter changes, and holds a minimum value of
40,000 and a max of 80,000. This will a) give the user an indication of all
Audi prices and b) prevent him from selecting a price for which there is no
car available.

I believe this is different than what you are suggesting, right?
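
To spell out the behaviour I mean, here is a small Python sketch that computes the bounds over whatever the current filter leaves in the result set. The car list is made-up sample data matching the numbers above, and `price_bounds` is a hypothetical helper; on the Solr side the StatsComponent (`stats=true` with `stats.field`) may be worth a look for getting min and max of the current result set, but I have not verified that it fits here.

```python
def price_bounds(cars, brand=None):
    """Return (min, max) price of the cars matching the optional brand filter."""
    prices = [car["price"] for car in cars if brand is None or car["brand"] == brand]
    return (min(prices), max(prices)) if prices else None

# Made-up sample data for illustration only.
cars = [
    {"brand": "Fiat", "price": 1000},
    {"brand": "Audi", "price": 40000},
    {"brand": "Audi", "price": 80000},
    {"brand": "Bentley", "price": 100000},
]

all_cars = price_bounds(cars)
audi_only = price_bounds(cars, brand="Audi")
```

Without a filter the slider would span 1,000 to 100,000; after filtering on Audi it would shrink to 40,000 to 80,000.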



--
View this message in context: 
http://lucene.472066.n3.nabble.com/get-min-max-prices-as-facets-tp4099501p4099565.html

Re: solr sort facets by name

That works, thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-sort-facets-by-name-tp4099499p4099572.html

Limit single field length in solr response via the request url

I'm requesting fields like so:

http://localhost:8983/solr/test/select/?indent=on&facet=true&wt=json&start=0&rows=20&fl=id,title,description,pricerange

However, the field description might be more than 4000 characters long, so I
want to limit it to a maximum of 100 characters and then cut it off.
I don't want this all the time but it has to be configurable via the URL.

Is this possible?
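
As a fallback in case Solr cannot do this from the URL, the cut-off could be applied to the parsed response on the client. A minimal Python sketch of that idea follows; `truncate_field` is a hypothetical helper and the limit would come from the application's own request parameter, not from any Solr parameter I know of.

```python
def truncate_field(doc: dict, field: str, max_chars: int) -> dict:
    """Return a copy of doc with the given string field cut to max_chars."""
    value = doc.get(field)
    if isinstance(value, str) and len(value) > max_chars:
        return {**doc, field: value[:max_chars]}
    return doc

# Made-up document standing in for one entry of the JSON response.
doc = {"id": "1", "title": "Example", "description": "x" * 4000}
short = truncate_field(doc, "description", 100)
```

This leaves the other fields untouched and does nothing when the field is missing or already short enough.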



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Limit-single-field-length-in-solr-response-via-the-request-url-tp4099597.html

Re: get min-max prices as facets