DIH nested cached entities not working after upgrade

2013-07-21 Thread Zac Smith
I recently upgraded a solr index from 3.5 to 4.3.0. I'm now having trouble with 
the data import handler when using the CachedSqlEntityProcessor.

The first issue I found was that the 'where' option doesn't work anymore. 
Instead I am now using 'cacheKey' and 'cacheLookup'.

My next issue is that if any nested entities are used, the delta import does 
not process more than 2 documents.
e.g. (simplified from my actual import file)
<entity name="books"
        pk="ID"
        query="select * from books"
        deltaImportQuery="select * from books where ID = ${dih.delta.ID}"
        deltaQuery="select ID from Books where UpdateDate &gt; '${dih.last_index_time}'">
  <field column="ID" name="id" />
  <field column="ISBN13" name="isbn13" />
  ...
  <entity name="book_authors"
          processor="CachedSqlEntityProcessor"
          query="SELECT BookID, AuthorID FROM BookAuthors"
          cacheKey="BookID"
          cacheLookup="books.ID">
    <entity name="authors"
            processor="CachedSqlEntityProcessor"
            query="SELECT ID, Title FROM Authors"
            cacheKey="ID"
            cacheLookup="book_authors.AuthorID">
      <field column="Title" name="author_name" />
      <field column="ID" name="author_id" />
    </entity>
  </entity>
</entity>

Full imports run fine. But delta imports will show as having processed 2 
documents, and then will keep fetching more rows until it eventually runs out 
of memory. For some reason, no additional documents are processed. This was 
working fine in 3.x versions of SOLR (up to 3.5).

I'm aware that there have been some significant changes to caching in 
SOLR-2382, but don't think this scenario should be affected. It seems to be 
specifically when there is an entity using caching that contains a sub entity 
that is also using caching.
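
For readers unfamiliar with the cacheKey/cacheLookup pairing in the config above, here is a rough conceptual model in Python. It is only a sketch of the idea (load the child rows once, then look them up by the parent's value); it is not how the DIH is actually implemented, and the sample rows and function names are made up for illustration.

```python
from collections import defaultdict


def build_cache(rows, cache_key):
    """Group all rows of a cached entity by the value of its cacheKey column."""
    cache = defaultdict(list)
    for row in rows:
        cache[row[cache_key]].append(row)
    return cache


def lookup(cache, parent_row, parent_column):
    """Return the cached child rows whose key equals the parent's lookup value."""
    return cache.get(parent_row[parent_column], [])


# Hypothetical sample data standing in for the BookAuthors and Authors tables.
book_authors = [
    {"BookID": 1, "AuthorID": 10},
    {"BookID": 1, "AuthorID": 11},
    {"BookID": 2, "AuthorID": 10},
]
authors = [{"ID": 10, "Title": "Author A"}, {"ID": 11, "Title": "Author B"}]

book_authors_cache = build_cache(book_authors, "BookID")  # cacheKey=BookID
authors_cache = build_cache(authors, "ID")                # cacheKey=ID


def author_names(book):
    """Resolve a book row to author names through the two nested caches."""
    names = []
    for link in lookup(book_authors_cache, book, "ID"):         # cacheLookup=books.ID
        for author in lookup(authors_cache, link, "AuthorID"):  # cacheLookup=book_authors.AuthorID
            names.append(author["Title"])
    return names


print(author_names({"ID": 1}))
```

In this model each cache is built once and reused for every parent row, which is the behaviour I would expect from nested cached entities during both full and delta imports.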


RE: DIH nested cached entities not working after upgrade

2013-07-21 Thread Zac Smith
Same problem with 4.4.0 RC1.

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Sunday, July 21, 2013 5:57 AM
To: solr-user@lucene.apache.org
Subject: Re: DIH nested cached entities not working after upgrade

Could you check with Solr 4.4 RC1:
http://people.apache.org/~sarowe/staging_area/lucene-solr-4.4.0-RC1-rev1504776/solr/?

There were some issues with nested keys ${a.b.c} due to the scoping mechanism 
implementation changes. Not a direct match, but might be easier to check this 
first than dig into deeper causes.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. 
Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


RE: Keyword Tokenizer Phrase Issue

2012-02-12 Thread Zac Smith
I have come to the conclusion that this isn't possible due to the way dismax 
queries are created. I found someone else that had the exact same issue last 
year: 
http://lucene.472066.n3.nabble.com/Multi-word-exact-keyword-case-insensitive-search-suggestions-td2246516.html
I believe this makes it impossible to do exact matching on multi word terms 
with dismax.

So I have created two JIRA tickets that hopefully address the issue:
1) a suggested improvement to dismax specific to the KeywordTokenizerFactory: 
https://issues.apache.org/jira/browse/SOLR-3127
2) what I believe is a bug when removing terms from the query: 
https://issues.apache.org/jira/browse/SOLR-3128

Feedback welcome.

Thanks
Zac




RE: Keyword Tokenizer Phrase Issue

2012-02-10 Thread Zac Smith
I have done some further analysis on this and I am now even more confused. When 
I use the Field Analysis tool with the text 'chicken stock' it highlights that 
text as a match.
The dismax query looks ok to me:
+(DisjunctionMaxQuery((ingredient_synonyms:chicken^0.6)~0.01) 
DisjunctionMaxQuery((ingredient_synonyms:stock^0.6)~0.01)) 
DisjunctionMaxQuery((ingredient_synonyms:chicken stock^0.6)~0.01)

Then I have done an explainOther and it shows a failure to meet condition. 
However there does seem to be some kind of match registered:
0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
  0.0 = no match on required clause (ingredient_synonyms:chicken^0.6 ingredient_synonyms:stock^0.6)
  0.0650662 = (MATCH) weight(ingredient_synonyms:chicken stock^0.6 in 0), product of:
    0.21204369 = queryWeight(ingredient_synonyms:chicken stock^0.6), product of:
      0.6 = boost
      0.30685282 = idf(docFreq=1, maxDocs=1)
      1.1517122 = queryNorm
    0.30685282 = (MATCH) fieldWeight(ingredient_synonyms:chicken stock in 0), product of:
      1.0 = tf(termFreq(ingredient_synonyms:chicken stock)=1)
      0.30685282 = idf(docFreq=1, maxDocs=1)
      1.0 = fieldNorm(field=ingredient_synonyms, doc=0)

Any ideas?

My dismax handler is setup like this:
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">ingredient_synonyms^0.6</str>
    <str name="pf">ingredient_synonyms^0.6</str>
  </lst>
</requestHandler>

Zac



RE: Keyword Tokenizer Phrase Issue

2012-02-10 Thread Zac Smith
Thanks, that explains why the individual terms 'chicken' and 'stock' are still 
in the query (and are required).
So I have tried a few things to get around this, but to no avail:

Changed the query analyzer to use the WhitespaceTokenizerFactory with 
autoGeneratePhraseQueries=true. This creates the correct phrase query, but the 
dismax query still requires the individual terms to match ('chicken' and 
'stock'):
+(DisjunctionMaxQuery((ingredient_synonyms:chicken)~0.01) 
DisjunctionMaxQuery((ingredient_synonyms:stock)~0.01)) 
DisjunctionMaxQuery((ingredient_synonyms:"chicken stock"~100)~0.01)

So the next thing I have tried is to remove the individual terms during the 
query analysis. I did this using the ShingleFilterFactory, so my query analyzer 
now looks like this:
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory" />
  <filter class="solr.ShingleFilterFactory" outputUnigrams="false" maxShingleSize="2" />
</analyzer>
This leaves the single term 'chicken stock' in the query analysis and the 
dismax query is:
+() DisjunctionMaxQuery((ingredient_synonyms:chicken stock)~0.01)

Which looks OK except for the +(). It looks like it is requiring an empty 
clause.

This seems like a pretty simple requirement - to only have exact matches on 
multi word text. Am I missing something here?
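
To make the shingle step concrete, here is a small Python sketch of what a shingle filter with outputUnigrams=false and maxShingleSize=2 does to a token list. It is a simplified stand-in written for illustration, not the Lucene ShingleFilter itself, and the function name is my own.

```python
def shingles(tokens, max_size=2, output_unigrams=False):
    """Join adjacent tokens into shingles of 2..max_size words."""
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for size in range(2, max_size + 1):
            if i + size <= len(tokens):
                out.append(" ".join(tokens[i:i + size]))
    return out


print(shingles(["chicken", "stock"]))  # the two words become one shingle
print(shingles(["chicken"]))           # a single word yields no tokens at all
```

Note that a single word on its own produces nothing. If dismax runs each whitespace-separated word through the query analyzer separately, that could be where the empty +() clause comes from, but that is a guess on my part.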

Thanks
Zac


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Friday, February 10, 2012 1:50 AM
To: solr-user@lucene.apache.org
Subject: RE: Keyword Tokenizer Phrase Issue

Hi Zac,

Field Analysis tool (analysis.jsp) does not perform actual query parsing.

One thing to be aware of when using the Keyword Tokenizer at query time: the 
query string (chicken stock) is pre-tokenized on whitespace before it 
reaches the keyword tokenizer.

If you use quotes ("chicken stock"), the query parser does not pre-tokenize, though.
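
The effect can be mimicked in a few lines of Python. This is only a rough model: shlex.split stands in for the query parser's whitespace splitting (it is not the Lucene grammar), and the function names are invented for the example.

```python
import shlex


def pre_tokenize(query):
    """Split on whitespace, keeping double-quoted phrases together."""
    return shlex.split(query)


def keyword_analyze(chunk):
    """KeywordTokenizer-like behaviour: the whole input becomes one token."""
    return [chunk]


def terms_seen_by_field(query):
    """Terms the field ends up with after pre-tokenizing, then analyzing."""
    terms = []
    for chunk in pre_tokenize(query):
        terms.extend(keyword_analyze(chunk))
    return terms


print(terms_seen_by_field('chicken stock'))    # two separate terms
print(terms_seen_by_field('"chicken stock"'))  # one term, matches the indexed value
```

So the keyword tokenizer never sees the two words together unless the quotes keep them in one chunk.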


Keyword Tokenizer Phrase Issue

2012-02-09 Thread Zac Smith
Hi,

I have a simple field type that uses the KeywordTokenizerFactory. I would like 
to use this so that values in this field are only matched with the full text of 
the field.
e.g. If I indexed the text 'chicken stock', searches on this field would only 
match when searching for 'chicken stock'. Searching for just 'chicken' or 
just 'stock' should not match.

This mostly works, except that if there is more than one word in the text I only 
get a match when searching with quotes. e.g.
"chicken stock" (matches)
chicken stock (doesn't match)

Is there any way I can set this up so that I don't have to provide quotes? I am 
using dismax and if I put quotes in it will mess up the search for the rest of 
my fields. I had an idea that I could issue a separate search using the regular 
query parser, but couldn't work out how to do this:
I thought I could do something like this: qt=dismax&q=fish OR 
_query_:ingredient:chicken stock

I am using solr 3.5.0. My field type is:
<fieldType name="keyword_test" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
  </analyzer>
</fieldType>

Thanks
Zac


RE: Multi word synonyms

2012-02-07 Thread Zac Smith
I suppose I could translate every user query to include the term with quotes.

e.g. if someone searches for stock syrup I send a query like:
q=stock syrup OR "stock syrup"

Seems like a bit of a hack though, is there a better way of doing this?
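
For what it's worth, the client-side rewrite would only be a few lines. A minimal Python sketch, assuming the user input is plain text; the function name is hypothetical, and queries that already contain quotes are left alone:

```python
def with_phrase(user_query):
    """Append a quoted copy of a multi-word query so the phrase can match as a whole."""
    text = user_query.strip()
    if len(text.split()) < 2 or '"' in text:
        return text  # single words and already-quoted input are passed through
    return f'{text} OR "{text}"'


print(with_phrase("stock syrup"))
```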

Zac




RE: Multi word synonyms

2012-02-07 Thread Zac Smith
It doesn't seem to do it for me. My field type is:
<fieldType name="synonym_searcher" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
  </analyzer>
</fieldType>

I am using edismax and solr 3.5 and multi word values can only be matched when 
using quotes.

-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Tuesday, February 07, 2012 12:49 PM
To: solr-user@lucene.apache.org
Subject: RE: Multi word synonyms

Isn't that what autoGeneratePhraseQueries=true is for?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-word-synonyms-tp3716292p3723886.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Multi word synonyms

2012-02-07 Thread Zac Smith
Are you able to explain how I would create another field to fit my scenario?

-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Tuesday, February 07, 2012 1:28 PM
To: solr-user@lucene.apache.org
Subject: RE: Multi word synonyms

Well, if you want both multi word and single words I guess you will have to 
create another field :) Or make queries like you suggested.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-word-synonyms-tp3716292p3724009.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Multi word synonyms

2012-02-05 Thread Zac Smith
Thanks for the response. This almost worked, I created a new field using the 
KeywordTokenizerFactory as you suggested. The only problem was that searches 
only found documents when quotes were used. 
E.g. 
synonyms.txt setup like this:
simple syrup,sugar syrup,stock syrup

I indexed a document with the value 'simple syrup'. Searches only found the 
document when using quotes:
e.g.
"simple syrup" or "stock syrup" matched
simple syrup (no quotes) did not match

Here is the field I created:
<fieldType name="synonym_searcher" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt" />
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt" />
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

Any ideas? Also, I am using dismax and solr 3.5.0.

Thanks
Zac

-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Sunday, February 05, 2012 5:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Multi word synonyms

Your query analyser will tokenize "simple sirup" into "simple" and "sirup"
and won't match on "simple syrup" in the synonyms.txt

So you have to change the query analyzer into KeywordTokenizerFactory as well.

It might be an idea to make a field for synonyms only with this tokenizer and 
another field to search on and use dismax. Never tried this though.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-word-synonyms-tp3716292p3717215.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Multi word synonyms

2012-02-05 Thread Zac Smith
Thanks for your response. When I don't include the KeywordTokenizerFactory in 
the SynonymFilter definition, I get additional term values that I don't want.

e.g. synonyms.txt looks like:
simple syrup,sugar syrup,stock syrup

A document with a value containing 'simple syrup' can now be found when 
searching for just 'stock'.

So the problem I am trying to address with KeywordTokenizerFactory, is to 
prevent my multi word synonyms from getting broken down into single words.

Thanks
Zac

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Sunday, February 05, 2012 8:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Multi word synonyms

I'm not quite sure what you're trying to do with KeywordTokenizerFactory in 
your SynonymFilter definition, but if I use the defaults, then the all-phrase 
form works just fine.

So the question is what problem are you trying to address by using 
KeywordTokenizerFactory?

Best
Erick





Multi word synonyms

2012-02-04 Thread Zac Smith
Hi

I have seen several questions on this already but haven't been able to sort my 
issue. My problem is that multi-word synonyms aren't behaving as I would 
expect. I have copied my field type definition at the bottom of this message, 
but the relevant synonym filter is here (used at index time):
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" 
ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory" />

Say I have synonyms.txt setup like this:
syrup,sugar syrup,stock syrup

When indexing the text 'syrup', the 3 phrases are treated equivalently as 
expected. I can see this in the Index Analyzer as they all occupy the same term 
position.

But if all of the synonyms are a phrase, it doesn't work. 
e.g. synonyms.txt looks like:
simple syrup,sugar syrup,stock syrup

Now when putting the text 'simple syrup' into the Index Analyzer I can only see 
the original term listed. It is not finding the synonyms.

Anyone know how to fix this?

Zac

Field Type definition:
<fieldType name="phrase_searcher" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt" />
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
    <filter class="solr.PorterStemFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt" />
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
    <filter class="solr.PorterStemFilterFactory" />
  </analyzer>
</fieldType>



How to specify dismax boost based on rating

2011-12-03 Thread Zac Smith
Hi,

I think this is a pretty common requirement so hoping someone can easily point 
out the solution:

I have an average rating field defined in my schema that is a tdouble and can 
be anything from 0 - 5 (including decimals). I am using dismax so I want to 
define a boost based on the average rating. So the higher the number, the more 
it gets boosted. I am not sure how to specify the boost in my request handler. 
The examples I have found show something like this:
<str name="bq">
  rating:1^1.0 rating:2^2.0 rating:3^3.0 rating:4^4.0 rating:5^5.0
</str>

But that seems to assume I would be using whole numbers. I need my rating to 
take into account decimal values as well.

Any pointers?
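
One direction that might avoid the whole-number problem is the dismax bf (boost function) parameter, which can take a numeric field directly so that decimal ratings contribute proportionally. Below is a hedged Python sketch of the request parameters; the field name avg_rating and the weight are placeholders (my schema's actual field name is not shown here), and I have not verified how this interacts with the rest of the scoring.

```python
from urllib.parse import urlencode


def dismax_params(user_query, rating_field="avg_rating", weight=1.0):
    """Build dismax request parameters with a rating-based boost function.

    rating_field and weight are illustrative placeholders, not values
    taken from a real schema.
    """
    return {
        "defType": "dismax",
        "q": user_query,
        "bf": f"{rating_field}^{weight}",  # field value used as an additive boost
    }


print(urlencode(dismax_params("chicken stock")))
```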


RE: newbie question for DataImportHandler

2011-05-24 Thread Zac Smith
Sounds like you might not be committing the delete. How are you deleting it?
If you run the data import handler with clean=true (which is the default) it 
will delete the data for you anyway so you don't need to delete it yourself.

Hope that helps.

-Original Message-
From: antoniosi [mailto:antonio...@gmail.com] 
Sent: Tuesday, May 24, 2011 4:43 PM
To: solr-user@lucene.apache.org
Subject: newbie question for DataImportHandler

Hi,

I am new to Solr; apologies in advance if this is a stupid question.

I have created a simple database, with only 1 table with 3 columns, id, name, 
and last_update fields.

I populate the database with 1 million test rows.
I run solr, go to the data import handler development console and do a full 
import. I use the Luke tool to look at the content of the lucene index.

This all works fine so far.

I remove all the 1 million rows from my table and populate the table with 
another million rows of data.
I remove the index that Solr previously created. I restart Solr and go to the 
data import handler development console and do the full import again.

I use the Luke tool to look at the content of the lucene index. However, I am 
seeing the old data in my new index.

Does Solr keep a cached copy of the index somewhere?

I hope I have described my problem clearly.

Thanks in advance.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/newbie-question-for-DataImportHandler-tp2982277p2982277.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Spatial Solr 3.1: filter by viewport

2011-05-23 Thread Zac Smith
It looks like someone asked this question a few months ago and didn't get an 
answer either ... 
http://lucene.472066.n3.nabble.com/Spatial-Solr-Representing-a-bounding-box-and-searching-for-it-tc2447262.html#none

I really thought this would be a pretty simple question to answer? Is there no 
way to specify the exact coordinates of the bounding box - 
http://wiki.apache.org/solr/SpatialSearch#bbox_-_Bounding-box_filter ??


Zac



Spatial Solr 3.1: filter by viewport

2011-05-22 Thread Zac Smith
How would I specify a filter that covered a rectangular viewport? I have 4 
coordinate points for the corners and I want to return everything inside that 
area.
My first naive attempt was this:
q=*:*&fq=coords:[44.119141,-125.948638 TO 47.931066,-111.029205]

At first this seems to work OK, except where the viewport crosses over a point 
where the longitude goes from a positive value to a negative value.
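
One workaround I have been considering is to split the viewport into two ranges whenever the west edge is numerically greater than the east edge (i.e. the box wraps around the 180 degree line). Here is a Python sketch of building the filter string. It assumes the sign-change problem is the date-line wrap and that the field accepts an OR of two range queries; neither is confirmed, and the function name is just for illustration.

```python
def viewport_filter(field, south, west, north, east):
    """Build a lat/lon range filter, splitting the box if it wraps past 180."""
    def box(w, e):
        return f"{field}:[{south},{w} TO {north},{e}]"

    if west <= east:
        return box(west, east)  # ordinary viewport, single range
    # Wrapped viewport: cover west..180 and -180..east separately.
    return f"({box(west, 180)} OR {box(-180, east)})"


print(viewport_filter("coords", 44.119141, -125.948638, 47.931066, -111.029205))
print(viewport_filter("coords", 10, 170, 20, -170))
```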

Thanks
Zac


RE: Schema Design Question

2011-05-15 Thread Zac Smith
Ok thanks for the responses. My option #2 will be easier to implement than 
having the new doc with combinations so will give it a try. But that has opened 
my eyes to different possibilities!

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Sunday, May 15, 2011 8:55 AM
To: solr-user@lucene.apache.org
Subject: Re: Schema Design Question

Of your first two options, I'd go with a multi-valued field for each book (1).

But kenf_nc's suggestion is a good one too.

On Sun, May 15, 2011 at 3:54 AM, kenf_nc ken.fos...@realestate.com wrote:
 create a separate document for each book-bookshelf combination.
 doc 1 = book 1,shelf 1
 doc 2 = book 1,shelf 3
 doc 3 = book 2,shelf 1
 etc.

 then your queries are q=book_id   to get all bookshelfs a given book 
 is on or q=shelf_id to get all books on a given bookshelf.

 Biggest problem people face with Solr schema design is thinking either 
 object orientedly or RDBMs orientedly. You need to think differently.
 Solr/Lucene find text and they find it very fast over huge amounts of data.
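
 As an illustration of the one-document-per-combination idea, the documents could be generated along these lines. This is a Python sketch only; the field names id, book_id and shelf_id are placeholders rather than an agreed schema.

```python
def combination_docs(shelves_by_book):
    """Emit one small document per (book, shelf) pair."""
    docs = []
    for book_id, shelf_ids in sorted(shelves_by_book.items()):
        for shelf_id in shelf_ids:
            docs.append({
                "id": f"{book_id}-{shelf_id}",  # unique key for the pair
                "book_id": book_id,
                "shelf_id": shelf_id,
            })
    return docs


for doc in combination_docs({1: [1, 3], 2: [1]}):
    print(doc)
```

 Querying shelf_id:1 would then return the pair documents for every book on that shelf.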

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Schema-Design-Question-tp2939045p2942809.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: Schema Design Question

2011-05-14 Thread Zac Smith
Thanks that looks interesting. Don't think it helps my situation though as I 
would have to index all the bookshelves and will still end up having to put 
thousands of Book ID values in a multi-value field.

I guess the question I have is: Is it more appropriate to load a multi-value 
field with a large number of values or should you pass a large number of values 
in as a Boolean clause?

Zac

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Friday, May 13, 2011 10:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Schema Design Question

Hi Zac,

Solr 4.0 (trunk) has support for relationships/JOIN.  Have a look: 
http://search-lucene.com/?q=solr+join

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem 
search :: http://search-lucene.com/





Schema Design Question

2011-05-13 Thread Zac Smith
Let's say I have a data model that involves books and bookshelves. I have tens 
of thousands of books and thousands of bookshelves. There is a many-many 
relationship between books & bookshelves. All of the books are indexed by SOLR.

I need to be able to query SOLR and get all the books for a given bookshelf. I 
see two schema design options here:


1)  Each book has a multi-value field that contains a list of all the 
bookshelf ID's. Many books will have thousands of bookshelf ID's. In this case 
the query is simple, I just send solr the bookshelf ID.

2)  I send solr a query with each book on the bookshelf e.g. 
q=book_id:(1+OR+2+OR+3 ). Many bookshelves will have thousands of book ID's 
so the query can get rather large.

Right now I am using option 2 and it seems to be working fine. I have had to 
crank 'maxBooleanClauses' right up but it does seem to be pretty fast.
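
For reference, this is roughly how I build the option 2 query on the client, sketched in Python. The chunking is an optional variation I have not needed yet: it splits the ID list so each query stays under a clause limit (1024 is the stock maxBooleanClauses default), at the cost of merging results client-side. The function name is illustrative.

```python
def book_id_queries(book_ids, max_clauses=1024):
    """Build one or more book_id:(... OR ...) queries, each within max_clauses."""
    ids = list(book_ids)
    queries = []
    for start in range(0, len(ids), max_clauses):
        chunk = ids[start:start + max_clauses]
        queries.append("book_id:(" + " OR ".join(str(i) for i in chunk) + ")")
    return queries


print(book_id_queries([1, 2, 3]))
print(len(book_id_queries(range(2500))))  # number of queries needed for 2500 IDs
```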

Anyone have an opinion?



DIH CachedSqlEntityProcessor null exception

2011-04-13 Thread Zac Smith
I have come across an issue with the DIH where I get a null exception when 
pre-caching entities. I expect my entity to have null values so this is a bit 
of a roadblock for me. The issue was described more succinctly in this 
discussion: 
http://lucene.472066.n3.nabble.com/DataImportHandlerException-when-cache-key-is-null-in-SOLR-1-4-1-td2003059.html

Anyone know anything about this?



RE: Using the Data Import Handler with SQLite

2011-04-04 Thread Zac Smith
I was able to resolve this issue by using a different jdbc driver: 
http://www.xerial.org/trac/Xerial/wiki/SQLiteJDBC




Using the Data Import Handler with SQLite

2011-04-01 Thread Zac Smith
I hope this question is being directed to the right place ...

I am trying to use SQLite (v3) as a source for the Data Import Handler. I am 
using a SQLite JDBC driver (link below) and this works when used with only 
one entity. As soon as I add a sub-entity it falls over with a locked DB error: 
java.sql.SQLException: database is locked.
Now I realize that you can only have one connection open to SQLite at a time. 
So I assume that the first query is leaving a connection open before it moves 
onto the sub-query. I am not sure if the issue would be in the jdbc driver or 
the DIH. It works fine with SQL Server.

Is this a bug? Or something that just isn't possible with SQLite?

Here is a sample of my data config file:
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="org.sqlite.JDBC"
              url="jdbc:sqlite:SolrImportTest.db" />
  <document>
    <entity name="locations"
            pk="id"
            query="select * from locations">
      <field column="Id" name="Id" />
      <field column="Name" name="Name" />
      <field column="RegionId" name="RegionId" />
      <entity name="regions"
              pk="id"
              query="select * from regions where id = '${locations.RegionId}'">
        <field column="Name" name="RegionName" />
      </entity>
    </entity>
  </document>
</dataConfig>

SQLite JDBC driver: http://www.zentus.com/sqlitejdbc/