PHP/Solr library
Hi all, I've been exploring http://www.php.net/manual/en/book.solr.php as a way to maintain my index. I already have a PHP script that I use to update a database, so I was hoping to update the database and the index at the same time. However, I've been getting the following error when running $solr_client->commit():

Unsuccessful update request. Response Code 0. (null)

I've tried to find out why I'm getting the error but I cannot find a reasonable explanation. My guess is that because my index is rather large (22 million records) the request is timing out or something like that, but I cannot confirm that is the case, nor do I know how to fix it even if it were. Any help here would be greatly appreciated. Thanks, Brian Lamb
Re: MySQL data import
Hi all, Any tips on this one? Thanks, Brian Lamb On Sun, Dec 11, 2011 at 3:54 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, I have a few questions about how the MySQL data import works. It seems it creates a separate connection for each entity I create. Is there any way to avoid this? By nature of my schema, I have several multivalued fields. Each one I populate with a separate entity. Is there a better way to do it? For example, could I pull in all the singular data in one sitting and then come back in later and populate with the multivalued items. An alternate approach in some cases would be to do a GROUP_CONCAT and then populate the multivalued column with some transformation. Is that possible? Lastly, is it possible to use copyField to copy three regular fields into one multiValued field and have all the data show up? Thanks, Brian Lamb
URLDataSource delta import
Hi all, According to http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource a delta-import is not currently implemented for URLDataSource. I say currently because I've noticed that such documentation is out of date in many places. I wanted to see if this feature had been added yet or if there were plans to do so. Thanks, Brian Lamb
Re: MySQL data import
Thanks all. Erick, is there documentation on doing things with SolrJ and a JDBC connection?

On Mon, Dec 12, 2011 at 1:34 PM, Erick Erickson erickerick...@gmail.com wrote: You might want to consider just doing the whole thing in SolrJ with a JDBC connection. When things get complex, it's sometimes more straightforward. Best, Erick... P.S. Yes, it's pretty standard to have a single field be the destination for several copyField directives.

On Mon, Dec 12, 2011 at 12:48 PM, Gora Mohanty g...@mimirtech.com wrote:
On Mon, Dec 12, 2011 at 2:24 AM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I have a few questions about how the MySQL data import works. It seems it creates a separate connection for each entity I create. Is there any way to avoid this?

Not sure, but I do not think that it is possible. However, from your description below, I think that you are unnecessarily multiplying entities.

By nature of my schema, I have several multivalued fields. Each one I populate with a separate entity. Is there a better way to do it? For example, could I pull in all the singular data in one sitting and then come back in later and populate with the multivalued items?

Not quite sure as to what you mean. Would it be possible for you to post your schema.xml and the DIH configuration file? Preferably, put these on pastebin.com and send us links. Also, you should obfuscate details like access passwords.

An alternate approach in some cases would be to do a GROUP_CONCAT and then populate the multivalued column with some transformation. Is that possible? [...]

This is how we have been handling it. A complete description would be long, but here is the gist of it:

* A transformer will be needed. In this case, we found it easiest to use a Java-based transformer. Thus, your entity should include something like:
  <entity name="myname" dataSource="mysource" transformer="com.mycompany.search.solr.handler.JobsNumericTransformer" ...> ... </entity>
  Here, the class name used for the transformer attribute follows the usual Java rules, and the .jar needs to be made available to Solr.
* The SELECT statement for the entity looks something like: SELECT GROUP_CONCAT(myfield SEPARATOR '@||@') ... The separator should be something that does not occur in your normal data stream.
* Within the entity, define <field column="myfield"/>.
* There are complications involved if NULL values are allowed for the field, in which case you would need to use COALESCE, maybe along with CAST.
* The transformer would look up myfield, split along the separator, and populate the multi-valued field.

This *is* a little complicated, so I would also like to hear about possible alternatives. Regards, Gora
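The split step that Gora's Java transformer performs can be sketched in a few lines: MySQL returns the GROUP_CONCAT result as one string, and the transformer breaks it back into individual values for the multivalued Solr field. This is an illustrative Python sketch of that logic, not the actual DIH transformer API; the separator, column name, and row shape are assumptions.

```python
# Sketch of the transformer's split step: turn a GROUP_CONCAT'ed column back
# into a list of values for a multivalued field. Names are illustrative.
SEPARATOR = "@||@"

def split_multivalued(row, column="myfield"):
    """Split a GROUP_CONCAT'ed column into a list of values."""
    raw = row.get(column)
    if raw is None:
        return row  # NULL in SQL: leave the field absent
    row[column] = [v for v in raw.split(SEPARATOR) if v]
    return row

row = {"id": 1, "myfield": "dogs@||@cats@||@birds"}
print(split_multivalued(row)["myfield"])  # ['dogs', 'cats', 'birds']
```

The same idea is why the separator must never occur in the real data: the split has no other way to tell values apart.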
MySQL data import
Hi all, I have a few questions about how the MySQL data import works. It seems it creates a separate connection for each entity I create. Is there any way to avoid this? By nature of my schema, I have several multivalued fields. Each one I populate with a separate entity. Is there a better way to do it? For example, could I pull in all the singular data in one sitting and then come back in later and populate with the multivalued items. An alternate approach in some cases would be to do a GROUP_CONCAT and then populate the multivalued column with some transformation. Is that possible? Lastly, is it possible to use copyField to copy three regular fields into one multiValued field and have all the data show up? Thanks, Brian Lamb
Re: Boosting is slow
Any ideas on this one?

On Thu, Nov 17, 2011 at 3:53 PM, Brian Lamb brian.l...@journalexperts.com wrote: Sorry, the query is actually: http://localhost:8983/solr/mycore/search/?q=test{!boost b=product(sum(log(sum(myfield,1)),1),recip(ms(NOW,mydate_field),3.16e-11,1,8))}&start=&sort=score+desc,mydate_field+desc&wt=xslt&tr=mysite.xsl

On Thu, Nov 17, 2011 at 2:59 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I have about 20 million records in my solr index. I'm running into a problem now where doing a boost drastically slows down my search application. A typical query for me looks something like: http://localhost:8983/solr/mycore/search/?q=test {!boost b=product(sum(log(sum(myfield,1)),1),recip(ms(NOW,mydate_field),3.16e-11,1,8))} I've tried several variations on the boost to see if that was the problem but even when doing something simple like: http://localhost:8983/solr/mycore/search/?q=test {!boost b=2} it is still really slow. Is there a different approach I should be taking? Thanks, Brian Lamb
Boosting is slow
Hi all, I have about 20 million records in my solr index. I'm running into a problem now where doing a boost drastically slows down my search application. A typical query for me looks something like: http://localhost:8983/solr/mycore/search/?q=test {!boost b=product(sum(log(sum(myfield,1)),1),recip(ms(NOW,mydate_field),3.16e-11,1,8))} I've tried several variations on the boost to see if that was the problem but even when doing something simple like: http://localhost:8983/solr/mycore/search/?q=test {!boost b=2} it is still really slow. Is there a different approach I should be taking? Thanks, Brian Lamb
Re: Boosting is slow
Sorry, the query is actually: http://localhost:8983/solr/mycore/search/?q=test{!boost b=product(sum(log(sum(myfield,1)),1),recip(ms(NOW,mydate_field),3.16e-11,1,8))}&start=&sort=score+desc,mydate_field+desc&wt=xslt&tr=mysite.xsl

On Thu, Nov 17, 2011 at 2:59 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I have about 20 million records in my solr index. I'm running into a problem now where doing a boost drastically slows down my search application. A typical query for me looks something like: http://localhost:8983/solr/mycore/search/?q=test {!boost b=product(sum(log(sum(myfield,1)),1),recip(ms(NOW,mydate_field),3.16e-11,1,8))} I've tried several variations on the boost to see if that was the problem but even when doing something simple like: http://localhost:8983/solr/mycore/search/?q=test {!boost b=2} it is still really slow. Is there a different approach I should be taking? Thanks, Brian Lamb
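For reference, the recip piece of this boost is Lucene's recip(x,m,a,b) = a/(m*x + b); with m=3.16e-11 and x = ms(NOW,mydate_field) in milliseconds, m*x grows by roughly 1 per year of document age. A Python sketch of the whole boost formula (Solr's log() function is base 10; the field value and ages below are illustrative, not from the thread):

```python
import math

def recip(x, m, a, b):
    """Lucene's recip function query: a / (m*x + b)."""
    return a / (m * x + b)

def boost(myfield_value, age_ms):
    """The boost from the thread:
    product(sum(log(sum(myfield,1)),1), recip(ms(NOW,date),3.16e-11,1,8))."""
    return (math.log10(myfield_value + 1) + 1) * recip(age_ms, 3.16e-11, 1, 8)

MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # note 1/3.16e-11 is roughly one year in ms

# A fresh document vs. a one-year-old one with the same myfield value:
print(boost(99, 0))            # (log10(100)+1) * 1/8 = 0.375
print(boost(99, MS_PER_YEAR))  # smaller: the recip term has decayed to ~1/9
```

One commonly cited reason such boosts are slow (a general observation, not a diagnosis from this thread) is that the function is evaluated per matching document, and NOW changes on every request, which defeats query caching.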
Autocomplete
Hi all, I've read numerous guides on how to set up autocomplete on solr and it works great the way I have it now. However, my only complaint is that it only matches the beginning of the word. For example, if I try to autocomplete dober, I would only get Doberman, Doberman Pincher but not Pincher, Doberman. Here is how my schema is configured:

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="autocomplete_text" type="edgytext" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>

How can I update my autocomplete so that it will match the middle of a word as well as the beginning of the word? Thanks, Brian Lamb
Re: Autocomplete
I found that if I change

<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>

to

<filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="25"/>

I can do autocomplete in the middle of a term. Thanks! Brian Lamb

On Thu, Sep 1, 2011 at 11:27 AM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I've read numerous guides on how to set up autocomplete on solr and it works great the way I have it now. However, my only complaint is that it only matches the beginning of the word. For example, if I try to autocomplete dober, I would only get Doberman, Doberman Pincher but not Pincher, Doberman. Here is how my schema is configured:

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="autocomplete_text" type="edgytext" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>

How can I update my autocomplete so that it will match the middle of a word as well as the beginning of the word? Thanks, Brian Lamb
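The difference between the two filters can be illustrated outside Solr: EdgeNGramFilter emits only prefixes of each token, while NGramFilter emits every substring of each length. A minimal Python model of just the gram generation (the real analysis chain also lowercases, and NGramFilter will grow the index considerably, which is the usual trade-off):

```python
def edge_ngrams(term, lo=1, hi=25):
    """EdgeNGramFilter-style grams: prefixes only."""
    return [term[:n] for n in range(lo, min(hi, len(term)) + 1)]

def ngrams(term, lo=1, hi=25):
    """NGramFilter-style grams: every substring of each length."""
    out = []
    for n in range(lo, min(hi, len(term)) + 1):
        out.extend(term[i:i + n] for i in range(len(term) - n + 1))
    return out

term = "doberman"
print("berm" in edge_ngrams(term))  # False: only prefixes are indexed
print("berm" in ngrams(term))       # True: mid-word grams are indexed too
```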
Re: Exact match not the first result returned
I implemented both solutions Hoss suggested and was able to achieve the desired results. I would like to go with defType=dismax qf=myname pf=myname_str^100 q=Frank, but that doesn't seem to work if I have a query like myname:Frank otherfield:something. So I think I will go with q=+myname:Frank myname_str:Frank^100. Thanks for the help everyone! Brian Lamb

On Wed, Jul 27, 2011 at 10:55 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
: With your solution, RECORD 1 does appear at the top but I think thats just
: blind luck more than anything else because RECORD 3 shows as having the same
: score. So what more can I do to push RECORD 1 up to the top. Ideally, I'd
: like all three records returned with RECORD 1 being the first listing.

with omitNorms RECORD1 and RECORD3 have the same score because only the tf() matters, and both docs contain the term frank exactly twice. the reason RECORD1 isn't scoring higher even though it (as you put it) matches 'Fred' exactly is that from a term perspective, RECORD1 doesn't actually match myname:Fred exactly, because there are in fact other terms in that field because it's multivalued.

one way to indicate that you *only* want documents where the entire field value matches your input (ie: RECORD1 but no other records) would be to use a StrField instead of a TextField, or an analyzer that doesn't split up tokens (ie: something using KeywordTokenizer). that way a query on myname:Frank would not match a document where you had indexed the value Frank Stalone, but a query for myname:Frank Stalone would.

in your case, you don't want *only* the exact field value matches, but you want them boosted, so you could do something like copyField myname into myname_str and then do... q=+myname:Frank myname_str:Frank^100 ...in which case a match on myname is required, but a match on myname_str will greatly increase the score. dismax (and edismax) are really designed for situations like this...
defType=dismax qf=myname pf=myname_str^100 q=Frank -Hoss
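Hoss's dismax suggestion, assembled into an actual request URL with Python's standard library (the host and core name are placeholders; the parameters are exactly the ones from his reply):

```python
from urllib.parse import urlencode

# dismax: query terms must match qf; an exact-phrase match on the string
# copy (pf) adds a large boost, pushing whole-value matches to the top.
params = {
    "defType": "dismax",
    "qf": "myname",           # field the query terms are matched against
    "pf": "myname_str^100",   # boosted phrase match on the string copy
    "q": "Frank",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```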
Re: Exact match not the first result returned
That's a clever idea. I'll put something together and see how it turns out. Thanks for the tip.

On Wed, Jul 27, 2011 at 10:55 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
: With your solution, RECORD 1 does appear at the top but I think thats just
: blind luck more than anything else because RECORD 3 shows as having the same
: score. So what more can I do to push RECORD 1 up to the top. Ideally, I'd
: like all three records returned with RECORD 1 being the first listing.

with omitNorms RECORD1 and RECORD3 have the same score because only the tf() matters, and both docs contain the term frank exactly twice. the reason RECORD1 isn't scoring higher even though it (as you put it) matches 'Fred' exactly is that from a term perspective, RECORD1 doesn't actually match myname:Fred exactly, because there are in fact other terms in that field because it's multivalued.

one way to indicate that you *only* want documents where the entire field value matches your input (ie: RECORD1 but no other records) would be to use a StrField instead of a TextField, or an analyzer that doesn't split up tokens (ie: something using KeywordTokenizer). that way a query on myname:Frank would not match a document where you had indexed the value Frank Stalone, but a query for myname:Frank Stalone would.

in your case, you don't want *only* the exact field value matches, but you want them boosted, so you could do something like copyField myname into myname_str and then do... q=+myname:Frank myname_str:Frank^100 ...in which case a match on myname is required, but a match on myname_str will greatly increase the score. dismax (and edismax) are really designed for situations like this... defType=dismax qf=myname pf=myname_str^100 q=Frank -Hoss
Re: Exact match not the first result returned
Thanks Emmanuel for that explanation. I implemented your solution but I'm not quite there yet. Suppose I also have a record:

RECORD 3
<arr name="myname">
  <str>Fred G. Anderson</str>
  <str>Fred Anderson</str>
</arr>

With your solution, RECORD 1 does appear at the top, but I think that's just blind luck more than anything else because RECORD 3 shows as having the same score. So what more can I do to push RECORD 1 up to the top? Ideally, I'd like all three records returned with RECORD 1 being the first listing. Thanks, Brian Lamb

On Tue, Jul 26, 2011 at 6:03 PM, Emmanuel Espina espinaemman...@gmail.com wrote: That is caused by the size of the documents. The principle is pretty intuitive: if one of your documents is the entire three volumes of The Lord of the Rings, and you search for tree, I know that The Lord of the Rings will be in the results, and I haven't memorized the entire text of that book :p It is a matter of probability that in a big (big!) text any word has a greater chance of being found than in a shorter one, so one can infer that the shorter text is more relevant than the big one. That is the principle applied here, and Lucene does this when building the ranking. The first document is bigger than the second one (remember that all the values of a multivalued field are merged into one field in the index, so you cannot tell one value from another apart). In the first one you have [Fred, coolest, guy, town] and in the second [Fred, Anderson], so the second document is more relevant than the first one.

To avoid all this you can set omitNorms to true, and that should make the first document more relevant because Fred appears twice (not because Fred appears alone in a value). Regards, Emmanuel

2011/7/26 Brian Lamb brian.l...@journalexperts.com: Hi all, I am a little confused as to why the scoring is working the way it is. I have a field defined as:

<field name="myname" type="text" indexed="true" stored="true" required="false" multiValued="true"/>

And I have several documents where that value is:

RECORD 1
<arr name="myname">
  <str>Fred</str>
  <str>Fred (the coolest guy in town)</str>
</arr>

OR

RECORD 2
<arr name="myname">
  <str>Fred Anderson</str>
</arr>

When I do a search for http://localhost:8983/solr/search/?q=myname:Fred I get RECORD 2 returned before RECORD 1.

RECORD 2
5.282213 = (MATCH) fieldWeight(myname:Fred in 256575), product of:
  1.0 = tf(termFreq(myname:Fred)=1)
  8.451541 = idf(docFreq=7306, maxDocs=12586425)
  0.625 = fieldNorm(field=myname, doc=256575)

RECORD 1
4.482106 = (MATCH) fieldWeight(myname:Fred in 215), product of:
  1.4142135 = tf(termFreq(myname:Fred)=2)
  8.451541 = idf(docFreq=7306, maxDocs=12586425)
  0.375 = fieldNorm(field=myname, doc=215)

So the difference is obviously fieldNorm, but I think that's only part of the story. Why is RECORD 2 returned with a higher score than RECORD 1 even though RECORD 1 matches Fred exactly? And how should I do this differently so that I am getting the results I am expecting? Thanks, Brian Lamb
Re: Rounding errors in solr
Is this possible to do? If so, how?

On 7/25/11, Brian Lamb brian.l...@journalexperts.com wrote: Yes and that's causing some problems in my application. Is there a way to truncate the 7th decimal place in regards to sorting by the score?

On Fri, Jul 22, 2011 at 4:27 PM, Yonik Seeley yo...@lucidimagination.com wrote:
On Fri, Jul 22, 2011 at 4:11 PM, Brian Lamb brian.l...@journalexperts.com wrote: I've noticed some peculiar scoring issues going on in my application. For example, I have a field that is multivalued and has several records that have the same value. For example,

<arr name="references">
  <str>National Society of Animal Lovers</str>
  <str>Nat. Soc. of Ani. Lov.</str>
</arr>

I have about 300 records with that exact value. Now, when I do a search for references:(national society animal lovers), I get the following results: <id>252</id> <id>159</id> <id>82</id> <id>452</id> <id>105</id> When I do a search for references:(nat soc ani lov), I get the results ordered differently: <id>510</id> <id>122</id> <id>501</id> <id>82</id> <id>252</id> When I load all the records that match, I notice that at some point the scores aren't the same but differ by only a little: 1.471928 in one, and the one before it was 1.471929.

32 bit floats only have 7 decimal digits of precision, and in floating point land (a+b+c) can be slightly different than (c+b+a) -Yonik http://www.lucidimagination.com
Exact match not the first result returned
Hi all, I am a little confused as to why the scoring is working the way it is. I have a field defined as:

<field name="myname" type="text" indexed="true" stored="true" required="false" multiValued="true"/>

And I have several documents where that value is:

RECORD 1
<arr name="myname">
  <str>Fred</str>
  <str>Fred (the coolest guy in town)</str>
</arr>

OR

RECORD 2
<arr name="myname">
  <str>Fred Anderson</str>
</arr>

When I do a search for http://localhost:8983/solr/search/?q=myname:Fred I get RECORD 2 returned before RECORD 1.

RECORD 2
5.282213 = (MATCH) fieldWeight(myname:Fred in 256575), product of:
  1.0 = tf(termFreq(myname:Fred)=1)
  8.451541 = idf(docFreq=7306, maxDocs=12586425)
  0.625 = fieldNorm(field=myname, doc=256575)

RECORD 1
4.482106 = (MATCH) fieldWeight(myname:Fred in 215), product of:
  1.4142135 = tf(termFreq(myname:Fred)=2)
  8.451541 = idf(docFreq=7306, maxDocs=12586425)
  0.375 = fieldNorm(field=myname, doc=215)

So the difference is obviously fieldNorm, but I think that's only part of the story. Why is RECORD 2 returned with a higher score than RECORD 1 even though RECORD 1 matches Fred exactly? And how should I do this differently so that I am getting the results I am expecting? Thanks, Brian Lamb
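The explain output above can be reproduced from Lucene's classic scoring pieces: fieldWeight = tf x idf x fieldNorm, where tf = sqrt(termFreq) and fieldNorm shrinks with field length (and is encoded lossily to a byte, which is why round values like 0.625 and 0.375 appear). A quick check in Python against the numbers from the thread:

```python
import math

def field_weight(term_freq, idf, field_norm):
    """Lucene's classic fieldWeight: sqrt(termFreq) * idf * fieldNorm."""
    return math.sqrt(term_freq) * idf * field_norm

idf = 8.451541  # from the explain output; same for both records

record2 = field_weight(1, idf, 0.625)  # "Fred Anderson": one match, short field
record1 = field_weight(2, idf, 0.375)  # two matches, but more terms => smaller norm

print(record2)  # matches the 5.282213 in the explain output
print(record1)  # matches the 4.482106: the higher tf loses to the length norm
```

So the ordering is not a bug: the extra terms in RECORD 1's multivalued field shrink its norm faster than the second "Fred" occurrence raises its tf.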
Re: Rounding errors in solr
Yes and that's causing some problems in my application. Is there a way to truncate the 7th decimal place in regards to sorting by the score?

On Fri, Jul 22, 2011 at 4:27 PM, Yonik Seeley yo...@lucidimagination.com wrote:
On Fri, Jul 22, 2011 at 4:11 PM, Brian Lamb brian.l...@journalexperts.com wrote: I've noticed some peculiar scoring issues going on in my application. For example, I have a field that is multivalued and has several records that have the same value. For example,

<arr name="references">
  <str>National Society of Animal Lovers</str>
  <str>Nat. Soc. of Ani. Lov.</str>
</arr>

I have about 300 records with that exact value. Now, when I do a search for references:(national society animal lovers), I get the following results: <id>252</id> <id>159</id> <id>82</id> <id>452</id> <id>105</id> When I do a search for references:(nat soc ani lov), I get the results ordered differently: <id>510</id> <id>122</id> <id>501</id> <id>82</id> <id>252</id> When I load all the records that match, I notice that at some point the scores aren't the same but differ by only a little: 1.471928 in one, and the one before it was 1.471929.

32 bit floats only have 7 decimal digits of precision, and in floating point land (a+b+c) can be slightly different than (c+b+a) -Yonik http://www.lucidimagination.com
Ignore records that are missing a value in a field
Hi all, I have an optional field called common_names. I would like to keep this field optional but at the same, occasionally do a search where I do not include results where there is no value set for this field. Is this possible to do within solr? In other words, I would like to do a search where if there is no value set for common_names, I would not want that record included in the search result. Thanks, Brian Lamb
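One standard Solr idiom for this (an assumption here, since no reply appears in the thread): the open-ended range query common_names:[* TO *] matches only documents that have some value in the field, so attaching it as a filter query (fq) excludes records where the field is unset, without changing the main query or making the field required. A small sketch of building such a request:

```python
from urllib.parse import urlencode, parse_qs

# Assumed idiom: fq=common_names:[* TO *] keeps only docs with a value in
# common_names; the field itself can stay optional in the schema.
def with_required_field(user_query, field="common_names"):
    return urlencode({"q": user_query, "fq": "%s:[* TO *]" % field})

print(with_required_field("genus:felis"))
# The inverse (only records *missing* the field) would be fq=-common_names:[* TO *]
```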
Rounding errors in solr
Hi all, I've noticed some peculiar scoring issues going on in my application. For example, I have a field that is multivalued and has several records that have the same value. For example,

<arr name="references">
  <str>National Society of Animal Lovers</str>
  <str>Nat. Soc. of Ani. Lov.</str>
</arr>

I have about 300 records with that exact value. Now, when I do a search for references:(national society animal lovers), I get the following results: <id>252</id> <id>159</id> <id>82</id> <id>452</id> <id>105</id> When I do a search for references:(nat soc ani lov), I get the results ordered differently: <id>510</id> <id>122</id> <id>501</id> <id>82</id> <id>252</id> When I load all the records that match, I notice that at some point the scores aren't the same but differ by only a little: 1.471928 in one, and the one before it was 1.471929. I turned on debugQuery=on and the scores for each of those two records are exactly the same. Therefore, I think there is some kind of rounding error going on. Is there a way I can fix this? Alternatively, can I sort by a rounded version of the score? I tried sort=round(score,5) but I get the following message: Can't determine Sort Order: 'round(score,5) ', pos=5 I also tried sort=sum(score,1) just to see if I was using round incorrectly, but I get an error message there too saying score is not a recognized field. Please help! Thanks, Brian Lamb
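Yonik's two points (later in this thread) can both be demonstrated in a few lines: float addition is not associative, so summing identical per-clause scores in a different order can flip the last digit, and a 32-bit float only carries about seven significant decimal digits, which is exactly the 1.471928 vs 1.471929 scale seen here:

```python
import struct

def to_f32(x):
    """Round a 64-bit Python float to 32-bit precision (Lucene scores are float32)."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Non-associativity: the same three addends, grouped differently.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))   # False, already true of 64-bit floats

# A float32 near 1.47 has a spacing (ulp) of 2**-23, about 1.2e-7, so the
# two observed scores are only ~8 representable values apart.
print(to_f32(1.471929) - to_f32(1.471928))
```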
Records disappearing
Hi all, I'm having some weird behavior with my dataimport script. Because of memory issues, I've taken to doing my delta import as a full-import with clean=false. My dataimport config file is set up like:

<entity name="findDelta" rootEntity="false"
        query="SELECT id FROM mytable WHERE date_added &gt; '${dataimporter.last_index_time}' OR last_updated &gt; '${dataimporter.last_index_time}'">
  <entity name="mytable" pk="id"
          query="SELECT * FROM mytable WHERE id = '${findDelta.id}'"
          deletedPkQuery="SELECT id FROM my_delete_table"
          deltaImportQuery="SELECT id FROM mytable WHERE id='${dataimporter.delta.id}'"
          deltaQuery="SELECT id FROM mytable WHERE date_added &gt; '${dataimporter.last_index_time}' OR last_updated &gt; '${dataimporter.last_index_time}'">
    <field column="id" name="id"/>
    <field column="title" name="title"/>
    <field column="name" name="name"/>
    <field column="summary" name="summary"/>
  </entity>
</entity>

I've found that one record (possibly more that I haven't noticed) keeps disappearing from the index. I will do a full-import with clean=false, search, and the record will be there. I'll search again a few hours later and it's there. But then all of a sudden, it's gone. I don't know what is triggering that one record's disappearance, but it is quite annoying. Any ideas what's going on? Thanks, Brian Lamb
Reject URL requests unless from localhost for dataimport
Hi all, My solr server is currently set up at www.mysite.com:8983/solr. I would like to keep this for the time being but I would like to restrict users from going to www.mysite.com:8983/solr/dataimport. In that case, I would only want to be able to do localhost:8983/solr/dataimport. Is this possible? If so, where should I look for a guide? Thanks, Brian Lamb
Re: Default query parser operator
It could; it would be a little bit clunky, but that's the direction I'm heading.

On Tue, Jun 7, 2011 at 6:05 PM, lee carroll lee.a.carr...@googlemail.com wrote: Hi Brian, could your front end app do this field query logic? (assuming you have an app in front of solr)

On 7 June 2011 18:53, Jonathan Rochkind rochk...@jhu.edu wrote: There's no feature in Solr to do what you ask, no. I don't think.

On 6/7/2011 1:30 PM, Brian Lamb wrote: Hi Jonathan, Thank you for your reply. Your point about my example is a good one. So let me try to restate using your example. Suppose I want to apply AND to any search terms within field1. Then field1:foo field2:bar field1:baz field2:bom would be written as http://localhost:8983/solr/?q=field1:foo OR field2:bar OR field1:baz OR field2:bom But if they were written together like: http://localhost:8983/solr/?q=field1:(foo baz) field2:(bar bom) I would want it to be http://localhost:8983/solr/?q=field1:(foo AND baz) OR field2:(bar OR bom) But it sounds like you are saying that would not be possible. Thanks, Brian Lamb

On Tue, Jun 7, 2011 at 11:27 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Nope, not possible. I'm not even sure what it would mean semantically. If you had default operator OR ordinarily, but default operator AND just for field2, then what would happen if you entered: field1:foo field2:bar field1:baz field2:bom Where the heck would the ANDs and ORs go? The operators are BETWEEN the clauses that specify fields; they don't belong to a field. In general, the operators are part of the query as a whole, not any specific field. In fact, I'd be careful of your example query: q=field1:foo bar field2:baz I don't think that means what you think it means; I don't think the field1 applies to the bar in that case. Although I could be wrong, you definitely want to check it.
You need field1:foo field1:bar, or set the default field for the query to field1, or use parens (although that will change the execution strategy and ranking): q=field1:(foo bar) At any rate, even if there's a way to specify this so it makes sense, no, Solr/lucene doesn't support any such thing. On 6/7/2011 10:56 AM, Brian Lamb wrote: I feel like this should be fairly easy to do but I just don't see anywhere in the documentation on how to do this. Perhaps I am using the wrong search parameters. On Mon, Jun 6, 2011 at 12:19 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, Is it possible to change the query parser operator for a specific field without having to explicitly type it in the search field? For example, I'd like to use: http://localhost:8983/solr/search/?q=field1:word token field2:parser syntax instead of http://localhost:8983/solr/search/?q=field1:word AND token field2:parser syntax But, I only want it to be applied to field1, not field2 and I want the operator to always be AND unless the user explicitly types in OR. Thanks, Brian Lamb
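Lee's suggestion of doing the per-field operator logic in the front-end app can be sketched like this. The field names, the OR default, and the simple join are illustrative assumptions (real user input would also need Lucene special-character escaping); the output matches the rewrite Brian asked for earlier in the thread:

```python
# Sketch of app-side query building: AND the terms within field1, OR the
# terms within field2, then OR the field clauses together before sending
# the string to Solr as q. Names and defaults are illustrative.
def build_query(field1_terms, field2_terms):
    clauses = []
    if field1_terms:
        clauses.append("field1:(%s)" % " AND ".join(field1_terms))
    if field2_terms:
        clauses.append("field2:(%s)" % " OR ".join(field2_terms))
    return " OR ".join(clauses)

print(build_query(["foo", "baz"], ["bar", "bom"]))
# field1:(foo AND baz) OR field2:(bar OR bom)
```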
Re: Default query parser operator
I feel like this should be fairly easy to do but I just don't see anywhere in the documentation on how to do this. Perhaps I am using the wrong search parameters. On Mon, Jun 6, 2011 at 12:19 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, Is it possible to change the query parser operator for a specific field without having to explicitly type it in the search field? For example, I'd like to use: http://localhost:8983/solr/search/?q=field1:word token field2:parser syntax instead of http://localhost:8983/solr/search/?q=field1:word AND token field2:parser syntax But, I only want it to be applied to field1, not field2 and I want the operator to always be AND unless the user explicitly types in OR. Thanks, Brian Lamb
Re: Default query parser operator
Hi Jonathan, Thank you for your reply. Your point about my example is a good one. So let me try to restate using your example. Suppose I want to apply AND to any search terms within field1. Then field1:foo field2:bar field1:baz field2:bom would be written as http://localhost:8983/solr/?q=field1:foo OR field2:bar OR field1:baz OR field2:bom But if they were written together like: http://localhost:8983/solr/?q=field1:(foo baz) field2:(bar bom) I would want it to be http://localhost:8983/solr/?q=field1:(foo AND baz) OR field2:(bar OR bom) But it sounds like you are saying that would not be possible. Thanks, Brian Lamb

On Tue, Jun 7, 2011 at 11:27 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Nope, not possible. I'm not even sure what it would mean semantically. If you had default operator OR ordinarily, but default operator AND just for field2, then what would happen if you entered: field1:foo field2:bar field1:baz field2:bom Where the heck would the ANDs and ORs go? The operators are BETWEEN the clauses that specify fields; they don't belong to a field. In general, the operators are part of the query as a whole, not any specific field. In fact, I'd be careful of your example query: q=field1:foo bar field2:baz I don't think that means what you think it means; I don't think the field1 applies to the bar in that case. Although I could be wrong, you definitely want to check it. You need field1:foo field1:bar, or set the default field for the query to field1, or use parens (although that will change the execution strategy and ranking): q=field1:(foo bar) At any rate, even if there's a way to specify this so it makes sense, no, Solr/lucene doesn't support any such thing.

On 6/7/2011 10:56 AM, Brian Lamb wrote: I feel like this should be fairly easy to do but I just don't see anywhere in the documentation how to do this. Perhaps I am using the wrong search parameters.
On Mon, Jun 6, 2011 at 12:19 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, Is it possible to change the query parser operator for a specific field without having to explicitly type it in the search field? For example, I'd like to use: http://localhost:8983/solr/search/?q=field1:word token field2:parser syntax instead of http://localhost:8983/solr/search/?q=field1:word AND token field2:parser syntax But, I only want it to be applied to field1, not field2 and I want the operator to always be AND unless the user explicitly types in OR. Thanks, Brian Lamb
Default query parser operator
Hi all, Is it possible to change the query parser operator for a specific field without having to explicitly type it in the search field? For example, I'd like to use: http://localhost:8983/solr/search/?q=field1:word token field2:parser syntax instead of http://localhost:8983/solr/search/?q=field1:word AND token field2:parser syntax But, I only want it to be applied to field1, not field2 and I want the operator to always be AND unless the user explicitly types in OR. Thanks, Brian Lamb
Re: Searching using a PDF
I mean instead of typing http://localhost:8983/?q=mysearch, I would send a PDF file with the contents of mysearch and search based on that. I am leaning toward handling this before it hits Solr, however. Thanks, Brian Lamb

On Wed, Jun 1, 2011 at 3:52 PM, Erick Erickson erickerick...@gmail.com wrote: I'm not quite sure what you mean by regular search. When you index a PDF (presumably through Tika or Solr Cell) the text is indexed into your index and you can certainly search that. Additionally, there may be metadata indexed in specific fields (e.g. author, date modified, etc). But what does search based on a PDF file mean in your context? Best, Erick

On Wed, Jun 1, 2011 at 3:41 PM, Brian Lamb brian.l...@journalexperts.com wrote: Is it possible to do a search based on a PDF file? I know it's possible to update the index with a PDF, but can you do just a regular search with it? Thanks, Brian Lamb
Re: Edgengram
Hi Tomás, Thank you very much for your suggestion. I took another crack at it using your recommendation and it worked ideally. The only thing I had to change was

<analyzer type="query">
  <tokenizer class="solr.KeywordTokenizerFactory" />
</analyzer>

to

<analyzer type="query">
  <tokenizer class="solr.LowerCaseTokenizerFactory" />
</analyzer>

The first did not produce any results but the second worked beautifully. Thanks! Brian Lamb

2011/5/31 Tomás Fernández Löbbe tomasflo...@gmail.com ...or also use the LowerCaseTokenizerFactory at query time for consistency, but not the edge ngram filter.

2011/5/31 Tomás Fernández Löbbe tomasflo...@gmail.com Hi Brian, I don't know if I understand what you are trying to achieve. You want the term query abcdefg to have an idf of 1 instead of 7? I think using the KeywordTokenizerFactory at query time should work. It would be something like:

<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
  </analyzer>
</fieldType>

This way, at query time abcdefg won't be turned into a ab abc abcd abcde abcdef abcdefg. At index time it will. Regards, Tomás

On Tue, May 31, 2011 at 1:07 PM, Brian Lamb brian.l...@journalexperts.com wrote:

<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front" />
  </analyzer>
</fieldType>

I believe I used that link when I initially set up the field and it worked great (and I'm still using it in other places). In this particular example, however, it does not appear to be practical for me. I mentioned that I have a similarity class that returns 1 for the idf, and in the case of an edgengram, it returns 1 * the length of the search string.
Thanks, Brian Lamb On Tue, May 31, 2011 at 11:34 AM, bmdakshinamur...@gmail.com bmdakshinamur...@gmail.com wrote: Can you specify the analyzer you are using for your queries? May be you could use a KeywordAnalyzer for your queries so you don't end up matching parts of your query. http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ This should help you. On Tue, May 31, 2011 at 8:24 PM, Brian Lamb brian.l...@journalexperts.comwrote: In this particular case, I will be doing a solr search based on user preferences. So I will not be depending on the user to type abcdefg. That will be automatically generated based on user selections. The contents of the field do not contain spaces and since I am created the search parameters, case isn't important either. Thanks, Brian Lamb On Tue, May 31, 2011 at 9:44 AM, Erick Erickson erickerick...@gmail.com wrote: That'll work for your case, although be aware that string types aren't analyzed at all, so case matters, as do spaces etc. What is the use-case here? If you explain it a bit there might be better answers Best Erick On Fri, May 27, 2011 at 9:17 AM, Brian Lamb brian.l...@journalexperts.com wrote: For this, I ended up just changing it to string and using abcdefg* to match. That seems to work so far. Thanks, Brian Lamb On Wed, May 25, 2011 at 4:53 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, I'm running into some confusion with the way edgengram works. I have the field set up as: fieldType name=edgengram class=solr.TextField positionIncrementGap=1000 analyzer tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=100 side=front / /analyzer /fieldType I've also set up my own similarity class that returns 1 as the idf score. 
What I've found is that if I match a string abcdefg against a field containing abcdefghijklmnop, the idf will score that as a 7: 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2) I get why that's happening, but is there a way to avoid it? Do I need to create a new field type to achieve the desired effect? Thanks, Brian Lamb -- Thanks and Regards, DakshinaMurthy BM
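The idf of 7 in the thread above comes from the query analyzer also edge-ngramming the search term: "abcdefg" expands into seven front grams, and with a custom similarity returning 1 per term, the summed idf is 7. A minimal sketch of what EdgeNGramFilterFactory with side="front" emits for one token (this is our own reimplementation of the gram-generation rule for illustration, not Lucene's code):

```java
import java.util.ArrayList;
import java.util.List;

public class FrontGrams {
    // Every leading substring of the token, from minGramSize up to
    // maxGramSize characters. "abcdefg" with min=1, max=100 expands to
    // a, ab, abc, abcd, abcde, abcdef, abcdefg: seven query terms.
    static List<String> frontGrams(String token, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int len = min; len <= Math.min(max, token.length()); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }
}
```

This is why the fix suggested in the thread, using a KeywordTokenizer-style query analyzer without the edge ngram filter, makes the query contribute a single term instead of seven.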
Re: Edgengram
I think in my case LowerCaseTokenizerFactory will be sufficient because there will never be spaces in this particular field. But thank you for the useful link! Thanks, Brian Lamb

On Wed, Jun 1, 2011 at 11:44 AM, Erick Erickson erickerick...@gmail.com wrote: Be a little careful here. LowerCaseTokenizerFactory is different from KeywordTokenizerFactory. LowerCaseTokenizerFactory will give you more than one term. E.g. the string "Intelligence can't be MeaSurEd" will give you 5 terms, any of which may match, i.e. intelligence, can, t, be, measured; whereas KeywordTokenizerFactory followed by, say, LowerCaseFilter would give you exactly one token: "intelligence can't be measured". So searching for measured would get a hit in the first case but not in the second. Searching for intellig* would hit both. Neither is better, just make sure they do what you want! This page will help a lot: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseTokenizerFactory as will the admin/analysis page. Best Erick

On Wed, Jun 1, 2011 at 10:43 AM, Brian Lamb brian.l...@journalexperts.com wrote: Hi Tomás, Thank you very much for your suggestion. I took another crack at it using your recommendation and it worked ideally. The only thing I had to change was

<analyzer type="query">
  <tokenizer class="solr.KeywordTokenizerFactory" />
</analyzer>

to

<analyzer type="query">
  <tokenizer class="solr.LowerCaseTokenizerFactory" />
</analyzer>

The first did not produce any results but the second worked beautifully. Thanks! Brian Lamb

2011/5/31 Tomás Fernández Löbbe tomasflo...@gmail.com ...or also use the LowerCaseTokenizerFactory at query time for consistency, but not the edge ngram filter.

2011/5/31 Tomás Fernández Löbbe tomasflo...@gmail.com Hi Brian, I don't know if I understand what you are trying to achieve. You want the term query abcdefg to have an idf of 1 instead of 7? I think using the KeywordTokenizerFactory at query time should work.
I would be something like: fieldType name=edgengram class=solr.TextField positionIncrementGap=1000 analyzer type=index tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=25 side=front / /analyzer analyzer type=query tokenizer class=solr.KeywordTokenizerFactory / /analyzer /fieldType this way, at query time abcdefg won't be turned to a ab abc abcd abcde abcdef abcdefg. At index time it will. Regards, Tomás On Tue, May 31, 2011 at 1:07 PM, Brian Lamb brian.l...@journalexperts.com wrote: fieldType name=edgengram class=solr.TextField positionIncrementGap=1000 analyzer tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=25 side=front / /analyzer /fieldType I believe I used that link when I initially set up the field and it worked great (and I'm still using it in other places). In this particular example however it does not appear to be practical for me. I mentioned that I have a similarity class that returns 1 for the idf and in the case of an edgengram, it returns 1 * length of the search string. Thanks, Brian Lamb On Tue, May 31, 2011 at 11:34 AM, bmdakshinamur...@gmail.com bmdakshinamur...@gmail.com wrote: Can you specify the analyzer you are using for your queries? May be you could use a KeywordAnalyzer for your queries so you don't end up matching parts of your query. http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ This should help you. On Tue, May 31, 2011 at 8:24 PM, Brian Lamb brian.l...@journalexperts.comwrote: In this particular case, I will be doing a solr search based on user preferences. So I will not be depending on the user to type abcdefg. That will be automatically generated based on user selections. The contents of the field do not contain spaces and since I am created the search parameters, case isn't important either. 
Thanks, Brian Lamb On Tue, May 31, 2011 at 9:44 AM, Erick Erickson erickerick...@gmail.com wrote: That'll work for your case, although be aware that string types aren't analyzed at all, so case matters, as do spaces etc. What is the use-case here? If you explain it a bit there might be better answers Best Erick On Fri, May 27, 2011 at 9:17 AM, Brian Lamb brian.l...@journalexperts.com wrote: For this, I ended up just changing it to string and using abcdefg* to match. That seems to work so far. Thanks, Brian Lamb On Wed, May
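Erick's distinction between the two tokenizers can be sketched in a few lines. These are rough stand-in models written for illustration, not the Lucene implementations: LowerCaseTokenizerFactory is approximated as "split on anything that is not a letter, then lowercase", and KeywordTokenizerFactory plus a LowerCaseFilter as "one lowercased token".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class TokenizerSketch {
    // Rough model of LowerCaseTokenizerFactory: letters-only runs become
    // tokens, lowercased. "Intelligence can't be MeaSurEd" -> five terms.
    static List<String> lowerCaseTokenize(String text) {
        return Arrays.stream(text.split("[^\\p{L}]+"))
                .filter(t -> !t.isEmpty())
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Rough model of KeywordTokenizerFactory + LowerCaseFilter: the entire
    // input survives as a single lowercased token.
    static List<String> keywordLowerCase(String text) {
        return Collections.singletonList(text.toLowerCase(Locale.ROOT));
    }
}
```

As Erick notes, a search for "measured" matches a term from the first model but not the single token from the second, while a prefix search like intellig* can match both.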
Searching using a PDF
Is it possible to do a search based on a PDF file? I know it's possible to update the index with a PDF, but can you do just a regular search with it? Thanks, Brian Lamb
Re: Edgengram
In this particular case, I will be doing a solr search based on user preferences, so I will not be depending on the user to type abcdefg. That will be automatically generated based on user selections. The contents of the field do not contain spaces, and since I am creating the search parameters, case isn't important either. Thanks, Brian Lamb

On Tue, May 31, 2011 at 9:44 AM, Erick Erickson erickerick...@gmail.com wrote: That'll work for your case, although be aware that string types aren't analyzed at all, so case matters, as do spaces etc. What is the use-case here? If you explain it a bit there might be better answers Best Erick

On Fri, May 27, 2011 at 9:17 AM, Brian Lamb brian.l...@journalexperts.com wrote: For this, I ended up just changing it to string and using abcdefg* to match. That seems to work so far. Thanks, Brian Lamb

On Wed, May 25, 2011 at 4:53 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I'm running into some confusion with the way edgengram works. I have the field set up as:

<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="front" />
  </analyzer>
</fieldType>

I've also set up my own similarity class that returns 1 as the idf score. What I've found is that if I match a string abcdefg against a field containing abcdefghijklmnop, the idf will score that as a 7: 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2) I get why that's happening, but is there a way to avoid it? Do I need to create a new field type to achieve the desired effect? Thanks, Brian Lamb
Re: Edgengram
fieldType name=edgengram class=solr.TextField positionIncrementGap=1000 analyzer tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=25 side=front / /analyzer /fieldType I believe I used that link when I initially set up the field and it worked great (and I'm still using it in other places). In this particular example however it does not appear to be practical for me. I mentioned that I have a similarity class that returns 1 for the idf and in the case of an edgengram, it returns 1 * length of the search string. Thanks, Brian Lamb On Tue, May 31, 2011 at 11:34 AM, bmdakshinamur...@gmail.com bmdakshinamur...@gmail.com wrote: Can you specify the analyzer you are using for your queries? May be you could use a KeywordAnalyzer for your queries so you don't end up matching parts of your query. http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ This should help you. On Tue, May 31, 2011 at 8:24 PM, Brian Lamb brian.l...@journalexperts.comwrote: In this particular case, I will be doing a solr search based on user preferences. So I will not be depending on the user to type abcdefg. That will be automatically generated based on user selections. The contents of the field do not contain spaces and since I am created the search parameters, case isn't important either. Thanks, Brian Lamb On Tue, May 31, 2011 at 9:44 AM, Erick Erickson erickerick...@gmail.com wrote: That'll work for your case, although be aware that string types aren't analyzed at all, so case matters, as do spaces etc. What is the use-case here? If you explain it a bit there might be better answers Best Erick On Fri, May 27, 2011 at 9:17 AM, Brian Lamb brian.l...@journalexperts.com wrote: For this, I ended up just changing it to string and using abcdefg* to match. That seems to work so far. 
Thanks, Brian Lamb On Wed, May 25, 2011 at 4:53 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, I'm running into some confusion with the way edgengram works. I have the field set up as: fieldType name=edgengram class=solr.TextField positionIncrementGap=1000 analyzer tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=100 side=front / /analyzer /fieldType I've also set up my own similarity class that returns 1 as the idf score. What I've found this does is if I match a string abcdefg against a field containing abcdefghijklmnop, then the idf will score that as a 7: 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2) I get why that's happening, but is there a way to avoid that? Do I need to do a new field type to achieve the desired affect? Thanks, Brian Lamb -- Thanks and Regards, DakshinaMurthy BM
Explain the difference in similarity and similarityProvider
I'm looking over the patch notes from https://issues.apache.org/jira/browse/SOLR-2338 and I do not understand the difference between

<similarity class="com.example.solr.CustomSimilarityFactory">
  <str name="paramkey">param value</str>
</similarity>

and

<similarityProvider class="org.apache.solr.schema.CustomSimilarityProviderFactory">
  <str name="echo">is there an echo?</str>
</similarityProvider>

When would I use one over the other? Thanks, Brian Lamb
Re: Edgengram
For this, I ended up just changing it to string and using abcdefg* to match. That seems to work so far. Thanks, Brian Lamb On Wed, May 25, 2011 at 4:53 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, I'm running into some confusion with the way edgengram works. I have the field set up as: fieldType name=edgengram class=solr.TextField positionIncrementGap=1000 analyzer tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=100 side=front / /analyzer /fieldType I've also set up my own similarity class that returns 1 as the idf score. What I've found this does is if I match a string abcdefg against a field containing abcdefghijklmnop, then the idf will score that as a 7: 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2) I get why that's happening, but is there a way to avoid that? Do I need to do a new field type to achieve the desired affect? Thanks, Brian Lamb
Re: Similarity per field
I'm still not having any luck with this. Has anyone actually gotten this to work so far? I feel like I've followed the directions to the letter but it just doesn't work. Thanks, Brian Lamb On Wed, May 25, 2011 at 2:48 PM, Brian Lamb brian.l...@journalexperts.comwrote: I looked at the patch page and saw the files that were changed. I went into my install and looked at those same files and found that they had indeed been changed. So it looks like I have the correct version of solr. On Wed, May 25, 2011 at 1:01 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I sent a mail in about this topic a week ago but now that I have more information about what I am doing, as well as a better understanding of how the similarity class works, I wanted to start a new thread with a bit more information about what I'm doing, what I want to do, and how I can make it work correctly. I have written a similarity class that I would like applied to a specific field. This is how I am defining the fieldType: fieldType name=edgengram_cust class=solr.TextField positionIncrementGap=1000 analyzer tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=1 side=front / /analyzer similarity class=my.package.similarity.MySimilarity/ /fieldType And then I assign a specific field to that fieldType: field name=myfield multiValued=true type=edgengram_cust indexed=true stored=true required=false omitNorms=true / Then, I restarted solr and did a fullimport. However, the changes I have made do not appear to be taking hold. For simplicity, right now I just have the idf function returning 1. When I do a search with debugQuery=on, the idf behaves as it normally does. However, when I search on this field, the idf should be 1 and that is not the case. 
To try and nail down where the problem occurs, I commented out the similarity class definition in the fieldType and added it globally to the schema file: similarity class=my.package.similarity.MySimilarity/ Then, I restarted solr and did a fullimport. This time, the idf scores were all 1. So it seems to me the problem is not with my similarity class but in trying to apply it to a specific fieldType. According to https://issues.apache.org/jira/browse/SOLR-2338, this should be in the trunk now yes? I have run svn up on both my lucene and solr installs and it still is not recognizing it on a per field basis. Is the tag different inside a fieldType? Did I not update solr correctly? Where is my mistake? Thanks, Brian Lamb
Similarity per field
Hi all, I sent a mail in about this topic a week ago, but now that I have more information about what I am doing, as well as a better understanding of how the similarity class works, I wanted to start a new thread with a bit more information about what I'm doing, what I want to do, and how I can make it work correctly.

I have written a similarity class that I would like applied to a specific field. This is how I am defining the fieldType:

<fieldType name="edgengram_cust" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="1" side="front" />
  </analyzer>
  <similarity class="my.package.similarity.MySimilarity"/>
</fieldType>

And then I assign a specific field to that fieldType:

<field name="myfield" multiValued="true" type="edgengram_cust" indexed="true" stored="true" required="false" omitNorms="true" />

Then, I restarted solr and did a fullimport. However, the changes I have made do not appear to be taking hold. For simplicity, right now I just have the idf function returning 1. When I do a search with debugQuery=on, the idf behaves as it normally does. However, when I search on this field, the idf should be 1, and that is not the case.

To try and nail down where the problem occurs, I commented out the similarity class definition in the fieldType and added it globally to the schema file:

<similarity class="my.package.similarity.MySimilarity"/>

Then, I restarted solr and did a fullimport. This time, the idf scores were all 1. So it seems to me the problem is not with my similarity class, but in trying to apply it to a specific fieldType. According to https://issues.apache.org/jira/browse/SOLR-2338, this should be in the trunk now, yes? I have run svn up on both my lucene and solr installs and it still is not recognizing it on a per-field basis. Is the tag different inside a fieldType? Did I not update solr correctly? Where is my mistake? Thanks, Brian Lamb
Re: Similarity per field
I looked at the patch page and saw the files that were changed. I went into my install and looked at those same files and found that they had indeed been changed. So it looks like I have the correct version of solr. On Wed, May 25, 2011 at 1:01 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, I sent a mail in about this topic a week ago but now that I have more information about what I am doing, as well as a better understanding of how the similarity class works, I wanted to start a new thread with a bit more information about what I'm doing, what I want to do, and how I can make it work correctly. I have written a similarity class that I would like applied to a specific field. This is how I am defining the fieldType: fieldType name=edgengram_cust class=solr.TextField positionIncrementGap=1000 analyzer tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=1 side=front / /analyzer similarity class=my.package.similarity.MySimilarity/ /fieldType And then I assign a specific field to that fieldType: field name=myfield multiValued=true type=edgengram_cust indexed=true stored=true required=false omitNorms=true / Then, I restarted solr and did a fullimport. However, the changes I have made do not appear to be taking hold. For simplicity, right now I just have the idf function returning 1. When I do a search with debugQuery=on, the idf behaves as it normally does. However, when I search on this field, the idf should be 1 and that is not the case. To try and nail down where the problem occurs, I commented out the similarity class definition in the fieldType and added it globally to the schema file: similarity class=my.package.similarity.MySimilarity/ Then, I restarted solr and did a fullimport. This time, the idf scores were all 1. So it seems to me the problem is not with my similarity class but in trying to apply it to a specific fieldType. 
According to https://issues.apache.org/jira/browse/SOLR-2338, this should be in the trunk now yes? I have run svn up on both my lucene and solr installs and it still is not recognizing it on a per field basis. Is the tag different inside a fieldType? Did I not update solr correctly? Where is my mistake? Thanks, Brian Lamb
Edgengram
Hi all, I'm running into some confusion with the way edgengram works. I have the field set up as:

<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="front" />
  </analyzer>
</fieldType>

I've also set up my own similarity class that returns 1 as the idf score. What I've found is that if I match a string abcdefg against a field containing abcdefghijklmnop, the idf will score that as a 7: 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2) I get why that's happening, but is there a way to avoid it? Do I need to create a new field type to achieve the desired effect? Thanks, Brian Lamb
Re: Similarity
This did the trick. Thanks! On Mon, May 23, 2011 at 5:03 PM, Markus Jelsma markus.jel...@openindex.iowrote: Hmm. I don't add code to Apache packages but create my own packages and namespaces, build a jar and add it to the lib directory as specified in solrconfig. Then you can use the FQCN to in the similarity config to point to the class. May be it can work when messing inside the apache namespace but then you have to build Lucene as well. Okay well this is encouraging. I changed SweetSpotSimilarity to MyClassSimilarity. I created this class in: lucene/contrib/misc/src/java/org/apache/lucene/misc/ I am getting a ClassNotFoundException when I try to start solr. Here is the contents of the MyClassSimilarity file: package org.apache.lucene.misc; import org.apache.lucene.search.DefaultSimilarity; public class MyClassSimilarity extends DefaultSimilarity { public MyClassSimilarity() { super(); } public float idf(int a1, int a2) { return 1; } } So then this raises two questions. Why am I getting a classNotFoundException and how can I go about fixing it? Thanks, Brian Lamb On Mon, May 23, 2011 at 3:41 PM, Markus Jelsma markus.jel...@openindex.iowrote: As far as i know, SweetSpotSimilarty needs be configured. I did use it once but wrapped a factory around it to configure the sweet spot. It worked just as expected and explained in that paper about the subject. If you use a custom similarity that , for example, caps tf to 1. Does it then work? Hi all, I'm having trouble getting the basic similarity example to work. If you notice at the bottom of the schema.xml file, there is a line there that is commented out: !-- similarity class=org.apache.lucene.search.DefaultSimilarity/ -- I uncomment that line and replace it with the following: similarity class=org.apache.lucene.misc.SweetSpotSimilarity/ Which comes natively with lucene. However, the scores before and after making this change are the same. I did a full import both times but that didn't seem to help. 
I ran svn up on both my solr directory and my lucene directory. Actually, my lucene directory was not previously under svn so I removed everything in there and did svn co http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/ So why isn't my installation taking the SweetSpot Similarity change? Thanks, Brian Lamb
Similarity
Hi all, I'm having trouble getting the basic similarity example to work. If you notice at the bottom of the schema.xml file, there is a line there that is commented out:

<!-- <similarity class="org.apache.lucene.search.DefaultSimilarity"/> -->

I uncomment that line and replace it with the following:

<similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/>

which comes natively with lucene. However, the scores before and after making this change are the same. I did a full import both times, but that didn't seem to help. I ran svn up on both my solr directory and my lucene directory. Actually, my lucene directory was not previously under svn, so I removed everything in there and did svn co http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/ So why isn't my installation taking the SweetSpotSimilarity change? Thanks, Brian Lamb
Re: Similarity
Okay, well this is encouraging. I changed SweetSpotSimilarity to MyClassSimilarity. I created this class in: lucene/contrib/misc/src/java/org/apache/lucene/misc/ I am getting a ClassNotFoundException when I try to start solr. Here is the contents of the MyClassSimilarity file:

package org.apache.lucene.misc;

import org.apache.lucene.search.DefaultSimilarity;

public class MyClassSimilarity extends DefaultSimilarity {
    public MyClassSimilarity() {
        super();
    }

    public float idf(int a1, int a2) {
        return 1;
    }
}

So then this raises two questions: why am I getting a ClassNotFoundException, and how can I go about fixing it? Thanks, Brian Lamb

On Mon, May 23, 2011 at 5:03 PM, Markus Jelsma markus.jel...@openindex.io wrote: Hmm. I don't add code to Apache packages but create my own packages and namespaces, build a jar and add it to the lib directory as specified in solrconfig. Then you can use the FQCN in the similarity config to point to the class. Maybe it can work when messing inside the apache namespace, but then you have to build Lucene as well.

On Mon, May 23, 2011 at 3:41 PM, Markus Jelsma markus.jel...@openindex.io wrote: As far as i know, SweetSpotSimilarity needs to be configured. I did use it once but wrapped a factory around it to configure the sweet spot. It worked just as expected and explained in that paper about the subject. If you use a custom similarity that, for example, caps tf to 1, does it then work?

Hi all, I'm having trouble getting the basic similarity example to work. If you notice at the bottom of the schema.xml file, there is a line there that is commented out:

<!-- <similarity class="org.apache.lucene.search.DefaultSimilarity"/> -->

I uncomment that line and replace it with the following:

<similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/>

which comes natively with lucene. However, the scores before and after making this change are the same. I did a full import both times, but that didn't seem to help. I ran svn up on both my solr directory and my lucene directory. Actually, my lucene directory was not previously under svn, so I removed everything in there and did svn co http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/ So why isn't my installation taking the SweetSpotSimilarity change? Thanks, Brian Lamb
Re: Similarity class for an individual field
Yes. Was that not what I was supposed to do? On Thu, May 19, 2011 at 8:26 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: (11/05/20 3:45), Brian Lamb wrote: Hi all, Based on advice I received on a previous email thread, I applied patch https://issues.apache.org/jira/browse/SOLR-2338. My goal was to be able to apply a similarity class to certain fields but not all fields. I ran the following commands: $ cd your Solr trunk checkout dir $ svn up $ wget https://issues.apache.org/jira/secure/attachment/12475027/SOLR-2338.patch $ patch -p0 -i SOLR-2338.patch And I did not get any errors. I then created my own SimilarityClass Brian, I'm confused about what you did, because SOLR-2338 was resolved in March and committed in trunk, but you did svn up and then applied the patch in your trunk? Koji -- http://www.rondhuit.com/en/
Re: Similarity class for an individual field
So what was my mistake? I still have not resolved this issue. On Fri, May 20, 2011 at 11:22 AM, Brian Lamb brian.l...@journalexperts.comwrote: Yes. Was that not what I was supposed to do? On Thu, May 19, 2011 at 8:26 PM, Koji Sekiguchi k...@r.email.ne.jpwrote: (11/05/20 3:45), Brian Lamb wrote: Hi all, Based on advice I received on a previous email thread, I applied patch https://issues.apache.org/jira/browse/SOLR-2338. My goal was to be able to apply a similarity class to certain fields but not all fields. I ran the following commands: $ cdyour Solr trunk checkout dir $ svn up $ wget https://issues.apache.org/jira/secure/attachment/12475027/SOLR-2338.patch $ patch -p0 -i SOLR-2338.patch And I did not get any errors. I then created my own SimilarityClass Brian, I'm confused what you did because SOLR-2338 has been resolved in March and committed in trunk, but you did svn up apply patch in your trunk? Koji -- http://www.rondhuit.com/en/
Similarity class for an individual field
Hi all, Based on advice I received on a previous email thread, I applied patch https://issues.apache.org/jira/browse/SOLR-2338. My goal was to be able to apply a similarity class to certain fields but not all fields. I ran the following commands:

$ cd your Solr trunk checkout dir
$ svn up
$ wget https://issues.apache.org/jira/secure/attachment/12475027/SOLR-2338.patch
$ patch -p0 -i SOLR-2338.patch

And I did not get any errors. I then created my own SimilarityClass, listed below because it isn't very large:

package org.apache.lucene.misc;

import org.apache.lucene.search.DefaultSimilarity;

public class SimpleSimilarity extends DefaultSimilarity {
    public SimpleSimilarity() {
        super();
    }

    public float idf(int dont, int care) {
        return 1;
    }
}

As you can see, it isn't very complicated. I'm just trying to remove the idf from the scoring equation in certain cases. Next, I make a change to the schema.xml file:

<fieldType name="string_noidf" class="solr.StrField" sortMissingLast="true" omitNorms="true">
  <similarity class="org.apache.lucene.misc.SimpleSimilarity"/>
</fieldType>

And apply that to the field in question:

<field name="string_noidf" multiValued="true" type="string_noidf" indexed="true" stored="true" required="false" omitNorms="true" />

But I think something did not get applied correctly to the patch. I restarted and did a full import, but the scores are exactly the same. Also, I tried using the existing SweetSpotSimilarity:

<fieldType name="string_noidf" class="solr.StrField" sortMissingLast="true" omitNorms="true">
  <similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/>
</fieldType>

But the scores remained unchanged even in that case. At this point, I'm not quite sure how to debug this to see whether the problem is with the patch or the similarity class, but given that the SweetSpotSimilarity class didn't work either, I'm inclined to think it was a problem with the patch. Any thoughts on this one? Thanks, Brian Lamb
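To see what the idf override actually changes, it helps to compare it against Lucene's classic formula. The sketch below models the classic DefaultSimilarity idf, idf = 1 + ln(numDocs / (docFreq + 1)), next to the flat idf the custom class returns; this is an illustrative reimplementation, not the Lucene code itself.

```java
public class IdfSketch {
    // Classic Lucene idf (DefaultSimilarity): rare terms (small docFreq)
    // get a larger weight than common ones.
    static double classicIdf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // What the SimpleSimilarity override produces: term rarity no longer
    // contributes to the score at all.
    static double flatIdf(int docFreq, int numDocs) {
        return 1.0;
    }
}
```

With the override in effect, a term matching 1 document and a term matching 50 documents score identically on the idf factor, which is easy to spot in debugQuery=on output once the per-field similarity is actually being picked up.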
Re: Similarity class for an individual field
Also, I've tried adding:

<similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/>

to the end of the schema file so that it is applied globally, but it does not appear to change the score either. What am I doing incorrectly?

Thanks, Brian Lamb

On Thu, May 19, 2011 at 2:45 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, Based on advice I received on a previous email thread, I applied patch https://issues.apache.org/jira/browse/SOLR-2338. My goal was to be able to apply a similarity class to certain fields but not all fields.
Re: Similarity class for an individual field
I tried editing the SweetSpotSimilarity class located at lucene/contrib/misc/src/java/org/apache/lucene/misc/SweetSpotSimilarity.java to just return 1 for each function and the score does not change at all. This has led me to believe that it does not recognize similarity at all. At this point, all I have for similarity is the line at the end of the file to apply similarity to all searches but that does not even work. So where am I going wrong?

Thanks, Brian Lamb

On Thu, May 19, 2011 at 3:41 PM, Brian Lamb brian.l...@journalexperts.com wrote: Also, I've tried adding <similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/> to the end of the schema file so that it is applied globally but it does not appear to change the score either.
Re: Disable IDF scoring on certain fields
I believe I have applied the patch correctly. However, I cannot seem to figure out where the similarity class I create should reside. Any tips on that?

Thanks, Brian Lamb

On Tue, May 17, 2011 at 4:00 PM, Brian Lamb brian.l...@journalexperts.com wrote: Thank you Robert for pointing this out. This is not being used for autocomplete. I already have another core set up for that :-)
Re: MoreLikeThis PDF search
Would I be better off trying to use something like PHP to read the PDF file and extrapolate the information and then pass it on to the MoreLikeThis handler, or is there a way it can be done by giving it the PDF directly?

On Fri, May 13, 2011 at 4:54 PM, Brian Lamb brian.l...@journalexperts.com wrote: Any thoughts on this one?
Disable IDF scoring on certain fields
Hi all, I have a field defined in my schema.xml file as:

<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front" />
  </analyzer>
</fieldType>

<field name="myfield" multiValued="true" type="edgengram" indexed="true" stored="true" required="false" omitNorms="true" />

I would like to disable IDF scoring on this field. I am not interested in how rare the term is; I only care whether the term is present or not. The idea is that if a user does a search for myfield:dog OR myfield:pony, any document containing dog or pony would be scored identically. In the case that both showed up, that record would be moved to the top, but all the records where they both showed up would have the same score. So long story short, how can I disable the IDF score for this particular field?

Thanks, Brian Lamb
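The desired behavior can be illustrated with a toy scorer (a pure-Python sketch; document frequencies and corpus size are invented, tf and norms are omitted). With idf held constant, documents matching a single query term tie regardless of how rare the term is, while a document matching both terms still rises to the top:

```python
import math

def score(query_terms, doc_terms, doc_freq, num_docs, use_idf=True):
    # Toy tf-idf-style scorer: sum the idf of each matched query term.
    total = 0.0
    for term in query_terms:
        if term in doc_terms:
            idf = math.log(num_docs / (doc_freq[term] + 1)) + 1 if use_idf else 1.0
            total += idf
    return total

doc_freq = {"dog": 50_000, "pony": 200}  # "pony" is much rarer
num_docs = 1_000_000
q = ["dog", "pony"]
docs = (["dog"], ["pony"], ["dog", "pony"])

with_idf = [score(q, d, doc_freq, num_docs) for d in docs]
no_idf = [score(q, d, doc_freq, num_docs, use_idf=False) for d in docs]

print(with_idf[0] == with_idf[1])  # False: rarity separates the single matches
print(no_idf[0] == no_idf[1])      # True: single-term matches tie
print(no_idf[2] > no_idf[0])       # True: matching both terms still ranks highest
```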
Re: Disable IDF scoring on certain fields
Hi Markus, I was just looking at overriding DefaultSimilarity so your email was well timed. The problem I have with it is, as you mentioned, it does not seem possible to do it on a field by field basis. Has anyone had any luck doing some of the similarity functions on a field by field basis? I need to do more than one of them, and from what I can find, it seems that only computeNorm accounts for the name of the field.

Thanks, Brian Lamb

On Tue, May 17, 2011 at 3:34 PM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, Although you can configure per-field TF (by omitTermFreqAndPositions) you can't do this for IDF. If your index is only used for this specific purpose (seems like an auto-complete index) then you can override DefaultSimilarity and return a static value for IDF. If you still want IDF for other fields then I think you have a problem because Solr doesn't yet support per-field similarity. http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/search/DefaultSimilarity.java?view=markup Cheers,
Re: Disable IDF scoring on certain fields
Thank you Robert for pointing this out. This is not being used for autocomplete. I already have another core set up for that :-) The idea is as I outlined above: I just want a multivalued field that treats every term in the field the same, so that the only way documents separate themselves is by an unrelated boost and/or matching on multiple terms in that field.

On Tue, May 17, 2011 at 3:55 PM, Markus Jelsma markus.jel...@openindex.io wrote: Well, if you're experimental you can try trunk; as Robert points out, it has been fixed there. If not, I guess you're stuck with creating another core. Is this fieldType specifically used for auto-completion? If so, another core, preferably on another machine, is in my opinion the way to go. Auto-completion is tough in terms of performance. Thanks Robert for pointing to the Jira ticket. Cheers
Re: MoreLikeThis PDF search
Any thoughts on this one?

On Thu, May 12, 2011 at 10:46 AM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I've become more and more familiar with the MoreLikeThis handler over the last several months. I'm curious whether it is possible to do a MoreLikeThis search by uploading a PDF?
MoreLikeThis PDF search
Hi all, I've become more and more familiar with the MoreLikeThis handler over the last several months. I'm curious whether it is possible to do a MoreLikeThis search by uploading a PDF? I looked at the ExtractingRequestHandler, which looks like it is used to process PDF files and the like, but is it possible to combine the two? Just to be clear, I don't want to send a PDF and have that become part of the index. Rather, I'd like to be able to use the PDF as a MoreLikeThis search. Thanks, Brian Lamb
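One conceivable two-step combination (a sketch only; the host, port, and field names are assumptions): first ask the ExtractingRequestHandler for the PDF's parsed text without indexing it, via extractOnly=true, then feed that text to the MoreLikeThis handler as a content stream via stream.body. The snippet below only constructs the two request URLs to show the shape of the idea:

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr"  # assumed Solr location

# Step 1: extractOnly=true makes /update/extract return the parsed
# text of the uploaded file instead of adding it to the index.
extract_url = base + "/update/extract?" + urlencode({"extractOnly": "true"})

# Step 2: hand the extracted text to the MLT handler as a content stream.
extracted_text = "text pulled out of the PDF in step 1"
mlt_url = base + "/mlt?" + urlencode({
    "stream.body": extracted_text,
    "mlt.fl": "title",  # assumed field to compare against
    "rows": 100,
})

print("extractOnly=true" in extract_url)  # True
print("stream.body=" in mlt_url)          # True
```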
Changing the schema
If I change the field type in my schema, do I need to rebuild the entire index? I'm at a point now where it takes over a day to do a full import due to the sheer size of my application and I would prefer not having to reindex just because I want to make a change somewhere. Thanks, Brian Lamb
Re: Solr security
Great posts all. I will give these a look and come up with something based on these recommendations. I'm sure as I begin implementing something, more questions will arise.

On Tue, May 10, 2011 at 9:00 AM, Anthony Wlodarski anth...@tinkertownlabs.com wrote: The wiki has a loose interpretation of how to set up Jetty securely. Please take a look at the article I wrote here: http://anthonyw.net/2011/04/securing-jetty-and-solr-with-php-authentication/. Even if PHP is not the language that sits on top of your Solr, you can still use the first part of the tutorial. If you are using Tomcat I would recommend looking here: http://blog.comtaste.com/2009/02/securing_your_solr_server_on_t.html Regards, -Anthony

On 05/09/2011 05:28 PM, Jan Høydahl wrote: Hi, You can simply configure a firewall on your Solr server to only allow access from your frontend server. Whether you use the built-in software firewall of Linux/Windows/whatever or some other FW utility is a choice you need to make. This is by design - you should never ever expose your backend services, whether it's a search server or a database server, to the public. Read more about Solr security on the wiki: http://wiki.apache.org/solr/SolrSecurity -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com

On 9. mai 2011, at 20.57, Brian Lamb wrote: Hi all, Is it possible to set up solr so that it will only execute dataimport commands if they come from localhost?

-- Anthony Wlodarski Lead Software Engineer Get2Know.me (http://www.get2know.me) Office: 646-285-0500 x217 Fax: 646-285-0400
Solr security
Hi all, Is it possible to set up solr so that it will only execute dataimport commands if they come from localhost? Right now, my application and my solr installation are on different servers so any requests are formatted http://domain:8983 instead of http://localhost:8983. I am concerned that when I launch my application, there will be the potential for abuse. Is the best solution to have everything reside on the same server? What are some other solutions? Thanks, Brian Lamb
Negative boost
Hi all, I understand that the only way to simulate a negative boost is to positively boost the inverse. I have looked at http://wiki.apache.org/solr/SolrRelevancyFAQ but I think I am missing something in the formatting of my query. I am using:

http://localhost:8983/solr/search?q=dog&bq=(*:* -species:Sheltie)^1

In this case, I am trying to search for records about dog but to put records containing Sheltie closer to the bottom, as I am not really interested in those. However, the following queries:

http://localhost:8983/solr/search?q=dog
http://localhost:8983/solr/search?q=dog&bq=(*:* -species:Sheltie)^1

return the exact same set of results, with a record about a Sheltie as the top result each time. What am I doing incorrectly? Thanks, Brian Lamb
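The boost-the-inverse idea can be shown with a toy simulation (species values and scores are invented; a real bq in dismax contributes additively to the score, which this sketch imitates): every document NOT matching species:Sheltie gets an extra boost, so Sheltie documents sink without being excluded.

```python
# Toy documents with invented base relevance scores for the query "dog".
docs = [
    {"id": 1, "species": "Sheltie", "base_score": 5.0},
    {"id": 2, "species": "Beagle", "base_score": 4.0},
    {"id": 3, "species": "Labrador", "base_score": 3.0},
]

boost = 10.0  # weight attached to the inverse clause (*:* -species:Sheltie)
for d in docs:
    # Non-Sheltie documents match the inverse clause and get the boost.
    d["score"] = d["base_score"] + (boost if d["species"] != "Sheltie" else 0.0)

ranked = sorted(docs, key=lambda d: d["score"], reverse=True)
print([d["id"] for d in ranked])  # [2, 3, 1] -- the Sheltie sinks to the bottom
```

If the boost has no effect on the ranking at all, it is worth checking that the request handler in use actually supports the bq parameter.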
Re: MoreLikeThis
It finds something under match but just nothing under response. I tried turning on debugQuery=on but I did not see anything that jumped out at me as a bug or anything. Is there some kind of threshold setting that I can tinker with to see if that is the problem?

On Sun, Apr 24, 2011 at 2:37 AM, Grant Ingersoll gsing...@apache.org wrote:

On Apr 21, 2011, at 8:46 PM, Brian Lamb wrote: Hi all, I have an mlt search set up on my site with over 2 million records in the index. Normally, my results look like:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">204</int>
  </lst>
  <result name="match" numFound="41750" start="0">
    <doc>
      <str name="title">Some result.</str>
    </doc>
  </result>
  <result name="response" numFound="130872" start="0">
    <doc>
      <str name="title">A similar result</str>
    </doc>
    ...
  </result>
</response>

And there are 100 results under response. However, in some cases, there are no results under response. Why is this the case and is there anything I can do about it?

Is it because it couldn't find anything? Or are you thinking there is a bug? You might try adding debugQuery=true and see what gets parsed, etc. and then try running that query.

Here is my mlt configuration:

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">title,score</str>
    <int name="mlt.mindf">1</int>
    <int name="rows">100</int>
    <str name="fl">*,score</str>
  </lst>
</requestHandler>

And here is the URL I use to get results:

http://localhost:8983/solr/mlt/?q=title:Some random title

Any help on this matter would be greatly appreciated. Thanks!

-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem docs using Solr/Lucene: http://www.lucidimagination.com/search
MoreLikeThis
Hi all, I have an mlt search set up on my site with over 2 million records in the index. Normally, my results look like:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">204</int>
  </lst>
  <result name="match" numFound="41750" start="0">
    <doc>
      <str name="title">Some result.</str>
    </doc>
  </result>
  <result name="response" numFound="130872" start="0">
    <doc>
      <str name="title">A similar result</str>
    </doc>
    ...
  </result>
</response>

And there are 100 results under response. However, in some cases, there are no results under response. Why is this the case and is there anything I can do about it? Here is my mlt configuration:

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">title,score</str>
    <int name="mlt.mindf">1</int>
    <int name="rows">100</int>
    <str name="fl">*,score</str>
  </lst>
</requestHandler>

And here is the URL I use to get results:

http://localhost:8983/solr/mlt/?q=title:Some random title

Any help on this matter would be greatly appreciated. Thanks! Brian Lamb
Re: MoreLikeThis match
Does anyone have any thoughts on this one?

On Fri, Apr 8, 2011 at 9:26 AM, Brian Lamb brian.l...@journalexperts.com wrote: I've looked at both wiki pages and none really clarify the difference between these two.
Re: MoreLikeThis match
I've looked at both wiki pages and none really clarify the difference between these two. If I copy and paste an existing index value for field and do an mlt search, it shows up under match but not results. What is the difference between these two?

On Thu, Apr 7, 2011 at 2:24 PM, Brian Lamb brian.l...@journalexperts.com wrote: Actually, what is the difference between match and response?
MoreLikeThis match
Hi all, I've been using MoreLikeThis for a while through select:

http://localhost:8983/solr/select/?q=field:more like this&mlt=true&mlt.fl=field&rows=100&fl=*,score

I was looking over the wiki page today and saw that you can also do this:

http://localhost:8983/solr/mlt/?q=field:more like this&mlt=true&mlt.fl=field&rows=100

which seems to run faster and do a better job overall. When the results are returned, they are formatted like this:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <result name="match" numFound="24" start="0" maxScore="3.0438285">
    <doc>
      <float name="score">3.0438285</float>
      <str name="id">5</str>
    </doc>
  </result>
  <result name="response" numFound="4077" start="0" maxScore="0.12775186">
    <doc>
      <float name="score">0.1125823</float>
      <str name="id">3</str>
    </doc>
    <doc>
      <float name="score">0.10231556</float>
      <str name="id">8</str>
    </doc>
    ...
  </result>
</response>

It seems that it always returns just one result under match, and response is set by the rows parameter. How can I get more than one result under match? What I'm trying to do here is: whatever is set for field:, I would like to return the top 100 records that match that search based on more like this. Thanks, Brian Lamb
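The two sections can be pulled apart programmatically. Below is a sketch that parses a well-formed reconstruction of the response shown above (the XML string is an assumption based on the post): "match" holds the document(s) the query itself matched, while "response" holds the documents MoreLikeThis found similar to it.

```python
import xml.etree.ElementTree as ET

# Reconstructed /mlt response, shortened to the parts that matter here.
xml = """<response>
  <result name="match" numFound="24" start="0" maxScore="3.0438285">
    <doc><float name="score">3.0438285</float><str name="id">5</str></doc>
  </result>
  <result name="response" numFound="4077" start="0" maxScore="0.12775186">
    <doc><float name="score">0.1125823</float><str name="id">3</str></doc>
    <doc><float name="score">0.10231556</float><str name="id">8</str></doc>
  </result>
</response>"""

root = ET.fromstring(xml)
match = root.find("result[@name='match']")
similar = root.find("result[@name='response']")

# The query's own best hit vs. the MoreLikeThis suggestions.
print(len(match.findall("doc")))    # 1
print(len(similar.findall("doc")))  # 2
print(similar.find("doc/str[@name='id']").text)  # 3
```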
Re: MoreLikeThis match
Actually, what is the difference between match and response? It seems that match always returns one result, but I've thrown a few cases at it where the score of the highest response is higher than the score of match. And then there are cases where the match score dwarfs the highest response score.

On Thu, Apr 7, 2011 at 1:30 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I've been using MoreLikeThis for a while through select: http://localhost:8983/solr/select/?q=field:more like this&mlt=true&mlt.fl=field&rows=100&fl=*,score
Re: Matching the beginning of a word within a term
Thank you both for your replies. It looks like EdgeNGramFilter will do the job nicely. Time to reindex...again.

On Fri, Apr 1, 2011 at 8:31 AM, Jan Høydahl jan@cominvent.com wrote: Check out http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory Don't know if it works with phrases though -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com

On 31. mars 2011, at 16.49, Brian Lamb wrote: No, I don't really want to break down the words into subwords. In the example I provided, I would not want kind to match either record because it is not at the beginning of the word, even though kind appears in both records as part of a word.
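A rough model (not the actual Lucene code) of what the suggested EdgeNGramFilter emits makes clear why it matches word beginnings but never interior substrings:

```python
def edge_ngrams(token, min_gram=1, max_gram=25, side="front"):
    # Rough model of EdgeNGramFilterFactory with side="front":
    # emit every leading substring from min_gram to max_gram characters.
    token = token.lower()
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]

grams = edge_ngrams("mankind")
print("man" in grams)   # True: a query for "man" matches the start of the word
print("kind" in grams)  # False: interior substrings are never emitted
```

So with this filter applied at index time, myfield:man matches "mankind" but kind matches neither record, which is exactly the behavior asked for in the thread.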
Re: Matching on a multi valued field
I just noticed Juan's response and I find that I am encountering that very issue in a few cases. Boosting is a good way to put the more relevant results at the top, but is it possible to have only the correct results returned?

On Wed, Mar 30, 2011 at 11:51 AM, Brian Lamb brian.l...@journalexperts.com wrote: Thank you all for your responses. The field had already been set up with positionIncrementGap="100" so I just needed to add in the slop.

On Tue, Mar 29, 2011 at 6:32 PM, Juan Pablo Mora jua...@informa.es wrote: A multiValued field is actually a single field with all data separated with positionIncrement. Try setting that value high enough and use a PhraseQuery. That is true, but you cannot do things like q=bar* foo*~10 with the default query search, and if you use dismax you will have the same problems with multivalued fields. Imagine the situation:

Doc1: field A: [foo bar, dooh] (2 values)
Doc2: field A: [bar dooh, whatever] (another 2 values)

The query qt=dismax qf=fieldA q=( bar dooh ) will return both Doc1 and Doc2. The only thing you can do in this situation is boost the phrase query in Doc2 with the pf parameter in order to get Doc2 in the first position of the results: pf=fieldA^1 Thanks, JP.

On 29/03/2011, at 23:14, Markus Jelsma wrote: orly, all replies came in while sending =) Hi, Your filter query is looking for a match of man's friend in a single field. Regardless of analysis of the common_names field, all terms are present in the common_names field of both documents. A multiValued field is actually a single field with all data separated with positionIncrement. Try setting that value high enough and use a PhraseQuery. That should work. Cheers,

Hi all, I have a field set up like this:

<field name="common_names" multiValued="true" type="text" indexed="true" stored="true" required="false" />

And I have some records:

RECORD1
<arr name="common_names">
  <str>man's best friend</str>
  <str>pooch</str>
</arr>

RECORD2
<arr name="common_names">
  <str>man's worst enemy</str>
  <str>friend to no one</str>
</arr>

Now if I do a search such as:

http://localhost:8983/solr/search/?q=*:*&fq={!q.op=AND df=common_names}man's friend

both records are returned. However, I only want RECORD1 returned. I understand why RECORD2 is returned, but how can I structure my query so that only RECORD1 is returned? Thanks, Brian Lamb
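The positionIncrementGap trick above can be sketched with a toy model (pure Python; the tokenization is simplified): tokens within one value get consecutive positions, each new value starts a large gap later, and a proximity query with a modest slop therefore cannot match across two different values.

```python
def index_positions(values, gap=100):
    # Rough model of a multiValued field: tokens of each value get
    # consecutive positions; each new value starts `gap` positions later.
    positions = {}
    pos = 0
    for value in values:
        for token in value.split():
            positions.setdefault(token, []).append(pos)
            pos += 1
        pos += gap
    return positions

def phrase_within_slop(positions, t1, t2, slop):
    # A proximity query like "t1 t2"~slop needs the two terms within
    # `slop` positions of each other, so it cannot span the gap.
    return any(abs(a - b) <= slop
               for a in positions.get(t1, [])
               for b in positions.get(t2, []))

doc1 = index_positions(["man's best friend", "pooch"])
doc2 = index_positions(["man's worst enemy", "friend to no one"])

print(phrase_within_slop(doc1, "man's", "friend", 10))  # True: same value
print(phrase_within_slop(doc2, "man's", "friend", 10))  # False: separated by the gap
```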
Re: Matching the beginning of a word within a term
No, I don't really want to break down the words into subwords. In the example I provided, I would not want kind to match either record because it is not at the beginning of the word even though kind appears in both records as part of a word. On Wed, Mar 30, 2011 at 4:42 PM, lboutros boutr...@gmail.com wrote: Do you want to tokenize subwords based on dictionaries ? A bit like disagglutination of german words ? If so, something like this could help : DictionaryCompoundWordTokenFilter http://search.lucidimagination.com/search/document/CDRG_ch05_5.8.8 Ludovic http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilter.html 2011/3/30 Brian Lamb [via Lucene] ml-node+2754668-300063934-383...@n3.nabble.com Hi all, I have a field set up like this: field name=common_names multiValued=true type=text indexed=true stored=true required=false / And I have some records: RECORD1 arr name=common_names strcompanion to mankind/str strpooch/str /arr RECORD2 arr name=common_names strcompanion to womankind/str strman's worst enemy/str /arr I would like to write a query that will match the beginning of a word within the term. Here is the query I would use as it exists now: http://localhost:8983/solr/search/?q=*:*fq={!q.op=AND%20df=common_names} companion man~10 In the above example. I would want to return only RECORD1. The query as it exists right now is designed to only match records where both words are present in the same term. So if I changed man to mankind in the query, RECORD1 will be returned. Even though the phrases companion and man exist in the same term in RECORD2, I do not want RECORD2 to be returned because 'man' is not at the beginning of the word. How can I achieve this? 
Thanks, Brian Lamb - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Matching-the-beginning-of-a-word-within-a-term-tp2754668p2755561.html Sent from the Solr - User mailing list archive at Nabble.com.
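For the "match only at the beginning of a word" requirement, one approach the thread doesn't get to is indexing edge n-grams. The fieldType below is a sketch, not something from the thread; the type name and gram sizes are assumptions, and the exact filter attributes vary by Solr version:

```xml
<!-- schema.xml sketch: "mankind" is indexed as ma, man, mank, ... so a plain
     query term "man" matches word prefixes; "womankind" yields grams starting
     wo, wom, woma, ... and is not matched -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```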
Re: Matching on a multi valued field
Thank you all for your responses. The field had already been set up with positionIncrementGap=100 so I just needed to add in the slop. On Tue, Mar 29, 2011 at 6:32 PM, Juan Pablo Mora jua...@informa.es wrote: A multiValued field is actually a single field with all data separated with positionIncrement. Try setting that value high enough and use a PhraseQuery. That is true but you cannot do things like: q=bar* foo*~10 with default query search. and if you use dismax you will have the same problems with multivalued fields. Imagine the situation: Doc1: field A: [foo bar,dooh] 2 values Doc2: field A: [bar dooh, whatever] Another 2 values the query: qt=dismax qf= fieldA q = ( bar dooh ) will return both Doc1 and Doc2. The only thing you can do in this situation is boost phrase query in Doc2 with parameter pf in order to get Doc2 in the first position of the results: pf = fieldA^1 Thanks, JP. On 29/03/2011, at 23:14, Markus Jelsma wrote: orly, all replies came in while sending =) Hi, Your filter query is looking for a match of man's friend in a single field. Regardless of analysis of the common_names field, all terms are present in the common_names field of both documents. A multiValued field is actually a single field with all data separated with positionIncrement. Try setting that value high enough and use a PhraseQuery. That should work. Cheers, Hi all, I have a field set up like this: field name=common_names multiValued=true type=text indexed=true stored=true required=false / And I have some records: RECORD1 arr name=common_names strman's best friend/str strpooch/str /arr RECORD2 arr name=common_names strman's worst enemy/str strfriend to no one/str /arr Now if I do a search such as: http://localhost:8983/solr/search/?q=*:*fq={!q.op=AND df=common_names}man's friend Both records are returned. However, I only want RECORD1 returned. I understand why RECORD2 is returned but how can I structure my query so that only RECORD1 is returned? Thanks, Brian Lamb
Matching the beginning of a word within a term
Hi all, I have a field set up like this: field name=common_names multiValued=true type=text indexed=true stored=true required=false / And I have some records: RECORD1 arr name=common_names strcompanion to mankind/str strpooch/str /arr RECORD2 arr name=common_names strcompanion to womankind/str strman's worst enemy/str /arr I would like to write a query that will match the beginning of a word within the term. Here is the query I would use as it exists now: http://localhost:8983/solr/search/?q=*:*fq={!q.op=AND%20df=common_names}companion man~10 In the above example, I would want to return only RECORD1. The query as it exists right now is designed to only match records where both words are present in the same term. So if I changed man to mankind in the query, RECORD1 will be returned. Even though the words companion and man exist in the same term in RECORD2, I do not want RECORD2 to be returned because 'man' is not at the beginning of the word. How can I achieve this? Thanks, Brian Lamb
String field
Hi all, I'm a little confused about the string field. I read somewhere that if I want to do an exact match, I should use a string field. So I made a few modifications to my schema file: field name=id type=string indexed=true stored=true required=false / field name=common_names multiValued=true type=string indexed=true stored=true required=false / field name=genus type=string indexed=true stored=true required=false / field name=species type=string indexed=true stored=true required=false / And did a full import but when I do a search and return all fields, only id is showing up. The only difference is that id is my primary key field so that could be why it is showing up but why aren't the others showing up? Thanks, Brian Lamb
Re: String field
The full import wasn't spitting out any errors on the web page but in looking at the logs, there were errors. Correcting those errors solved that issue. Thanks, Brian Lamb On Tue, Mar 29, 2011 at 2:44 PM, Erick Erickson erickerick...@gmail.comwrote: try the schema browser from the admin page to be sure the fields you *think* are in the index really are. Did you do a commit after indexing? Did you re-index after the schema changes? Are you 100% sure that, if you did re-index, the new fields were in the docs submitted? Best Erick On Tue, Mar 29, 2011 at 11:46 AM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I'm a little confused about the string field. I read somewhere that if I want to do an exact match, I should use an exact match. So I made a few modifications to my schema file: field name=id type=string indexed=true stored=true required=false / field name=common_names multiValued=true type=string indexed=true stored=true required=false / field name=genus type=string indexed=true stored=true required=false / field name=species type=string indexed=true stored=true required=false / And did a full import but when I do a search and return all fields, only id is showing up. The only difference is that id is my primary key field so that could be why it is showing up but why aren't the others showing up? Thanks, Brian Lamb
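Setting the import errors aside, a note on the schema change itself: a string field is indexed completely unanalyzed, so it only matches the whole stored value verbatim. A common pattern (a sketch, not something from this thread; field names are assumptions) is to search an analyzed copy and keep the string field for exact matching and faceting:

```xml
<!-- schema.xml sketch: analyzed field for searching, raw string copy for exact match -->
<field name="genus"       type="text"   indexed="true" stored="true"/>
<field name="genus_exact" type="string" indexed="true" stored="false"/>
<copyField source="genus" dest="genus_exact"/>
```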
Matching on a multi valued field
Hi all, I have a field set up like this: field name=common_names multiValued=true type=text indexed=true stored=true required=false / And I have some records: RECORD1 arr name=common_names strman's best friend/str strpooch/str /arr RECORD2 arr name=common_names strman's worst enemy/str strfriend to no one/str /arr Now if I do a search such as: http://localhost:8983/solr/search/?q=*:*fq={!q.op=AND df=common_names}man's friend Both records are returned. However, I only want RECORD1 returned. I understand why RECORD2 is returned but how can I structure my query so that only RECORD1 is returned? Thanks, Brian Lamb
Re: Default operator
Thank you both for your input. I ended up using Ahmet's way because it seems to fit better with the rest of the application. On Sat, Mar 26, 2011 at 6:02 AM, lboutros boutr...@gmail.com wrote: The other way could be to extend the SolrQueryParser to read a per field default operator in the solr config file. Then it should be possible to override this functions : setDefaultOperator getDefaultOperator and this two which are using the default operator : getFieldQuery addClause The you just have to declare it in the solr config file and configure your default operators. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Default-operator-tp2732237p2734931.html Sent from the Solr - User mailing list archive at Nabble.com.
Default operator
Hi all, I know that I can change the default operator in two ways: 1) *solrQueryParser defaultOperator*=AND|OR/ 2) Add q.op=AND I'm wondering if it is possible to change the default operator for a specific field only? For example, if I use the URL: http://localhost:8983/solr/search/?q=animal:german shepherdtype:dog canine I would want it to effectively be: http://localhost:8983/solr/search/?q=animal:german AND shepherdtype:dog OR canine Other than parsing the URL before I send it out, is there a way to do this? Thanks, Brian Lamb
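One way to get a per-field operator without parsing the URL yourself or writing a custom parser is the lucene parser's `_query_` nested-query syntax; each nested clause carries its own q.op local param. A sketch (shown unescaped for readability):

```
q=_query_:"{!lucene q.op=AND df=animal}german shepherd" _query_:"{!lucene q.op=OR df=type}dog canine"
```

Here the animal clause requires both terms while the type clause accepts either, which is the effect the two URLs above describe.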
Re: Adding the suggest component
I'm still confused as to why I'm getting this error. To me it reads that the .java file was declared incorrectly but I shouldn't need to change those files so where am I doing something incorrectly? On Tue, Mar 22, 2011 at 3:40 PM, Brian Lamb brian.l...@journalexperts.comwrote: That fixed that error as well as the could not initialize Dataimport class error. Now I'm getting: org.apache.solr.common.SolrException: Error Instantiating Request Handler, org.apache.solr.handler.dataimport.DataImportHandler is not a org.apache.solr.request.SolrRequestHandler I can't find anything on this one. What I've added to the solrconfig.xml file matches whats in example-DIH so I don't quite understand what the issue is here. It sounds to me like it is not declared properly somewhere but I'm not sure where/why. Here is the relevant portion of my solrconfig.xml file: requestHandler name=/dataimport class=org.apache.solr.handler.dataimport.DataImportHandler lst name=defaults str name=configdb-data-config.xml/str /lst /requestHandler Thanks for all the help so far. You all have been great. Brian Lamb On Tue, Mar 22, 2011 at 3:17 PM, Ahmet Arslan iori...@yahoo.com wrote: java.lang.NoClassDefFoundError: Could not initialize class org.apache.solr.handler.dataimport.DataImportHandler at java.lang.Class.forName0(Native Method) java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory at org.apache.solr.handler.dataimport.DataImportHandler.clinit(DataImportHandler.java:72) Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory at java.net.URLClassLoader$1.run(URLClassLoader.java:217) You can find slf4j- related jars in \trunk\solr\lib, but this error is weird.
Re: Adding the suggest component
Thank you for the suggestion. I followed your advice and was able to get a version up and running. Thanks again for all the help! On Wed, Mar 23, 2011 at 1:55 PM, Ahmet Arslan iori...@yahoo.com wrote: I'm still confused as to why I'm getting this error. To me it reads that the .java file was declared incorrectly but I shouldn't need to change those files so where am I doing something incorrectly? Brian, I think best thing to do is checkout a new clean copy from subversion and then do things step by step on this clean copy.
Re: Adding the suggest component
Thanks everyone for the advice. I checked out a recent version from SVN and ran: ant clean example This worked just fine. However when I went to start the solr server, I get this error message: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' It looks like those files are there: contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/ But for some reason, they aren't able to be found. Where would I update this setting and what would I update it to? Thanks, Brian Lamb On Mon, Mar 21, 2011 at 10:15 AM, Erick Erickson erickerick...@gmail.comwrote: OK, I think you're jumping ahead and trying to do too many things at once. What did you download? Source? The distro? The error you posted usually happens for me when I haven't compiled the example target from source. So I'd guess you don't have the proper targets built. This assumes you downloaded the source via SVN. If you downloaded a distro, I'd start by NOT copying anything anywhere, just go to the example code and start Solr. Make sure you have what you think you have. I've seen interesting things get cured by removing the entire directory where your servlet container unpacks war files, but that's usually in development environments. When I get in these situations, I usually find it's best to back up, do one thing at a time and verify that I get the expected results at each step. It's tedious, but Best Erick On Fri, Mar 18, 2011 at 4:18 PM, Ahmet Arslan iori...@yahoo.com wrote: downloaded a recent version and there were the following files/folders: build.xml dev-tools LICENSE.txt lucene NOTICE.txt README.txt solr So I did cp -r solr/* /path/to/solr/stuff/ and started solr. I didn't get any error message but I only got the following messages: How do you start solr? using java -jar start.jar? Did you run 'ant clean example' in the solr folder?
Re: Adding the suggest component
I found the following in the build.xml file: invoke-javadoc destdir=${build.javadoc} sources packageset dir=${src}/common / packageset dir=${src}/solrj / packageset dir=${src}/java / packageset dir=${src}/webapp/src / packageset dir=contrib/dataimporthandler/src/main/java / packageset dir=contrib/clustering/src/main/java / packageset dir=contrib/extraction/src/main/java / packageset dir=contrib/uima/src/main/java / packageset dir=contrib/analysis-extras/src/java / group title=Core packages=org.apache.* / group title=Common packages=org.apache.solr.common.* / group title=SolrJ packages=org.apache.solr.client.solrj* / group title=contrib: DataImportHandler packages=org.apache.solr.handler.dataimport* / group title=contrib: Clustering packages=org.apache.solr.handler.clustering* / group title=contrib: Solr Cell packages=org.apache.solr.handler.extraction* / group title=contrib: Solr UIMA packages=org.apache.solr.uima* / /sources /invoke-javadoc It looks like the dataimport handler path is correct in there so I don't understand why it's not being compiled. I ran ant example again today but I'm still getting the same error. Thanks, Brian Lamb On Tue, Mar 22, 2011 at 11:28 AM, Brian Lamb brian.l...@journalexperts.com wrote: Thanks everyone for the advice. I checked out a recent version from SVN and ran: ant clean example This worked just fine. However when I went to start the solr server, I get this error message: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' It looks like those files are there: contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/ But for some reason, they aren't able to be found. Where would I update this setting and what would I update it to? Thanks, Brian Lamb On Mon, Mar 21, 2011 at 10:15 AM, Erick Erickson erickerick...@gmail.com wrote: OK, I think you're jumping ahead and trying to do too many things at once. What did you download? Source? The distro? 
The error you posted usually happens for me when I haven't compiled the example target from source. So I'd guess you don't have the proper targets built. This assumes you downloaded the source via SVN. If you downloaded a distro, I'd start by NOT copying anything anywhere, just go to the example code and start Solr. Make sure you have what you think you have. I've seen interesting things get cured by removing the entire directory where your servlet container unpacks war files, but that's usually in development environments. When I get in these situations, I usually find it's best to back up, do one thing at a time and verify that I get the expected results at each step. It's tedious, but Best Erick On Fri, Mar 18, 2011 at 4:18 PM, Ahmet Arslan iori...@yahoo.com wrote: downloaded a recent version and there were the following files/folders: build.xml dev-tools LICENSE.txt lucene NOTICE.txt README.txt solr So I did cp -r solr/* /path/to/solr/stuff/ and started solr. I didn't get any error message but I only got the following messages: How do you start solr? using java -jar start.jar? Did you run 'ant clean example' in the solr folder?
Re: Adding the suggest component
Awesome! That fixed that problem. I'm getting another class not found error but I'll see if I can fix it on my own first. On Tue, Mar 22, 2011 at 11:56 AM, Ahmet Arslan iori...@yahoo.com wrote: --- On Tue, 3/22/11, Brian Lamb brian.l...@journalexperts.com wrote: From: Brian Lamb brian.l...@journalexperts.com Subject: Re: Adding the suggest component To: solr-user@lucene.apache.org Cc: Erick Erickson erickerick...@gmail.com Date: Tuesday, March 22, 2011, 5:28 PM Thanks everyone for the advice. I checked out a recent version from SVN and ran: ant clean example This worked just fine. However when I went to start the solr server, I get this error message: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' run 'ant clean dist' and copy trunk/solr/dist/ apache-solr-dataimporthandler-extras-4.0-SNAPSHOT.jar apache-solr-dataimporthandler-4.0-SNAPSHOT.jar to solrHome/lib directory.
Re: Adding the suggest component
I fixed a few other exceptions it threw when I started the server but I don't know how to fix this one: java.lang.NoClassDefFoundError: Could not initialize class org.apache.solr.handler.dataimport.DataImportHandler at java.lang.Class.forName0(Native Method) java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory at org.apache.solr.handler.dataimport.DataImportHandler.clinit(DataImportHandler.java:72) Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory at java.net.URLClassLoader$1.run(URLClassLoader.java:217) I've searched Google but haven't been able to find a reason why this happens and how to fix it. Thanks, Brian Lamb On Tue, Mar 22, 2011 at 12:54 PM, Brian Lamb brian.l...@journalexperts.comwrote: Awesome! That fixed that problem. I'm getting another class not found error but I'll see if I can fix it on my own first. On Tue, Mar 22, 2011 at 11:56 AM, Ahmet Arslan iori...@yahoo.com wrote: --- On Tue, 3/22/11, Brian Lamb brian.l...@journalexperts.com wrote: From: Brian Lamb brian.l...@journalexperts.com Subject: Re: Adding the suggest component To: solr-user@lucene.apache.org Cc: Erick Erickson erickerick...@gmail.com Date: Tuesday, March 22, 2011, 5:28 PM Thanks everyone for the advice. I checked out a recent version from SVN and ran: ant clean example This worked just fine. However when I went to start the solr server, I get this error message: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' run 'ant clean dist' and copy trunk/solr/dist/ apache-solr-dataimporthandler-extras-4.0-SNAPSHOT.jar apache-solr-dataimporthandler-4.0-SNAPSHOT.jar to solrHome/lib directory.
Re: Adding the suggest component
That fixed that error as well as the could not initialize Dataimport class error. Now I'm getting: org.apache.solr.common.SolrException: Error Instantiating Request Handler, org.apache.solr.handler.dataimport.DataImportHandler is not a org.apache.solr.request.SolrRequestHandler I can't find anything on this one. What I've added to the solrconfig.xml file matches what's in example-DIH so I don't quite understand what the issue is here. It sounds to me like it is not declared properly somewhere but I'm not sure where/why. Here is the relevant portion of my solrconfig.xml file: requestHandler name=/dataimport class=org.apache.solr.handler.dataimport.DataImportHandler lst name=defaults str name=configdb-data-config.xml/str /lst /requestHandler Thanks for all the help so far. You all have been great. Brian Lamb On Tue, Mar 22, 2011 at 3:17 PM, Ahmet Arslan iori...@yahoo.com wrote: java.lang.NoClassDefFoundError: Could not initialize class org.apache.solr.handler.dataimport.DataImportHandler at java.lang.Class.forName0(Native Method) java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory at org.apache.solr.handler.dataimport.DataImportHandler.clinit(DataImportHandler.java:72) Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory at java.net.URLClassLoader$1.run(URLClassLoader.java:217) You can find slf4j- related jars in \trunk\solr\lib, but this error is weird.
Re: Adding the suggest component
That does seem like a better solution. I downloaded a recent version and there were the following files/folders: build.xml dev-tools LICENSE.txt lucene NOTICE.txt README.txt solr So I did cp -r solr/* /path/to/solr/stuff/ and started solr. I didn't get any error message but I only got the following messages: 2011-03-18 14:11:02.016:INFO::Logging to STDERR via org.mortbay.log.StdErrLog 2011-03-18 14:11:02.240:INFO::jetty-6.1-SNAPSHOT 2011-03-18 14:11:02.284:INFO::Started SocketConnector@0.0.0.0:8983 Whereas before I got a bunch of messages indicating various libraries had been loaded. Additionally, when I go to http://localhost/solr/admin/, I get the following message: HTTP ERROR: 404 Problem accessing /solr/admin. Reason: NOT_FOUND What did I do incorrectly? Thanks, Brian Lamb On Fri, Mar 18, 2011 at 9:04 AM, Erick Erickson erickerick...@gmail.com wrote: What do you mean you copied the contents...to the right place? If you checked out trunk and copied the files into 1.4.1, you have mixed source files between disparate versions. All bets are off. Or do you mean jar files? or??? I'd build the source you checked out (at the Solr level) and use that rather than try to mix-n-match. BTW, if you're just starting (as in not in production), you may want to consider using 3.1, as it's being released even as we speak and has many improvements over 1.4. You can get a nightly build from here: https://builds.apache.org/hudson/view/S-Z/view/Solr/ Best Erick On Thu, Mar 17, 2011 at 3:36 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, When I installed Solr, I downloaded the most recent version (1.4.1) I believe. I wanted to implement the Suggester ( http://wiki.apache.org/solr/Suggester). I copied and pasted the information there into my solrconfig.xml file but I'm getting the following error: Error loading class 'org.apache.solr.spelling.suggest.Suggester' I read up on this error and found that I needed to checkout a newer version from SVN. 
I checked out a full version and copied the contents of src/java/org/apache/spelling/suggest to the same location on my set up. However, I am still receiving this error. Did I not put the files in the right place? What am I doing incorrectly? Thanks, Brian Lamb
Re: Adding the suggest component
Sorry, that was a typo on my part. I was using http://localhost:8983/solr/admin and getting the above error messages. On Fri, Mar 18, 2011 at 2:57 PM, Geert-Jan Brits gbr...@gmail.com wrote: 2011-03-18 14:11:02.284:INFO::Started SocketConnector@0.0.0.0:8983 Solr started on port 8983 instead of this: http://localhost/solr/admin/ try this instead: http://localhost:8983/solr/admin/ http://localhost/solr/admin/ Cheers, Geert-Jan 2011/3/18 Brian Lamb brian.l...@journalexperts.com That does seem like a better solution. I downloaded a recent version and there were the following files/folders: build.xml dev-tools LICENSE.txt lucene NOTICE.txt README.txt solr So I did cp -r solr/* /path/to/solr/stuff/ and started solr. I didn't get any error message but I only got the following messages: 2011-03-18 14:11:02.016:INFO::Logging to STDERR via org.mortbay.log.StdErrLog 2011-03-18 14:11:02.240:INFO::jetty-6.1-SNAPSHOT 2011-03-18 14:11:02.284:INFO::Started SocketConnector@0.0.0.0:8983 Where as before I got a bunch of messages indicating various libraries had been loaded. Additionally, when I go to http://localhost/solr/admin/, I get the following message: HTTP ERROR: 404 Problem accessing /solr/admin. Reason: NOT_FOUND What did I do incorrectly? Thanks, Brian Lamb On Fri, Mar 18, 2011 at 9:04 AM, Erick Erickson erickerick...@gmail.com wrote: What do you mean you copied the contents...to the right place? If you checked out trunk and copied the files into 1.4.1, you have mixed source files between disparate versions. All bets are off. Or do you mean jar files? or??? I'd build the source you checked out (at the Solr level) and use that rather than try to mix-n-match. BTW, if you're just starting (as in not in production), you may want to consider using 3.1, as it's being released even as we speak and has many improvements over 1.4. 
You can get a nightly build from here: https://builds.apache.org/hudson/view/S-Z/view/Solr/ Best Erick On Thu, Mar 17, 2011 at 3:36 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, When I installed Solr, I downloaded the most recent version (1.4.1) I believe. I wanted to implement the Suggester ( http://wiki.apache.org/solr/Suggester). I copied and pasted the information there into my solrconfig.xml file but I'm getting the following error: Error loading class 'org.apache.solr.spelling.suggest.Suggester' I read up on this error and found that I needed to checkout a newer version from SVN. I checked out a full version and copied the contents of src/java/org/apache/spelling/suggest to the same location on my set up. However, I am still receiving this error. Did I not put the files in the right place? What am I doing incorrectly? Thanks, Brian Lamb
Adding the suggest component
Hi all, When I installed Solr, I downloaded the most recent version (1.4.1) I believe. I wanted to implement the Suggester ( http://wiki.apache.org/solr/Suggester). I copied and pasted the information there into my solrconfig.xml file but I'm getting the following error: Error loading class 'org.apache.solr.spelling.suggest.Suggester' I read up on this error and found that I needed to checkout a newer version from SVN. I checked out a full version and copied the contents of src/java/org/apache/spelling/suggest to the same location on my set up. However, I am still receiving this error. Did I not put the files in the right place? What am I doing incorrectly? Thanks, Brian Lamb
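The resolution this thread converged on was copying the two dataimporthandler jars into solrHome/lib; the same effect can be had declaratively with lib directives in solrconfig.xml. A sketch; the relative paths are assumptions that depend on where the checkout's dist and lib directories sit relative to solr home:

```xml
<!-- solrconfig.xml sketch: load the DIH jars without copying them around -->
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-.*\.jar"/>
<!-- DIH logs through slf4j, so those jars must be reachable as well -->
<lib dir="../../lib/" regex="slf4j-.*\.jar"/>
```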
Multicore
Hi all, I am setting up multicore and the schema.xml file in the core0 folder says not to use that one because it's very stripped down. So I copied the schema from example/solr/conf but now I am getting a bunch of class not found exceptions: SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.KeywordMarkerFilterFactory' For example. I also copied over the solrconfig.xml from example/solr/conf and changed all the lib dir=xxx paths to go up one directory higher (lib dir=../xxx / instead). I've found that when I use my solrconfig file with the stripped down schema.xml file, it runs correctly. But when I use the full schema xml file, I get those errors. Now this says to me I am not loading a library or two somewhere but I've looked through the configuration files and cannot see any other place other than solrconfig.xml where that would be set so what am I doing incorrectly? Thanks, Brian Lamb
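When every core needs the same jars, an alternative to adjusting each core's lib dir paths is the sharedLib attribute in solr.xml; jars placed there are visible to all cores. A sketch; the directory and core names are assumptions:

```xml
<!-- solr.xml sketch: everything under <solr home>/lib is shared by all cores -->
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
```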
Re: Dynamically boost search scores
Thank you for the advice. I looked at the page you recommended and came up with: http://localhost:8983/solr/search/?q=dogfl=boost_score,genus,species,scorerows=15bf=%22ord%28sum%28boost_score,1%29%29 ^10%22 But it appeared to have no effect. The results were in the same order as they were when I left off the bf parameter. So what am I doing incorrectly? Thanks, Brian Lamb On Mon, Mar 14, 2011 at 11:45 AM, Markus Jelsma markus.jel...@openindex.io wrote: See boosting documents by function query. This way you can use document's boost_score field to affect the final score. http://wiki.apache.org/solr/FunctionQuery On Monday 14 March 2011 16:40:42 Brian Lamb wrote: Hi all, I have a field in my schema called boost_score. I would like to set it up so that if I pass in a certain flag, each document score is boosted by the number in boost_score. For example if I use: http://localhost/solr/search/?q=dog I would get search results like normal. But if I use: http://localhost/solr/search?q=dogboost=true The score of each document would be boosted by the number in the field boost_score. Unfortunately, I have no idea how to implement this actually but I'm hoping that's where you all can come in. Thanks, Brian Lamb -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
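Two likely reasons that URL had no effect: bf is only read by the dismax/edismax parsers (the default lucene parser silently ignores it), and ord() takes a field name, not another function, so ord(sum(...)) is invalid; the quotes around the value don't belong there either. A corrected sketch (the qf values are assumptions):

```
http://localhost:8983/solr/search/?q=dog&defType=dismax&qf=genus+species&bf=sum(boost_score,1)^10&fl=boost_score,genus,species,score
```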
Re: Sorting
It doesn't necessarily need to go through an XSLT but the idea remains the same. I want to have the highest scores first no matter which result they match with. So if the results are like this: lst name=moreLikeThis result name=3 numFound=2 start=0 maxScore=0.439 doc float name=score0.439/float str name=id1/str /doc doc float name=score0.215/float str name=id2/str /doc doc float name=score0.115/float str name=id3/str /doc /result result name=2 numFound=3 start=0 maxScore=0.539 doc float name=score0.539/float str name=id4/str /doc doc float name=score0.338/float str name=id5/str /doc /result /lst I would want them to be formatted like this: lst name=moreLikeThis doc float name=score0.539/float str name=id4/str /doc doc float name=score0.439/float str name=id1/str /doc doc float name=score0.338/float str name=id5/str /doc doc float name=score0.215/float str name=id2/str /doc doc float name=score0.115/float str name=id3/str /doc /lst The way I do it now is to fetch the results and then parse them with PHP to simulate that but it seems horribly inefficient so I'd like to do it within Solr if at all possible. On Thu, Mar 10, 2011 at 4:02 PM, Brian Lamb brian.l...@journalexperts.com wrote: Any ideas on this one? On Wed, Mar 9, 2011 at 2:00 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I know that I can add sort=score desc to the url to sort in descending order. However, I would like to sort a MoreLikeThis response which returns records like this: lst name=moreLikeThis result name=3 numFound=113611 start=0 maxScore=0.4392774 result name=2 numFound= start=0 maxScore=0.5392774 /lst I don't want them grouped by result; I would just like have them all thrown together and then sorted according to score. I have an XSLT which does put them altogether and returns the following: moreLikeThis similar scorex./score idsome_id/id /similar /moreLikeThis However it appears that it basically applies the stylesheet to result name=3 then result name=2. 
How can I make it so that with my XSLT, the results appear sorted by score?
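The flattening can happen inside the stylesheet itself: select the doc elements of all result blocks in a single for-each and let xsl:sort order them by score. A sketch against the response format shown above; it assumes the usual xsl namespace declaration on the stylesheet root:

```xml
<!-- merge docs from every moreLikeThis result, highest score first -->
<xsl:template match="lst[@name='moreLikeThis']">
  <moreLikeThis>
    <xsl:for-each select="result/doc">
      <xsl:sort select="float[@name='score']" data-type="number" order="descending"/>
      <similar>
        <score><xsl:value-of select="float[@name='score']"/></score>
        <id><xsl:value-of select="str[@name='id']"/></id>
      </similar>
    </xsl:for-each>
  </moreLikeThis>
</xsl:template>
```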
Dynamically boost search scores
Hi all, I have a field in my schema called boost_score. I would like to set it up so that if I pass in a certain flag, each document score is boosted by the number in boost_score. For example if I use: http://localhost/solr/search/?q=dog I would get search results like normal. But if I use: http://localhost/solr/search?q=dogboost=true The score of each document would be boosted by the number in the field boost_score. Unfortunately, I have no idea how to implement this actually but I'm hoping that's where you all can come in. Thanks, Brian Lamb
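Since an index-time boost can't be switched per request, a boost=true flag is usually approximated with two request handlers whose defaults differ, and the application picks the handler instead of passing a flag. A sketch; the handler names, qf, and the bf expression are assumptions:

```xml
<!-- solrconfig.xml sketch: /search scores normally, /searchboost folds boost_score in -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">genus species</str>
  </lst>
</requestHandler>
<requestHandler name="/searchboost" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">genus species</str>
    <str name="bf">sum(boost_score,1)^10</str>
  </lst>
</requestHandler>
```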
Re: docBoost
Okay I think I have the idea: dataConfig dataSource type=JdbcDataSource name=animals batchSize=-1 driver=com.mysql.jdbc.Driver url=jdbc:mysql://localhost/animals?characterEncoding=UTF8amp;zeroDateTimeBehavior=convertToNull user=user password=pass/ script![CDATA[ function BoostScores(row) { // if searching for recommendations add in the boost score if(some_condition) { row.put('$docBoost', row.get('boost_score')); } // end if(some_condition) return row; } // end function BoostRecommendations(row) ]]/script document entity name=animal dataSource=animals pk=id query=SELECT * FROM animals field column=id name=id / field column=genus name=genus / field column=species name=species / entity name=boosters dataSource=boosts query=SELECT boost_score FROM boosts WHERE animal_id=${ animal.id} field column=boost_score name=boost_score / /entity /entity /document /dataConfig Now, am I right in thinking that the boost score is only when the data is loaded? If so, that's close to what I want to do but not exactly. I would like to load all the data without boosting any scores but storing what the boost score would be. And then, depending on the search, boost scores by the value. For example, if a user searches for dog, they would get search results that were unboosted. However, I would also want the option to pass in a flag of some kind so that if a user searches for dog, they would get search results with the boost score factored in. Ideally it would be something like: Regular search: http://localhost/solr/search/?q=dog Boosted search: http://localhost/solr/search?q=dogboost=true To achieve this, would it be applied in the data import handler? If so, what would I need to put in for some_condition? Thanks for all the help so far. I truly do appreciate it. Thanks, Brian Lamb On Wed, Mar 9, 2011 at 11:50 PM, Bill Bell billnb...@gmail.com wrote: Yes just add if statement based on a field type and do a row.put() only if that other value is a certain value. 
On 3/9/11 1:39 PM, Brian Lamb brian.l...@journalexperts.com wrote: That makes sense. As a follow up, is there a way to only conditionally use the boost score? For example, in some cases I want to use the boost score and in other cases I want all documents to be treated equally. On Wed, Mar 9, 2011 at 2:42 PM, Jayendra Patil jayendra.patil@gmail.com wrote: you can use the ScriptTransformer to perform the boost calcualtion and addition. http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer dataConfig script![CDATA[ function f1(row) { // Add boost row.put('$docBoost',1.5); return row; } ]]/script document entity name=e pk=id transformer=script:f1 query=select * from X /entity /document /dataConfig Regards, Jayendra On Wed, Mar 9, 2011 at 2:01 PM, Brian Lamb brian.l...@journalexperts.com wrote: Anyone have any clue on this on? On Tue, Mar 8, 2011 at 2:11 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, I am using dataimport to create my index and I want to use docBoost to assign some higher weights to certain docs. I understand the concept behind docBoost but I haven't been able to find an example anywhere that shows how to implement it. Assuming the following config file: document entity name=animal dataSource=animals pk=id query=SELECT * FROM animals field column=id name=id / field column=genus name=genus / field column=species name=species / entity name=boosters dataSource=boosts query=SELECT boost_score FROM boosts WHERE animal_id = ${ animal.id} field column=boost_score name=boost_score / /entity /entity /document How do I add in a docBoost score? The boost score is currently in a separate table as shown above.
Re: Sorting
Any ideas on this one? On Wed, Mar 9, 2011 at 2:00 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I know that I can add sort=score desc to the url to sort in descending order. However, I would like to sort a MoreLikeThis response, which returns records like this:

<lst name="moreLikeThis">
  <result name="3" numFound="113611" start="0" maxScore="0.4392774"/>
  <result name="2" numFound="" start="0" maxScore="0.5392774"/>
</lst>

I don't want them grouped by result; I would just like to have them all thrown together and then sorted according to score. I have an XSLT which does put them all together and returns the following:

<moreLikeThis>
  <similar>
    <score>x.</score>
    <id>some_id</id>
  </similar>
</moreLikeThis>

However, it appears that it basically applies the stylesheet to result name="3" and then to result name="2". How can I make it so that with my XSLT the results appear sorted by score?
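In the meantime, the merge-and-sort can be done before (or instead of) the XSLT step. Below is a minimal stdlib-Python sketch that flattens all moreLikeThis result blocks into one list sorted by score; the XML shape is assumed from the snippet above, with hypothetical doc entries filled in for illustration. (In XSLT itself, the equivalent move is an <xsl:sort select="score" data-type="number" order="descending"/> inside the template that emits the similar elements, applied over all docs at once rather than per result block.)

```python
import xml.etree.ElementTree as ET

# A trimmed Solr response containing a moreLikeThis section (shape assumed
# from the thread; real responses carry more attributes and doc fields).
xml_response = """
<response>
  <lst name="moreLikeThis">
    <result name="3" numFound="113611" start="0" maxScore="0.4392774">
      <doc><float name="score">0.43</float><str name="id">10</str></doc>
      <doc><float name="score">0.12</float><str name="id">11</str></doc>
    </result>
    <result name="2" numFound="98" start="0" maxScore="0.5392774">
      <doc><float name="score">0.53</float><str name="id">20</str></doc>
    </result>
  </lst>
</response>
"""

root = ET.fromstring(xml_response)
docs = []
for result in root.findall(".//lst[@name='moreLikeThis']/result"):
    for doc in result.findall("doc"):
        score = float(doc.find("float[@name='score']").text)
        doc_id = doc.find("str[@name='id']").text
        docs.append((score, doc_id))

# Merge every result block and sort by score, highest first.
docs.sort(reverse=True)
for score, doc_id in docs:
    print(doc_id, score)
```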
Re: dataimport
This has since been fixed. The problem was that there was not enough memory on the machine. It works just fine now. On Tue, Mar 8, 2011 at 6:22 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : INFO: Creating a connection for entity id with URL: : jdbc:mysql://localhost/researchsquare_beta_library?characterEncoding=UTF8&zeroDateTimeBehavior=convertToNull : Feb 24, 2011 8:58:25 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 : call : INFO: Time taken for getConnection(): 137 : Killed : : So it looks like for whatever reason, the server crashes trying to do a full : import. When I add a LIMIT clause on the query, it works fine when the LIMIT : is only 250 records but if I try to do 500 records, I get the same message. ...wow. that's ... weird. I've never seen a java process just log Killed like that. The only time i've ever seen a process log Killed is if it was terminated by the os (ie: kill -9 pid) What OS are you using? how are you running solr? (ie: are you using the simple jetty example java -jar start.jar or are you using a different servlet container?) ... are you absolutely certain your machine doesn't have some sort of monitoring in place that kills jobs if they take too long, or use too much CPU? -Hoss
Sorting
Hi all, I know that I can add sort=score desc to the url to sort in descending order. However, I would like to sort a MoreLikeThis response, which returns records like this:

<lst name="moreLikeThis">
  <result name="3" numFound="113611" start="0" maxScore="0.4392774"/>
  <result name="2" numFound="" start="0" maxScore="0.5392774"/>
</lst>

I don't want them grouped by result; I would just like to have them all thrown together and then sorted according to score. I have an XSLT which does put them all together and returns the following:

<moreLikeThis>
  <similar>
    <score>x.</score>
    <id>some_id</id>
  </similar>
</moreLikeThis>

However, it appears that it basically applies the stylesheet to result name="3" and then to result name="2". How can I make it so that with my XSLT the results appear sorted by score?
Re: docBoost
Anyone have any clue on this one? On Tue, Mar 8, 2011 at 2:11 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I am using dataimport to create my index and I want to use docBoost to assign higher weights to certain docs. I understand the concept behind docBoost but I haven't been able to find an example anywhere that shows how to implement it. Assuming the following config file:

<document>
  <entity name="animal" dataSource="animals" pk="id" query="SELECT * FROM animals">
    <field column="id" name="id" />
    <field column="genus" name="genus" />
    <field column="species" name="species" />
    <entity name="boosters" dataSource="boosts"
        query="SELECT boost_score FROM boosts WHERE animal_id = ${animal.id}">
      <field column="boost_score" name="boost_score" />
    </entity>
  </entity>
</document>

How do I add in a docBoost score? The boost score is currently in a separate table as shown above.
Re: docBoost
That makes sense. As a follow up, is there a way to only conditionally use the boost score? For example, in some cases I want to use the boost score and in other cases I want all documents to be treated equally. On Wed, Mar 9, 2011 at 2:42 PM, Jayendra Patil jayendra.patil@gmail.com wrote: you can use the ScriptTransformer to perform the boost calcualtion and addition. http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer dataConfig script![CDATA[ function f1(row) { // Add boost row.put('$docBoost',1.5); return row; } ]]/script document entity name=e pk=id transformer=script:f1 query=select * from X /entity /document /dataConfig Regards, Jayendra On Wed, Mar 9, 2011 at 2:01 PM, Brian Lamb brian.l...@journalexperts.com wrote: Anyone have any clue on this on? On Tue, Mar 8, 2011 at 2:11 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, I am using dataimport to create my index and I want to use docBoost to assign some higher weights to certain docs. I understand the concept behind docBoost but I haven't been able to find an example anywhere that shows how to implement it. Assuming the following config file: document entity name=animal dataSource=animals pk=id query=SELECT * FROM animals field column=id name=id / field column=genus name=genus / field column=species name=species / entity name=boosters dataSource=boosts query=SELECT boost_score FROM boosts WHERE animal_id = ${ animal.id} field column=boost_score name=boost_score / /entity /entity /document How do I add in a docBoost score? The boost score is currently in a separate table as shown above.
Excluding results from more like this
Hi all, I'm using MoreLikeThis to find similar results but I'd like to exclude records by id number. For example, I use the following URL: http://localhost:8983/solr/search/?q=id:(2 3 5)&mlt=true&mlt.fl=description,id&fl=*,score How would I exclude record 4 from the MoreLikeThis results? I tried http://localhost:8983/solr/search/?q=id:(2 3 5)&mlt=true&mlt.fl=description,id&fl=*,score&mlt.q=!4 but that still returned record 4 in the MoreLikeThis results.
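One thing that may help here: with the MoreLikeThis component (mlt=true on a regular search handler), fq filters the main result list, not the similar-document lists. The dedicated MoreLikeThisHandler, however, does apply fq to the similar documents it returns. A sketch, assuming a handler registered at /mlt:

```
http://localhost:8983/solr/mlt?q=id:2&mlt.fl=description&fq=-id:4&fl=*,score
```

The negative filter -id:4 then drops record 4 from the similar-document list itself.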
Re: Excluding results from more like this
That doesn't seem to do it. Record 4 is still showing up in the MoreLikeThis results. On Wed, Mar 9, 2011 at 4:12 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Brian, ...?q=id:(2 3 5) -4 Otis --- Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Brian Lamb brian.l...@journalexperts.com To: solr-user@lucene.apache.org Sent: Wed, March 9, 2011 4:05:10 PM Subject: Excluding results from more like this Hi all, I'm using MoreLikeThis to find similar results but I'd like to exclude records by id number. For example, I use the following URL: http://localhost:8983/solr/search/?q=id:(2 3 5)&mlt=true&mlt.fl=description,id&fl=*,score How would I exclude record 4 from the MoreLikeThis results? I tried http://localhost:8983/solr/search/?q=id:(2 3 5)&mlt=true&mlt.fl=description,id&fl=*,score&mlt.q=!4 but that still returned record 4 in the MoreLikeThis results.
docBoost
Hi all, I am using dataimport to create my index and I want to use docBoost to assign higher weights to certain docs. I understand the concept behind docBoost but I haven't been able to find an example anywhere that shows how to implement it. Assuming the following config file:

<document>
  <entity name="animal" dataSource="animals" pk="id" query="SELECT * FROM animals">
    <field column="id" name="id" />
    <field column="genus" name="genus" />
    <field column="species" name="species" />
    <entity name="boosters" dataSource="boosts"
        query="SELECT boost_score FROM boosts WHERE animal_id = ${animal.id}">
      <field column="boost_score" name="boost_score" />
    </entity>
  </entity>
</document>

How do I add in a docBoost score? The boost score is currently in a separate table as shown above.
Re: Indexed, but cannot search
Here are the relevant parts of schema.xml: field name=globalField type=text indexed=true stored=true multiValued=true/ defaultSearchFieldglobalField/defaultSearchField copyField source=* dest=globalField / This is what is returned when I search: response - lst name=responseHeader int name=status0/int int name=QTime1/int - lst name=params str name=qMammal/str str name=debugQuerytrue/str /lst /lst result name=response numFound=0 start=0 maxScore=0.0/ - lst name=debug str name=rawquerystringMammal/str str name=querystringMammal/str str name=parsedqueryglobalField:mammal/str str name=parsedquery_toStringglobalField:mammal/str lst name=explain/ str name=QParserLuceneQParser/str - lst name=timing double name=time1.0/double - lst name=prepare double name=time1.0/double - lst name=org.apache.solr.handler.component.QueryComponent double name=time1.0/double /lst - lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.MoreLikeThisComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.HighlightComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.StatsComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.DebugComponent double name=time0.0/double /lst /lst - lst name=process double name=time0.0/double - lst name=org.apache.solr.handler.component.QueryComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.MoreLikeThisComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.HighlightComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.StatsComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.DebugComponent double name=time0.0/double /lst /lst /lst /lst /response On Tue, Mar 1, 2011 at 
7:57 PM, Markus Jelsma markus.jel...@openindex.iowrote: Hmm, please provide analyzer of text and output of debugQuery=true. Anyway, if field type is fieldType text and the catchall field text is fieldType text as well and you reindexed, it should work as expected. Oh if only it were that easy :-). I have reindexed since making that change which is how I was able to get the regular search working. I have not however been able to get the search across all fields to work. On Tue, Mar 1, 2011 at 3:01 PM, Markus Jelsma markus.jel...@openindex.iowrote: Traditionally, people forget to reindex ;) Hi all, The problem was that my fields were defined as type=string instead of type=text. Once I corrected that, it seems to be fixed. The only part that still is not working though is the search across all fields. For example: http://localhost:8983/solr/select/?q=type%3AMammal Now correctly returns the records matching mammal. But if I try to do a global search across all fields: http://localhost:8983/solr/select/?q=Mammal http://localhost:8983/solr/select/?q=text%3AMammal I get no results returned. Here is how the schema is set up: field name=text type=text indexed=true stored=false multiValued=true/ defaultSearchFieldtext/defaultSearchField copyField source=* dest=text / Thanks to everyone for your help so far. I think this is the last hurdle I have to jump over. On Tue, Mar 1, 2011 at 12:34 PM, Upayavira u...@odoko.co.uk wrote: Next question, do you have your type field set to index=true in your schema? Upayavira On Tue, 01 Mar 2011 11:06 -0500, Brian Lamb brian.l...@journalexperts.com wrote: Thank you for your reply but the searching is still not working out. 
For example, when I go to: http://localhost:8983/solr/select/?q=*%3A* http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10in dent=on I get the following as a response: result name=response numFound=249943 start=0 doc str name=typeMammal/str str name=id1/str str name=genusCanis/str /doc /response (plus some other docs but one is enough for this example) But if I go to http://localhost:8983/solr/select/?q=type%3A http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10in dent=on Mammal I only get: result name=response numFound=0 start=0 But it seems that should return at least the result I have listed above. What am I doing incorrectly? On Mon, Feb 28, 2011 at 6:57 PM, Upayavira u...@odoko.co.uk wrote: q=dog is equivalent to q=text:dog (where the default search field is defined as text at the bottom of schema.xml
Re: Indexed, but cannot search
So here's something interesting. I did a delta import this morning and it looks like I can do a global search across those fields. I'll do another full import and see if that fixed the problem. I had done a fullimport after making this change but it seems like another reindex is in order. On Wed, Mar 2, 2011 at 10:31 AM, Markus Jelsma markus.jel...@openindex.iowrote: Please also provide analysis part of fieldType text. You can also use Luke to inspect the index. http://localhost:8983/solr/admin/luke?fl=globalFieldnumTerms=100 On Wednesday 02 March 2011 16:09:33 Brian Lamb wrote: Here are the relevant parts of schema.xml: field name=globalField type=text indexed=true stored=true multiValued=true/ defaultSearchFieldglobalField/defaultSearchField copyField source=* dest=globalField / This is what is returned when I search: response - lst name=responseHeader int name=status0/int int name=QTime1/int - lst name=params str name=qMammal/str str name=debugQuerytrue/str /lst /lst result name=response numFound=0 start=0 maxScore=0.0/ - lst name=debug str name=rawquerystringMammal/str str name=querystringMammal/str str name=parsedqueryglobalField:mammal/str str name=parsedquery_toStringglobalField:mammal/str lst name=explain/ str name=QParserLuceneQParser/str - lst name=timing double name=time1.0/double - lst name=prepare double name=time1.0/double - lst name=org.apache.solr.handler.component.QueryComponent double name=time1.0/double /lst - lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.MoreLikeThisComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.HighlightComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.StatsComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.DebugComponent double name=time0.0/double /lst /lst - lst name=process double name=time0.0/double - lst 
name=org.apache.solr.handler.component.QueryComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.MoreLikeThisComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.HighlightComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.StatsComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.DebugComponent double name=time0.0/double /lst /lst /lst /lst /response On Tue, Mar 1, 2011 at 7:57 PM, Markus Jelsma markus.jel...@openindex.iowrote: Hmm, please provide analyzer of text and output of debugQuery=true. Anyway, if field type is fieldType text and the catchall field text is fieldType text as well and you reindexed, it should work as expected. Oh if only it were that easy :-). I have reindexed since making that change which is how I was able to get the regular search working. I have not however been able to get the search across all fields to work. On Tue, Mar 1, 2011 at 3:01 PM, Markus Jelsma markus.jel...@openindex.iowrote: Traditionally, people forget to reindex ;) Hi all, The problem was that my fields were defined as type=string instead of type=text. Once I corrected that, it seems to be fixed. The only part that still is not working though is the search across all fields. For example: http://localhost:8983/solr/select/?q=type%3AMammal Now correctly returns the records matching mammal. But if I try to do a global search across all fields: http://localhost:8983/solr/select/?q=Mammal http://localhost:8983/solr/select/?q=text%3AMammal I get no results returned. Here is how the schema is set up: field name=text type=text indexed=true stored=false multiValued=true/ defaultSearchFieldtext/defaultSearchField copyField source=* dest=text / Thanks to everyone for your help so far. I think this is the last hurdle I have to jump over. 
On Tue, Mar 1, 2011 at 12:34 PM, Upayavira u...@odoko.co.uk wrote: Next question, do you have your type field set to index=true in your schema? Upayavira On Tue, 01 Mar 2011 11:06 -0500, Brian Lamb brian.l...@journalexperts.com wrote: Thank you for your reply but the searching is still not working out. For example, when I go to: http://localhost:8983/solr/select/?q=*%3A* http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows
Formatting the XML returned
Hi all, This list has proven itself quite useful since I got started with Solr. I'm wondering if it is possible to dictate the XML that is returned by a search? Right now it seems very inefficient in that it is formatted like:

<str name="field1">Val</str>
<str name="field2">Val</str>

Etc. I would like to change it so that it reads something like:

<field1>Val</field1>
<field2>Val</field2>

Is this possible? If so, how? Thanks, Brian Lamb
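Solr can do this server-side with the XSLT response writer (wt=xslt&tr=yourstylesheet.xsl, with the stylesheet placed in conf/xslt/), which lets you emit whatever element names you like. If you would rather post-process on the client, here is a stdlib-Python sketch of the renaming, assuming the default wt=xml shape from the example above:

```python
import xml.etree.ElementTree as ET

# A trimmed Solr <doc> in the default wt=xml shape (field names live in
# the name attribute, as in the thread's example).
solr_doc = """
<doc>
  <str name="field1">Val1</str>
  <str name="field2">Val2</str>
</doc>
"""

src = ET.fromstring(solr_doc)
out = ET.Element("doc")
for child in src:
    # Turn <str name="field1">Val</str> into <field1>Val</field1>.
    ET.SubElement(out, child.attrib["name"]).text = child.text

print(ET.tostring(out, encoding="unicode"))
```

The same element-renaming loop works for any of Solr's typed value tags (str, int, float), since only the name attribute is consulted.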
Re: Sub entities
Yes, it looks like I had left off the field (misspelled it, actually). I reran the full import and the fields did properly show up. However, it is still not working as expected. Using the example below, a returned result only lists one specie instead of a list of species. I have the following in my schema.xml file:

<field column="specie" multiValued="true" name="specie" type="string" indexed="true" stored="true" required="false" />

I reran the full import but it is still only listing one specie instead of multiple. Is my above declaration incorrect? On Tue, Mar 1, 2011 at 3:41 AM, Stefan Matheis matheis.ste...@googlemail.com wrote: Brian, except for your sql-syntax error in the specie_relations query, SELECT specie_id FROMspecie_relations .. (missing whitespace after FROM), your config looks okay. Following questions: * is there a field named specie in your schema? (otherwise dih will silently ignore it) * did you check your mysql query log? to see which queries were executed and what their result is? And, just as a quick notice: there is no need to use <field column="foo" name="foo" /> (while both attributes have the same value). Regards Stefan On Mon, Feb 28, 2011 at 9:52 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I was able to get my dataimport to work correctly but I'm a little unclear as to how the entity within an entity works in regards to search results. When I do a search for all results, it seems only the outermost responses are returned.
For example, I have the following in my db config file:

<dataConfig>
  <dataSource type="JdbcDataSource" name="mystuff" batchSize="-1"
      driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://localhost/db?characterEncoding=UTF8&amp;zeroDateTimeBehavior=convertToNull"
      user="user" password="password"/>
  <document>
    <entity name="animal" dataSource="mystuff" query="SELECT * FROM animals">
      <field column="id" name="id" />
      <field column="type" name="type" />
      <field column="genus" name="genus" />
      <!-- Add in the species -->
      <entity name="specie_relations" dataSource="mystuff"
          query="SELECT specie_id FROM specie_relations WHERE animal_id=${animal.id}">
        <entity name="species" dataSource="mystuff"
            query="SELECT specie FROM species WHERE id=${specie_relations.specie_id}">
          <field column="specie" name="specie" />
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>

However, specie never shows up in my search results:

<doc>
  <str name="type">Mammal</str>
  <str name="id">1</str>
  <str name="genus">Canis</str>
</doc>

I had hoped the results would include the species. Can it? If so, what is my malfunction?
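As an aside, each sub-entity fires one query per parent row. A common way to avoid that (a sketch, assuming MySQL; table and column names are taken from the config above) is to collapse the species into one delimited column and split it with DIH's RegexTransformer:

```
<entity name="animal" dataSource="mystuff" transformer="RegexTransformer"
        query="SELECT a.id, a.type, a.genus,
                      GROUP_CONCAT(s.specie SEPARATOR '|') AS specie
               FROM animals a
               LEFT JOIN specie_relations r ON r.animal_id = a.id
               LEFT JOIN species s ON s.id = r.specie_id
               GROUP BY a.id">
  <field column="specie" splitBy="\|" />
</entity>
```

splitBy takes a regex, hence the escaped pipe; each split value then lands as one entry in the multiValued specie field.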
Re: Indexed, but cannot search
Thank you for your reply but the searching is still not working out. For example, when I go to: http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on I get the following as a response:

<result name="response" numFound="249943" start="0">
  <doc>
    <str name="type">Mammal</str>
    <str name="id">1</str>
    <str name="genus">Canis</str>
  </doc>
</result>

(plus some other docs but one is enough for this example) But if I go to http://localhost:8983/solr/select/?q=type%3AMammal I only get:

<result name="response" numFound="0" start="0"/>

But it seems that should return at least the result I have listed above. What am I doing incorrectly? On Mon, Feb 28, 2011 at 6:57 PM, Upayavira u...@odoko.co.uk wrote: q=dog is equivalent to q=text:dog (where the default search field is defined as text at the bottom of schema.xml). If you want to specify a different field, well, you need to tell it :-) Is that it? Upayavira On Mon, 28 Feb 2011 15:38 -0500, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I was able to get my installation of Solr indexed using dataimport. However, I cannot seem to get search working. I can verify that the data is there by going to: http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on This gives me the response: <result name="response" numFound="234961" start="0"/> But when I go to http://localhost:8983/solr/select/?q=dog&version=2.2&start=0&rows=10&indent=on I get the response: <result name="response" numFound="0" start="0"/> I know that dog should return some results because it is the first result when I select all the records. So what am I doing incorrectly that would prevent me from seeing results? --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
Re: Sub entities
Thanks for the help Stefan. It seems removing column=specie fixed it. On Tue, Mar 1, 2011 at 11:18 AM, Stefan Matheis matheis.ste...@googlemail.com wrote: Brian, On Tue, Mar 1, 2011 at 4:52 PM, Brian Lamb brian.l...@journalexperts.com wrote: <field column="specie" multiValued="true" name="specie" type="string" indexed="true" stored="true" required="false" /> Not sure, but iirc field in this context has no column attribute .. that should normally not break your solr configuration. Are you sure that your animal has multiple species assigned? Checked the query from the MySQL query log and verified that it returns more than one record? Otherwise you could enable http://wiki.apache.org/solr/DataImportHandler#LogTransformer for your dataimport, which outputs a log row for every record .. just to ensure that your query results are correctly imported. HTH, Regards Stefan
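For the archives, the working schema.xml declaration then ends up as follows (field elements in schema.xml take a name attribute, not column — column belongs in the DIH config):

```
<field name="specie" type="string" indexed="true" stored="true"
       multiValued="true" required="false" />
```

After changing a field declaration like this, a full reindex is needed for existing documents to pick it up.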