Re: Solr via ruby

2009-09-18 Thread Erik Hatcher


On Sep 18, 2009, at 1:09 AM, rajan chandi wrote:
We are planning to use the external Solr on tomcat for scalability  
reasons.


We thought that EmbeddedSolrServer uses HTTP too to talk with Ruby and
vice versa, as in the acts_as_solr Ruby plugin.


EmbeddedSolrServer is a way to run Solr as an API (like Lucene) rather  
than with any web container involved at all.  In other words, only  
Java can use EmbeddedSolrServer (which means JRuby works great).


The acts_as_solr plugin uses the solr-ruby library to communicate with
Solr.  Under solr-ruby, it's HTTP with Ruby-formatted (wt=ruby)
responses for searches, and documents being indexed get converted to
Solr's XML format and POSTed to the Solr URL used to open the
Solr::Connection.


Erik
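
As a rough illustration of the two request shapes described above (this is not solr-ruby's actual implementation), the sketch below builds a search URL that asks for a Ruby-formatted response and renders a document in Solr's XML update format. The helper names, the base URL, and the field names are assumptions made for the example.

```ruby
require 'uri'
require 'cgi'

# Hypothetical helper: build a /select URL that asks Solr for wt=ruby output.
def solr_select_url(base_url, query, rows = 10)
  params = { 'q' => query, 'wt' => 'ruby', 'rows' => rows }
  "#{base_url}/select?#{URI.encode_www_form(params)}"
end

# Hypothetical helper: render one document in Solr's XML update format.
# A client would POST this body to the /update path under the Solr URL.
def solr_add_xml(doc)
  fields = doc.map do |name, value|
    %(<field name="#{CGI.escapeHTML(name.to_s)}">#{CGI.escapeHTML(value.to_s)}</field>)
  end
  "<add><doc>#{fields.join}</doc></add>"
end

puts solr_select_url('http://localhost:8983/solr', 'title:solr')
puts solr_add_xml({ 'id' => 1, 'title' => 'Solr & Ruby' })
```

Nothing here touches the network; it only shows what would travel over HTTP in each direction.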




If Ruby is not using HTTP to talk to EmbeddedSolrServer, what is it
using?


Thanks and Regards
Rajan Chandi

On Thu, Sep 17, 2009 at 9:44 PM, Erik Hatcher erik.hatc...@gmail.com wrote:




On Sep 17, 2009, at 11:40 AM, Ian Connor wrote:


Is there any support for connection pooling or a more optimized data
exchange format?



The solr-ruby library (as do other Solr + Ruby libraries) uses the ruby
response format and evals it.  solr-ruby supports keeping the HTTP
connection alive too.

We are looking at any further ways to optimize the solr queries so we can
possibly make more of them in the one request.

The JSON like format seems pretty tight but I understand when the
distributed search takes place it uses a binary protocol instead of text.
I wanted to know if that was available or could be available via the ruby
library.

Is it possible to host a local shard and skip HTTP between ruby and solr?




If you use JRuby you can do some fancy stuff, like use the javabin update
and response formats so no XML is involved, and you could also use Solr's
EmbeddedSolrServer to avoid HTTP.  However, in practice HTTP is rarely the
bottleneck, and it actually offers a lot of advantages, such as easy
commodity load balancing and caching.

But JRuby + Solr is a very beautiful way to go!

If you're using MRI Ruby, though, you don't really have any options other
than to go over HTTP. You could use json or ruby formatted responses - I'd
be curious to see some performance numbers comparing those two.

  Erik
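
For anyone wanting to try that comparison, here is a minimal sketch of the two parsing paths under MRI. The sample bodies are made up to resemble a small Solr response; real responses would come back from requests made with wt=ruby and wt=json respectively.

```ruby
require 'json'

# wt=ruby: the body is a Ruby hash literal, so the client evals it.
# Only eval responses that come from a Solr server you trust.
def parse_ruby_response(body)
  eval(body)
end

# wt=json: the body is parsed with the standard JSON library.
def parse_json_response(body)
  JSON.parse(body)
end

# Made-up sample bodies carrying the same content in both formats.
ruby_body = "{'responseHeader'=>{'status'=>0},'response'=>{'numFound'=>1,'docs'=>[{'id'=>'1'}]}}"
json_body = '{"responseHeader":{"status":0},"response":{"numFound":1,"docs":[{"id":"1"}]}}'

ruby_result = parse_ruby_response(ruby_body)
json_result = parse_json_response(json_body)

puts ruby_result == json_result
puts ruby_result['response']['numFound']
```

Wrapping each call in Ruby's standard Benchmark module over a realistic response body would give the performance numbers in question.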






Re: multicore shards and relevancy score

2009-09-18 Thread Erik Hatcher


On Sep 17, 2009, at 7:11 PM, Lance Norskog wrote:


 This looks like a Ruby client bug.


Maybe, but I doubt it in this case.

But let's have some details of the Ruby code used to make the request,  
and what gets logged on the first Solr for the request.


Erik




If you do the same query with the HTTP url, it should work.

On Tue, Sep 15, 2009 at 7:41 AM, Paul Rosen p...@performantsoftware.com wrote:

Shalin Shekhar Mangar wrote:


On Tue, Sep 15, 2009 at 2:39 AM, Paul Rosen p...@performantsoftware.com wrote:

I've done a few experiments with searching two cores with the same schema
using the shard syntax. (using solr 1.3)

My use case is that I want to have multiple cores because a few different
people will be managing the indexing, and that will happen at different
times. The data, however, is homogeneous.


Multiple cores were not built for distributed search. It is inefficient as
compared to a single index. But if you want to use them that way, that's
your choice.


Well, I'm experimenting with them because it will simplify index maintenance
greatly. I am beginning to think that it won't work in my case, though.




I've noticed in my tests that the results are not interwoven, but it might
just be my test data. In other words, all the results from one core appear,
then all the results from the other core.

In thinking about it, it would make sense if the relevancy scores for each
core were completely independent of each other. And that would mean that
there is no way to compare the relevancy scores between the cores.

In other words, I'd like the following results:

- really relevant hit from core0
- pretty relevant hit from core1
- kind of relevant hit from core0
- not so relevant hit from core1

but I get:

- really relevant hit from core0
- kind of relevant hit from core0
- pretty relevant hit from core1
- not so relevant hit from core1

So, are the results supposed to be interwoven, and I need to study my data
more, or is this just not something that is possible?



The only difference wrt relevancy between a distributed search and a
single-node search is that there is no distributed IDF and therefore a
distributed search assumes a random distribution of terms among shards. I'm
not sure if that is what you are seeing.
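
To make the missing distributed IDF concrete, here is a small sketch using the idf formula from Lucene's DefaultSimilarity, 1 + ln(numDocs / (docFreq + 1)). The document counts are hypothetical; they describe a term that is rare in one core and common in the other, so each core weights the same term very differently from what a single merged index would.

```ruby
# Lucene DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1)).
def idf(doc_freq, num_docs)
  1.0 + Math.log(num_docs.to_f / (doc_freq + 1))
end

# Hypothetical statistics for the same term in two cores.
core0 = { num_docs: 1000, doc_freq: 9 }    # term is rare in core0
core1 = { num_docs: 1000, doc_freq: 499 }  # term is common in core1

idf_core0 = idf(core0[:doc_freq], core0[:num_docs])
idf_core1 = idf(core1[:doc_freq], core1[:num_docs])

# What a single combined index would compute for the same term.
idf_combined = idf(core0[:doc_freq] + core1[:doc_freq],
                   core0[:num_docs] + core1[:num_docs])

puts format('core0 idf:    %.3f', idf_core0)
puts format('core1 idf:    %.3f', idf_core1)
puts format('combined idf: %.3f', idf_combined)
```

When terms are spread evenly across shards the per-core values stay close to the combined one, which is the assumption described above. Skewed data like this example would tend to push one core's hits ahead of the other's, though that may or may not be what is happening in this thread.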


Also, if this is insurmountable, I've discovered two show stoppers that
will prevent using multicore in my project (counting the lack of support for
faceting in multicore). Are these issues addressed in solr 1.4?



Can you give more details on what these two issues are?



The first issue is detailed above, where the results from a search over two
shards don't appear to be returned in relevancy order.

The second issue was detailed in an email last week, "shards and facet
count". The facet information is lost when doing a search over two shards,
so if I use multicore, I can no longer have facets.







--
Lance Norskog
goks...@gmail.com




Exact word search in Solr

2009-09-18 Thread bhaskar chandrasekar
Hi,
 I am doing exact word search in Solr 1.3 and I am not getting the expected 
results.
I am giving you the sample XML file along with the mail from where search 
results are fetched.
The following steps were followed to achieve exact word search result in Solr.
 
1)  Schema.xml is configured for title, url and description
<field name="url" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="text" indexed="true" stored="true" required="true"/>
<field name="description" type="text" indexed="true" stored="true" required="true"/>
 
Commented out the lines below
<!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
<!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
2) Started Solr server
3)  Indexed sample data with title, url & description
5)  Assume I am giving (say "channelone") as my input search string for exact
word search in Solr admin page.
 
I am getting the following output. It should show output pertaining to
"channelone" only. It should not display combinations of words with
“channelone”. I am not looking for case-sensitive search here.
<doc>
  <field name="url">http://c2search1/contactus3.html</field>
  <field name="title">c2Search1: Contactus3</field>
  <field name="description">channelOne</field>
</doc>
<doc>
  <field name="url">http://c2search1/contactus4.html</field>
  <field name="title">c2Search1: Contactus4</field>
  <field name="description">Channelone</field>
</doc>
<doc>
  <field name="url">http://c2search1/contactus5.html</field>
  <field name="title">c2Search1: Contactus5</field>
  <field name="description">channel...@$</field>
</doc>
<doc>
  <field name="url">http://c2search1/contactus6.html</field>
  <field name="title">c2Search1: Contactus6</field>
  <field name="description">channelon...@$</field>
</doc>
<doc>
  <field name="url">http://c2search1/contactus7.html</field>
  <field name="title">c2Search1: Contactus7</field>
  <field name="description">channelon...@$ab</field>
</doc>
 
 
Expected Result
<doc>
  <field name="url">http://c2search1/contactus3.html</field>
  <field name="title">c2Search1: Contactus3</field>
  <field name="description">channelOne</field>
</doc>
<doc>
  <field name="url">http://c2search1/contactus4.html</field>
  <field name="title">c2Search1: Contactus4</field>
  <field name="description">Channelone</field>
</doc>
 
 
Please help me with the above scenario to achieve the desired output.
 
Regards
Bhaskar


  

Multicore Solr + Tomcat

2009-09-18 Thread René Hackl
Hello,

I have setup Tomcat 6 and Solr 1.3.0 and it works fine for single cores. Now I 
am trying to make it multicore and the cores don't seem to be recognized.

This works:

/solr/home/conf/schema.xml
/solr/home/conf/solrconfig.xml
/solr/home/data/

Clicking the admin link on the "Welcome to Solr" page brings up the familiar
Solr admin page at http://localhost:8080/apache-solr-1.3.0/admin/

Changing the setup to multicore like this does not work:

/solr/home/core1/conf/schema.xml
/solr/home/core1/conf/solrconfig.xml
/solr/home/core1/data/
/solr/home/core2/conf/schema.xml
/solr/home/core2/conf/solrconfig.xml
/solr/home/core2/data/
/solr/home/solr.xml

In solr.xml I do:

<solr persistent="false" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core1" instanceDir="core1" />
    <core name="core2" instanceDir="core2" />
  </cores>
</solr>

Clicking the admin link brings up an error message at 
http://localhost:8080/apache-solr-1.3.0/admin/

HTTP Status 404 - missing core name in path

The requested resource (missing core name in path) is not available.


Manually editing the URL to http://localhost:8080/apache-solr-1.3.0/admin/cores 
leads to 

HTTP Status 500 - Can not find a valid core for the cores admin handler
java.lang.RuntimeException: Can not find a valid core for the cores admin handler
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:162)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
 at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
 at java.lang.Thread.run(Unknown Source)

And adding core names to the path at one position or the other also brings up 
404 Errors.

Any hints on what to look for are greatly appreciated.

Thanks,
Rene


Question on omitNorms definition

2009-09-18 Thread Rahul R
Hello,
A rather trivial question on the omitNorms parameter in schema.xml. The
out-of-the-box schema.xml uses this parameter both within the fieldType
tag and the field tag. If we define omitNorms in the fieldType
definition, will it hold good for all fields that are defined using the
same fieldType? For example:

<fieldType name="string" class="solr.StrField" sortMissingLast="true"
omitNorms="true"/>
<dynamicField name="*" type="string" indexed="true" stored="true"/>

Now, will these dynamic fields have omitNorms=true? I have read
about significant RAM usage when omitNorms is not set to true. Hence I would
like to ensure that it is set to true for most of my fields.

Regards
Rahul


Re: shards and facet_count

2009-09-18 Thread Erik Hatcher


On Sep 17, 2009, at 6:14 PM, Lance Norskog wrote:

Yes. facet=false means don't do any faceting. This is why you don't
get any facet data back. This is probably a bug in the solr-ruby code.
Version number 0.0.x is probably a hint about its production-ready
status :)


Actually solr-ruby is plenty suitable for production use - it's a pretty
straightforward mapping from Rubyish stuff to Solr requests.  Not
really much magic in there.  Its 0.0.x version number is really my
own laziness (or swampedness) in getting it polished in a form I'd
like (I'm a perfectionist with no time on his hands, a frustrating
existence).  The RSolr API has some features I'd like to pull over
into solr-ruby and do some refactoring one of these days in my copious
free time, but in general solr-ruby works fine.


It is strange that you get facet=false calls in there, but maybe this  
is just normal distributed search protocol in one of the phases?


Erik


On Mon, Sep 14, 2009 at 6:46 AM, Paul Rosen p...@performantsoftware.com wrote:

Shalin Shekhar Mangar wrote:


On Fri, Sep 11, 2009 at 2:35 AM, Paul Rosen p...@performantsoftware.com wrote:


Hi again,

I've mostly gotten the multicore working except for one detail.

(I'm using solr 1.3 and solr-ruby 0.0.6 in a rails project.)

I've done a few queries and I appear to be able to get hits from either
core. (yeah!)

I'm forming my request like this:

req = Solr::Request::Standard.new(
  :start => start,
  :rows => max,
  :sort => sort_param,
  :query => query,
  :filter_queries => filter_queries,
  :field_list => @field_list,
  :facets => {:fields => @facet_fields, :mincount => 1, :missing => true,
              :limit => -1},
  :highlighting => {:field_list => ['text'], :fragment_size => 600},
  :shards => @cores)

If I leave :shards => @cores out, then the response includes:

'facet_counts' => {
 'facet_dates' => {},
 'facet_queries' => {},
 'facet_fields' => { 'myfacet' => [ etc...], etc... }

which is what I expect.

If I add the :shards => @cores back in (so that I'm doing the exact
request above), I get:

'facet_counts' => {
 'facet_dates' => {},
 'facet_queries' => {},
 'facet_fields' => {}

so I've lost my facet information.

Why would it correctly find my documents, but not report the facet info?




I'm not a ruby guy but the response format in both the cases is exactly the
same so I don't think there is any problem with the ruby client parsing. Can
you check the Solr logs to see if there were any exceptions when you sent
the shards parameter?



I don't see any exceptions. The solr activity is pretty different for the
two cases. Without the shards, it makes one call that looks something like
this (I elided the id and field parameters for clarity):

Sep 14, 2009 9:32:09 AM org.apache.solr.core.SolrCore execute
INFO: [resources] webapp=/solr path=/select
params={facet.limit=-1&wt=ruby&rows=30&start=0&facet=true&facet.mincount=1&q=(rossetti)&fl=archive,...,license&qt=standard&facet.missing=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true}

hits=27 status=0 QTime=6

Note that facet=true.

With the shards, it has five lines for the single call that I make:

Sep 14, 2009 9:37:18 AM org.apache.solr.core.SolrCore execute
INFO: [exhibits] webapp=/solr path=/select
params={wt=javabin&rows=30&start=0&facet=true&fl=uri,score&q=(rossetti)&version=2.2&isShard=true&facet.missing=true&hl.fl=text&fsv=true&hl.fragsize=600&facet.field=genre&facet.field=archive&facet.field=freeculture&hl=false}

hits=6 status=0 QTime=0

Sep 14, 2009 9:37:18 AM org.apache.solr.core.SolrCore execute
INFO: [resources] webapp=/solr path=/select
params={wt=javabin&rows=30&start=0&facet=true&fl=uri,score&q=(rossetti)&version=2.2&isShard=true&facet.missing=true&hl.fl=text&fsv=true&hl.fragsize=600&facet.field=genre&facet.field=archive&facet.field=freeculture&hl=false}

hits=27 status=0 QTime=3

Sep 14, 2009 9:37:18 AM org.apache.solr.core.SolrCore execute
INFO: [resources] webapp=/solr path=/select
params={facet.limit=-1&wt=javabin&rows=30&start=0&ids=...,...&facet=false&facet.mincount=1&q=(rossetti)&fl=archive,...,uri&version=2.2&facet.missing=true&isShard=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true}
status=0 QTime=35

Sep 14, 2009 9:37:18 AM org.apache.solr.core.SolrCore execute
INFO: [exhibits] webapp=/solr path=/select
params={facet.limit=-1&wt=javabin&rows=30&start=0&ids=...,...&facet=false&facet.mincount=1&q=(rossetti)&fl=archive,...,uri&version=2.2&facet.missing=true&isShard=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true}
status=0 QTime=41

Sep 14, 2009 9:37:18 AM org.apache.solr.core.SolrCore execute
INFO: [resources] webapp=/solr path=/select

Re: Exact word search in Solr

2009-09-18 Thread AHMET ARSLAN
 Hi,
  I am doing exact word search in Solr 1.3 and I am not
 getting the expected results.
 I am giving you the sample XML file along with the mail
 from where search results are fetched.
 The following steps were followed to achieve exact word
 search result in Solr.

You can simply use the fieldType below to achieve this:

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
 <analyzer>
   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>

Note that there is no WordDelimiterFilterFactory in this type, but yours
probably has it.

Hope this helps.
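
As a rough model of what this fieldType does (a stand-in, not Solr's analysis code), the sketch below splits on whitespace and lowercases, then matches a query token against a field's tokens. The method names and the sample description values are hypothetical.

```ruby
# Rough stand-in for WhitespaceTokenizerFactory + LowerCaseFilterFactory.
def analyze_ws_lower(text)
  text.split.map(&:downcase)
end

# A value matches when every analyzed query token appears among its tokens.
def exact_word_match?(field_value, query)
  doc_tokens = analyze_ws_lower(field_value)
  analyze_ws_lower(query).all? { |token| doc_tokens.include?(token) }
end

# Hypothetical description values.
['channelOne', 'Channelone', 'channelone-plus', 'channelone2'].each do |value|
  puts "#{value}: #{exact_word_match?(value, 'channelone')}"
end
```

Because nothing splits tokens on punctuation or case changes, a value such as channelone-plus stays one token and does not match the query channelone, while case differences are ignored.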





Re: Searching with or without diacritics

2009-09-18 Thread AHMET ARSLAN
 Hi,
 
 Thanks for the suggestions, perhaps I am closer to the
 goal, but still don't
 get the result. I would like to find accented characters
 (mapped by the
 MappingCharFilterFactory) by writing unaccented queries. On
 this page:
 
 http://issues.ez.no/IssueView.php?Id=14742&activeItem=2
 
 I've found that the MappCharFilter should be added to both
 the index and
 query type of analyzers I heard of these two types now
 for first. Is
 this the issue? I did not have so far any my analyzers
 marked with type index neither query.

Since it is not marked with type index or query, it is used for both.

Can you try this fieldType and give feedback:

<fieldtype class='solr.TextField' name='text' positionIncrementGap='100'>
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory"
      mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
    <filter class='solr.LowerCaseFilterFactory' />
  </analyzer>
</fieldtype>

Just to make sure: 

You are using latest nightly build of solr, right?

mapping-ISOLatin1Accent.txt file - under the conf directory - contains the 
character mappings that you want to replace?

Just FYI, StandardFilter is meaningless without StandardTokenizer, so I removed
it from your field type.

Hope this helps.
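
The reason one analyzer serving both index and query time is enough can be sketched as follows: the same character mapping and lowercasing run on both sides, so accented and unaccented spellings end up as the same tokens. This is a simplified stand-in for the analysis chain, and the mapping table is a small hypothetical subset rather than the contents of mapping-ISOLatin1Accent.txt.

```ruby
# Hypothetical subset of accent mappings, written with \u escapes:
# a-acute, A-acute, e-acute, E-acute, o-umlaut, u-umlaut.
def accent_map
  {
    "\u00E1" => 'a', "\u00C1" => 'A',
    "\u00E9" => 'e', "\u00C9" => 'E',
    "\u00F6" => 'o', "\u00FC" => 'u'
  }
end

# Stand-in for charFilter -> whitespace tokenizer -> lowercase filter.
def analyze_with_mapping(text)
  mapped = text.each_char.map { |ch| accent_map.fetch(ch, ch) }.join
  mapped.split.map(&:downcase)
end

indexed = analyze_with_mapping("Caf\u00E9 Ol\u00E9")  # applied at index time
queried = analyze_with_mapping('cafe ole')            # applied at query time

puts indexed.inspect
puts indexed == queried
```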


  


Re: Solr 1.3 deletes not working?

2009-09-18 Thread Lee Theobald

I also seem to be having a similar problem deleting.  As far as I can tell,
the system thinks we are deleting the records (it logs that it's executing
the commands and all looks OK) but the records always remain.  Regardless
of whether we try a delete by ID or by query, nothing happens.  It's also
not extra characters in our deletion queries.

Can anyone think of anything else that I should be checking?  I'm sure it's
probably a small bit of config we've missed but I can't track it down.

Regards,
Lee
-- 
View this message in context: 
http://www.nabble.com/Solr-1.3-deletes-not-working--tp18124561p25506432.html
Sent from the Solr - User mailing list archive at Nabble.com.



acts_as_solr integeration with solr separately

2009-09-18 Thread abhay kumar
Hi,

I have set up a Solr search server in Tomcat.

I am able to fire queries (of any kind) & get results in XML format.

Now I want to integrate it (Solr) with Ruby on Rails.

I know Ruby on Rails has a plugin, acts_as_solr, which helps in
integrating (talking) with Solr.

acts_as_solr comes bundled with a Solr web application on a Jetty server.

But I don't want to use this bundled Solr web application.

e.g. I don't want to do rake solr:start.

I am running Solr as a separate search server in Tomcat at port 8983 (url
http://localhost:8983/solr/ & all other urls are listening).

Now, I want to talk to this (separate) Solr server using the acts_as_solr
plugin.

Questions:
1) Can anybody point me to how to do this?
Any tutorial?
2) What changes do I have to make in the acts_as_solr plugin?

3) Any good pointers (urls) will be appreciated...

Regards
Abhay


solr isnt using default field correctly

2009-09-18 Thread DHast

hi, 
if i do a search: text:"law order"~40
i get this:

<str name="rawquerystring">text:"law order"~40</str>
<str name="querystring">text:"law order"~40</str>
<str name="parsedquery">PhraseQuery(text:"law order"~40)</str>
<str name="parsedquery_toString">text:"law order"~40</str>
<str name="QParser">OldLuceneQParser</str>

However if i do: "law order"~40
i get this:

<str name="rawquerystring">"law order"~40</str>
<str name="querystring">"law order"~40</str>
<str name="parsedquery">text:law order</str>
<str name="parsedquery_toString">text:law order</str>
<lst name="explain"/>
<str name="QParser">OldLuceneQParser</str>

my Schema xml:

 <field name="text" type="string" indexed="true" stored="false" />
.
 <defaultSearchField>text</defaultSearchField>


what should i be doing differently to get the second results like the first?
-- 
View this message in context: 
http://www.nabble.com/solr-isnt-using-default-field-correctly-tp25507985p25507985.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr isnt using default field correctly

2009-09-18 Thread DHast

well it seems what is happening is solr is not being consistent,



DHast wrote:
 
 hi, 
 if i do a search: text:"law order"~40
 i get this:
 
 <str name="rawquerystring">text:"law order"~40</str>
 <str name="querystring">text:"law order"~40</str>
 <str name="parsedquery">PhraseQuery(text:"law order"~40)</str>
 <str name="parsedquery_toString">text:"law order"~40</str>
 <str name="QParser">OldLuceneQParser</str>
 
 However if i do: "law order"~40
 i get this:
 
 <str name="rawquerystring">"law order"~40</str>
 <str name="querystring">"law order"~40</str>
 <str name="parsedquery">text:law order</str>
 <str name="parsedquery_toString">text:law order</str>
 <lst name="explain"/>
 <str name="QParser">OldLuceneQParser</str>
 
 my Schema xml:
 
  <field name="text" type="string" indexed="true" stored="false" />
 .
  <defaultSearchField>text</defaultSearchField>
 
 
 what should i be doing differently to get the second results like the
 first?
 

-- 
View this message in context: 
http://www.nabble.com/solr-isnt-using-default-field-correctly-tp25507985p25508264.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr isnt using default field correctly

2009-09-18 Thread Erik Hatcher
I just tried this on trunk and both with and without a field selector  
it parses to a PhraseQuery.  I have trouble believing even Solr 1.3  
behaved like you reported, something seems fishy.


Erik

On Sep 18, 2009, at 9:02 AM, DHast wrote:



well it seems what is happening is solr is not being consistent,



DHast wrote:


hi,
if i do a search: text:"law order"~40
i get this:

<str name="rawquerystring">text:"law order"~40</str>
<str name="querystring">text:"law order"~40</str>
<str name="parsedquery">PhraseQuery(text:"law order"~40)</str>
<str name="parsedquery_toString">text:"law order"~40</str>
<str name="QParser">OldLuceneQParser</str>

However if i do: "law order"~40
i get this:

<str name="rawquerystring">"law order"~40</str>
<str name="querystring">"law order"~40</str>
<str name="parsedquery">text:law order</str>
<str name="parsedquery_toString">text:law order</str>
<lst name="explain"/>
<str name="QParser">OldLuceneQParser</str>

my Schema xml:

<field name="text" type="string" indexed="true" stored="false" />
.
<defaultSearchField>text</defaultSearchField>


what should i be doing differently to get the second results like the
first?








Re: solr isnt using default field correctly

2009-09-18 Thread DHast

yeah something is definitely strange, i think i know what it is though.
im going to make a separate post for it, but it cached the results from when
i had field:text as a string, 




Erik Hatcher-4 wrote:
 
 I just tried this on trunk and both with and without a field selector  
 it parses to a PhraseQuery.  I have trouble believing even Solr 1.3  
 behaved like you reported, something seems fishy.
 
   Erik
 
 On Sep 18, 2009, at 9:02 AM, DHast wrote:
 

 well it seems what is happening is solr is not being consistent,



 DHast wrote:

 hi,
 if i do a search: text:"law order"~40
 i get this:

 <str name="rawquerystring">text:"law order"~40</str>
 <str name="querystring">text:"law order"~40</str>
 <str name="parsedquery">PhraseQuery(text:"law order"~40)</str>
 <str name="parsedquery_toString">text:"law order"~40</str>
 <str name="QParser">OldLuceneQParser</str>

 However if i do: "law order"~40
 i get this:

 <str name="rawquerystring">"law order"~40</str>
 <str name="querystring">"law order"~40</str>
 <str name="parsedquery">text:law order</str>
 <str name="parsedquery_toString">text:law order</str>
 <lst name="explain"/>
 <str name="QParser">OldLuceneQParser</str>

 my Schema xml:

 <field name="text" type="string" indexed="true" stored="false" />
 .
 <defaultSearchField>text</defaultSearchField>


 what should i be doing differently to get the second results like the
 first?



 
 
 

-- 
View this message in context: 
http://www.nabble.com/solr-isnt-using-default-field-correctly-tp25507985p25508691.html
Sent from the Solr - User mailing list archive at Nabble.com.



want to features of a 'text' field with the non stemming of a 'string' field

2009-09-18 Thread DHast

when i have my fieldname: text set as a text field, advanced search queries
work very well, but when i have it set as a string it seems to ignore them,
like proximity searching and so on.
example: text as string:
<str name="rawquerystring">text:"law order"~33</str>
<str name="querystring">text:"law order"~33</str>
<str name="parsedquery">text:law order</str>
<str name="parsedquery_toString">text:law order</str>

text as text:
<str name="rawquerystring">text:"law order"~32</str>
<str name="querystring">text:"law order"~32</str>
<str name="parsedquery">PhraseQuery(text:"law order"~32)</str>
<str name="parsedquery_toString">text:"law order"~32</str>

however when i search a single term, it stems it if its text, example:
text as text:
<str name="rawquerystring">goats</str>
<str name="querystring">goats</str>
<str name="parsedquery">text:goat</str>
<str name="parsedquery_toString">text:goat</str>

text as string:
<str name="rawquerystring">nuts</str>
<str name="querystring">nuts</str>
<str name="parsedquery">text:nuts</str>
<str name="parsedquery_toString">text:nuts</str>
OR
<str name="rawquerystring">text:goats</str>
<str name="querystring">text:goats</str>
<str name="parsedquery">text:goats</str>
<str name="parsedquery_toString">text:goats</str>


so what i want/need, is to STOP the stemming/plural killing that is
happening on the text field,
ideas?

also, is there a way to wipe the cache while testing?




-- 
View this message in context: 
http://www.nabble.com/want-to-features-of-a-%27text%27-field-with-the-non-stemming-of-a-%27string%27-field-tp25508780p25508780.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 1.3 deletes not working?

2009-09-18 Thread Yonik Seeley
On Fri, Sep 18, 2009 at 6:26 AM, Lee Theobald l...@openobjects.com wrote:
 I also seem to be having a similar problem deleting.  As far as I can tell,
 the system thinks we are deleting the records (it logs that it's executing
 the commands and all looks OK) but the records always remain.  Regardsless
 if we try a delete by ID or by query, nothing happens.  It's also not extra
 characters in our deletion queries.

Did you issue a commit after the delete?  (you can also specify it
with the delete command)

-Yonik
http://www.lucidimagination.com
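
A minimal sketch of the update messages involved, assuming the standard XML update handler at /update: a delete only becomes visible to searches after a commit, which can be sent as its own message or, on versions that accept it, requested with a commit=true parameter on the delete request. The helper names and base URL are hypothetical.

```ruby
require 'cgi'

# XML body for deleting a single document by its unique key.
def delete_by_id_xml(id)
  "<delete><id>#{CGI.escapeHTML(id.to_s)}</id></delete>"
end

# XML body for deleting every document matching a query.
def delete_by_query_xml(query)
  "<delete><query>#{CGI.escapeHTML(query)}</query></delete>"
end

# XML body for an explicit commit, sent as a separate POST.
def commit_xml
  '<commit/>'
end

# Hypothetical helper: update URL, optionally asking for a commit
# as part of the same request (assumes commit=true is supported).
def update_url(base_url, commit = false)
  url = "#{base_url}/update"
  commit ? "#{url}?commit=true" : url
end

puts update_url('http://localhost:8983/solr', true)
puts delete_by_id_xml(42)
puts commit_xml
```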

 Can anyone think of anything else that I should be checking?  I'm sure it's
 probably a small bit of config we've missed but I can't track it down.

 Regards,
 Lee


Re: Disabling tf (term frequency) during indexing and/or scoring

2009-09-18 Thread Aaron McKee


Hi Alexey,

Thank you for your suggestion! My understanding of Similarity, though, 
is that this would affect the entire index, whereas I need something 
that is field-configurable. Looking at Similarity.tf(), it seems to be 
independent of the field (and unaware of it). I don't necessarily want 
to disable tf entirely, as it'll likely be useful for other fulltext 
fields. Looking at more of the code, I'm guessing I'll need to get under 
the hood a fair bit more and possibly write a custom TermScorer and 
TermQuery.


I suppose I'm curious why the omitTfAndPositions option conflates two 
apparently independent features. It seems like it would have been 
entirely reasonable to treat these as separate options, as their use 
cases don't necessarily overlap. I suppose it was just the path of least 
resistance or the assumed common-case scenario.


Anyways, thanks again for your time.

Best regards,
Aaron

Alexey Serba wrote:

Hi Aaron,

You can overwrite default Lucene Similarity and disable tf and
lengthNorm factors in scoring formula ( see
http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/Similarity.html
and http://lucene.apache.org/java/2_4_1/api/index.html )

You need to

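1) Compile the following class and put it into Solr's WEB-INF/classes
---
package my.pkg; // "package" is a reserved word in Java, so it cannot be part of a package name

import org.apache.lucene.search.DefaultSimilarity;

// Scores every match as if each term occurred once, and ignores field length.
public class NoLengthNormAndTfSimilarity extends DefaultSimilarity {

    // Ignore field length: every non-empty field gets the same norm.
    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        return numTerms > 0 ? 1.0f : 0.0f;
    }

    // Ignore term frequency: one occurrence scores the same as many.
    @Override
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}
---

2) Add <similarity class="my.pkg.NoLengthNormAndTfSimilarity"/>
to your schema.xml
http://wiki.apache.org/solr/SchemaXml#head-e343cad75d2caa52ac6ec53d4cee8296946d70ca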

HIH,
Alex

On Mon, Sep 14, 2009 at 9:50 PM, Aaron McKee ucbmc...@gmail.com wrote:
  

Hello,

Let me preface this by admitting that I'm still fairly new to Lucene and
Solr, so I apologize if any of this sounds naive and I'm open to thinking
about my problem differently.

I'm currently responsible for a rather large dataset of business records
that I'm trying to build a Lucene/Solr infrastructure around, to replace an
in-house solution that we've been using for a few years. These records are
sourced from multiple providers and there's often a fair bit of overlap in
the business coverage. I have a set of fuzzy correlation libraries that I
use to identify these documents and I ultimately create a super-record that
includes metadata from each of the providers. Given the nature of things,
these providers often have slight variations in wording or spelling in the
overlapping fields (it's amazing how many ways people find to refer to the
same business or address). I'd like to capture these variations, as they
facilitate searching, but TF considerations are currently borking field
scoring here.

For example, taking business names into consideration, I have a Solr schema
similar to:

<field name="name_provider1" type="string" indexed="false" stored="false"
multiValued="true"/>
...
<field name="name_providerN" type="string" indexed="false" stored="false"
multiValued="true"/>
<field name="nameNorm" type="text" indexed="true" stored="false"
multiValued="true" omitNorms="true"/>

<copyField source="name_provider1" dest="nameNorm"/>
...
<copyField source="name_providerN" dest="nameNorm"/>

For any given business record, there may be 1..N business names present in
the nameNorm field (some with naming variations, some identical). With TF
enabled, however, I'm getting different match scores on this field simply
based on how many providers contributed to the record, which is not
meaningful to me. For example, a record containing <nameNorm>foo
bar<positionIncrementGap>foo bar</nameNorm> is necessarily scoring higher
than a record just containing <nameNorm>foo bar</nameNorm>.  Although I
wouldn't mind TF data being considered within each discrete field value, I
need to find a way to prevent score inflation based simply on the number of
contributing providers.
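
The inflation can be sketched with the tf factor alone. Lucene's DefaultSimilarity uses tf = sqrt(freq), so a record whose nameNorm field received the same name from two providers gets a larger factor than a record with one copy; a flattened tf that returns 1.0 for any non-zero frequency removes that difference. The method names and sample values are hypothetical, and the other scoring factors are left out.

```ruby
# DefaultSimilarity tf factor: the square root of the term frequency.
def default_tf(freq)
  Math.sqrt(freq)
end

# Flattened tf: any non-zero frequency counts exactly once.
def flat_tf(freq)
  freq > 0 ? 1.0 : 0.0
end

# Occurrences of a term across all values of a multiValued field.
def term_freq(values, term)
  values.sum { |value| value.downcase.split.count(term) }
end

one_provider  = ['foo bar']
two_providers = ['foo bar', 'foo bar']

[one_provider, two_providers].each do |values|
  freq = term_freq(values, 'foo')
  puts "freq=#{freq} default_tf=#{default_tf(freq).round(3)} flat_tf=#{flat_tf(freq)}"
end
```

Note that a flattened tf applied through a Similarity affects every field in the index, not just this one.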

Looking at the mailing list archive and searching around, it sounds like the
omitTf boolean in Lucene used to function somewhat in this manner, but has
since taken on a broader interpretation (and name) that now also disables
positional and payload data. Unfortunately, phrase support for fields like
this is absolutely essential. So what's the best way to address a need like
this? I guess I don't mind whether this is handled at index time or search
time, but I'm not sure what I may need to override or if there's some
existing provision I should take advantage of.

Thank you for any help you may have.

Best regards,
Aaron




Re: Disabling tf (term frequency) during indexing and/or scoring

2009-09-18 Thread Yonik Seeley
On Fri, Sep 18, 2009 at 9:38 AM, Aaron McKee ucbmc...@gmail.com wrote:
 I suppose I'm curious why the omitTfAndPositions option conflates two
 apparently independent features.

This relates to the index format, and is more for performance/size
benefits when they are not needed.  In the index, it's impossible to
omit the tf info and keep the position info (the frequency is the
number of positions).
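
A small Python sketch may help picture this. It is a simplified model of a positional posting list, not Lucene's actual index format; the Posting class and build_postings function are hypothetical names. The point is only that once the positions are stored, the frequency comes for free as their count.

```python
from dataclasses import dataclass, field

@dataclass
class Posting:
    """One document's entry in a term's posting list (toy model)."""
    doc_id: int
    positions: list = field(default_factory=list)

    @property
    def freq(self):
        # The term frequency is not stored separately here: it is simply
        # the number of recorded positions.
        return len(self.positions)

def build_postings(docs):
    """Build term -> list of Posting from {doc_id: text}."""
    postings = {}
    for doc_id, text in docs.items():
        per_term = {}
        for pos, token in enumerate(text.lower().split()):
            if token not in per_term:
                per_term[token] = Posting(doc_id)
            per_term[token].positions.append(pos)
        for token, posting in per_term.items():
            postings.setdefault(token, []).append(posting)
    return postings

postings = build_postings({1: "foo bar foo", 2: "foo bar"})
for posting in postings["foo"]:
    print(posting.doc_id, posting.positions, posting.freq)
```

Dropping the frequency while keeping the positions would save nothing in such a layout, since anyone holding the positions can count them.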

-Yonik
http://www.lucidimagination.com


Re: want to features of a 'text' field with the non stemming of a 'string' field

2009-09-18 Thread Grant Ingersoll


On Sep 18, 2009, at 6:37 AM, DHast wrote:



when i have my fieldname: text set as a text field, advanced search queries
work very well, but when i have it set as a string it seems to ignore them,
like proximity searching and so on.
example: text as string:
<str name="rawquerystring">text:"law order"~33</str>
<str name="querystring">text:"law order"~33</str>
<str name="parsedquery">text:law order</str>
<str name="parsedquery_toString">text:law order</str>

text as text:
<str name="rawquerystring">text:"law order"~32</str>
<str name="querystring">text:"law order"~32</str>
<str name="parsedquery">PhraseQuery(text:"law order"~32)</str>
<str name="parsedquery_toString">text:"law order"~32</str>

however when i search a single term, it stems it if its text, example:
text as text:
<str name="rawquerystring">goats</str>
<str name="querystring">goats</str>
<str name="parsedquery">text:goat</str>
<str name="parsedquery_toString">text:goat</str>

text as string:
<str name="rawquerystring">nuts</str>
<str name="querystring">nuts</str>
<str name="parsedquery">text:nuts</str>
<str name="parsedquery_toString">text:nuts</str>
OR
<str name="rawquerystring">text:goats</str>
<str name="querystring">text:goats</str>
<str name="parsedquery">text:goats</str>
<str name="parsedquery_toString">text:goats</str>


so what i want/need, is to STOP the stemming/plural killing that is
happening on the text field,
ideas?




It sounds like you need to dig into your schema.xml a bit more and set  
your analysis better.  See http://wiki.apache.org/solr/SchemaXml




also, is there a way to wipe the cache while testing?




--
View this message in context: 
http://www.nabble.com/want-to-features-of-a-%27text%27-field-with-the-non-stemming-of-a-%27string%27-field-tp25508780p25508780.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Question on omitNorms definition

2009-09-18 Thread Grant Ingersoll


On Sep 18, 2009, at 2:45 AM, Rahul R wrote:


Hello,
A rather trivial question on the omitNorms parameter in schema.xml. The
out-of-the-box schema.xml uses this parameter both within the fieldType tag
and within the field tag. If we define omitNorms in the fieldType
definition, will it hold for all fields that are defined using the same
fieldType? For example:

<fieldType name="string" class="solr.StrField" sortMissingLast="true"
           omitNorms="true"/>
<dynamicField name="*" type="string" indexed="true" stored="true"/>

Now, will these dynamic fields have omitNorms="true"? I have read about
significant RAM usage when omitNorms is not set to true. Hence I would like
to ensure that it is set to true for most of my fields.



Yes, it will be set for all fields of that field type.
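
As a rough mental model (a Python sketch, not Solr's schema-loading code), a field starts from the attributes of its fieldType and then applies whatever it declares itself. The function name effective_properties and the "title" field are hypothetical; the assumption that a field-level attribute wins over the type default reflects how the example schema mixes both levels.

```python
def effective_properties(field, field_types):
    """Merge a field's own attributes over the defaults of its fieldType."""
    props = dict(field_types[field["type"]])  # start from the type's defaults
    props.update(field)                        # attributes on the field win
    return props

field_types = {
    "string": {"class": "solr.StrField", "sortMissingLast": True, "omitNorms": True},
}

# The dynamicField from the question: it declares no omitNorms of its own.
dynamic_field = {"name": "*", "type": "string", "indexed": True, "stored": True}

# A hypothetical field that overrides the type-level setting.
title_field = {"name": "title", "type": "string", "omitNorms": False}

print(effective_properties(dynamic_field, field_types)["omitNorms"])
print(effective_properties(title_field, field_types)["omitNorms"])
```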

--
Grant Ingersoll
http://www.lucidimagination.com/




Re: want to features of a 'text' field with the non stemming of a 'string' field

2009-09-18 Thread DHast

i have looked, and seem to be running into a dead end every time i try it,
but again it may be because of the caching and me not realizing it was doing
it till my hair was half pulled.

i don't suppose you'd be willing to give a hint then?






Re: want to features of a 'text' field with the non stemming of a 'string' field

2009-09-18 Thread DHast

ok, used the built in fieldtype text_ws that seems to go well
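
For readers following along, the difference can be pictured with a small Python sketch. It is a toy model, not Solr's analysis chain: whitespace_analyze stands in for a whitespace-only tokenizer such as the one behind text_ws, and toy_stem_analyze uses a crude strip-the-final-s rule as a stand-in for a real stemmer. Both function names are assumptions.

```python
def whitespace_analyze(text):
    """Split on whitespace only: no lowercasing, no stemming."""
    return text.split()

def toy_stem_analyze(text):
    """Lowercase and strip a trailing plural 's' (a crude stand-in for a stemmer)."""
    tokens = []
    for token in text.lower().split():
        if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
            token = token[:-1]
        tokens.append(token)
    return tokens

for query in ("goats", "law order"):
    print(query, whitespace_analyze(query), toy_stem_analyze(query))
```

Because the whitespace version still produces one token per word, phrase and proximity queries keep working, while "goats" is no longer reduced to "goat".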




Re: Disabling tf (term frequency) during indexing and/or scoring

2009-09-18 Thread Walter Underwood
Though it would be possible to calculate a binary tf, where the score
is 1 if there are one or more occurrences of the term. --wunder







Re: Disabling tf (term frequency) during indexing and/or scoring

2009-09-18 Thread Aaron McKee


Hi Yonik,

Thank you for the explanation. If the primary goal was to save index 
space for a very specific subclass of fields, the implementation 
certainly makes more sense. I wonder, though, if it could also make 
sense to support a query-time only boolean to optionally disable TF 
independently, on a per-field basis? Or, perhaps (and this may be 
demonstrating my naivete), allowing Similarity to be overridden on a 
per-field basis? I imagine it could make scoring even more confusing 
than it sometimes already is, though. It's an atrocious hack on my part, 
but I largely seem to have achieved my tf goals in this manner; I 
overrode the getSimilarity methods in PhraseQuery and TermQuery to 
return a fixed-tf Similarity implementation if the field value is in the 
set of those I care about. From the looks of it, though, generalizing 
the change into anything other than a hack would touch a rather large 
number of code points.
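
The idea of choosing the tf behaviour per field can be sketched in a few lines of Python. This is only an illustration of the dispatch, not the actual override of getSimilarity in PhraseQuery and TermQuery; FIXED_TF_FIELDS, default_tf, fixed_tf, and tf_for are hypothetical names.

```python
import math

# Hypothetical set of fields whose term frequency should be ignored.
FIXED_TF_FIELDS = {"nameNorm"}

def default_tf(freq):
    """Square-root tf, as in Lucene's DefaultSimilarity."""
    return math.sqrt(freq)

def fixed_tf(freq):
    """Binary tf: any positive number of matches counts as one."""
    return 1.0 if freq > 0 else 0.0

def tf_for(field, freq):
    """Pick the tf function according to the field being scored."""
    tf = fixed_tf if field in FIXED_TF_FIELDS else default_tf
    return tf(freq)

print(tf_for("nameNorm", 4), tf_for("description", 4))
```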


Best regards,
Aaron




Re: How to leverage the LogMergePolicy calibrateSizeByDeletes patch in Solr ?

2009-09-18 Thread Yonik Seeley
On Thu, Sep 17, 2009 at 4:30 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 I was wondering if there is a way I can modify calibrateSizeByDeletes just
 by configuration ?


 Alas, no. The only option that I see for you is to sub-class
 LogByteSizeMergePolicy and set calibrateSizeByDeletes to true in the
 constructor. However, please open a Jira issue and so we don't forget about
 it.

It's the continuing stuff like this that makes me feel like we should
be Spring (or equivalent) based someday... I'm just not sure how we're
going to get there.

-Yonik
http://www.lucidimagination.com


Re: Latest trunk locks execution thread in SolrCore.getSearcher()

2009-09-18 Thread Grant Ingersoll
Also, do you have any custom components or anything that implements  
SolrInfoMBean?


On Sep 18, 2009, at 8:16 AM, Grant Ingersoll wrote:

Can you try the patch I just put up on https://issues.apache.org/jira/browse/SOLR-1427 
 and let me know if it works when JMX is enabled?


Also, do you have warming queries setup?

On Sep 17, 2009, at 12:46 PM, Chris Harris wrote:


It looks like this works as a fix for me as well. (I'm not currently
using JMX for anything anyway.)

Curiously, the single-core example solrconfig.xml also has <jmx />,
but it doesn't seem to be a problem there.

2009/9/17 Dadasheva, Olga olga_dadash...@harvard.edu:

Hi,

FWIW: disabling <jmx/> fixed this problem for me.

Thank you!

-Olga

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of  
Yonik Seeley

Sent: Thursday, September 17, 2009 1:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Latest trunk locks execution thread in  
SolrCore.getSearcher()


Interesting... I still haven't been able to reproduce a hang with  
either jetty or tomcat.

I enabled replication and JMX... still nothing.

-Yonik
http://www.lucidimagination.com


On Thu, Sep 17, 2009 at 12:35 PM, Chris Harris rygu...@gmail.com wrote:
I found what looks like the same issue when I tried to install r815830
under Tomcat. (It works ok with the normal Jetty example/start.jar.) I
haven't checked the stack trace, but Tomcat would hang right after the
message

INFO: Adding  debug
component:org.apache.solr.handler.component.debugcompon...@1904e0d

showed up in the log.

I have a little more evidence about Yonik's theory that SOLR-1427 is
part of the cause. In particular, when I reverse-merged r815587 (the
commit for SOLR-1427) into (out of?) my r815830-based working copy,
then Tomcat was able to load Solr normally.

2009/9/16 Yonik Seeley yo...@lucidimagination.com:
On a quick look, it looks like this was caused (or at least triggered
by) https://issues.apache.org/jira/browse/SOLR-1427

Registering the bean in the SolrCore constructor causes it to
immediately turn around and ask for the stats which asks for a
searcher, which blocks.

-Yonik
http://www.lucidimagination.com

On Wed, Sep 16, 2009 at 9:34 PM, Dadasheva, Olga
olga_dadash...@harvard.edu wrote:

Hi,

I am testing EmbeddedSolrServer vs StreamingUpdateSolrServer for
my crawlers using more or less recent Solr code and everything was
fine till today when I took the latest trunk code.
When I start my crawler I see a number of INFO outputs
2009-09-16 21:08:29,399 INFO  Adding
component:org.apache.solr.handler.component.HighlightComponent@36ae83
(SearchHandler.java:132) - [main]
2009-09-16 21:08:29,400 INFO  Adding
component:org.apache.solr.handler.component.StatsComponent@1fb24d3
(SearchHandler.java:132) - [main]
2009-09-16 21:08:29,401 INFO  Adding
component:org.apache.solr.handler.component.TermVectorComponent@14ba9a2
(SearchHandler.java:132) - [main]
2009-09-16 21:08:29,402 INFO  Adding  debug
component:org.apache.solr.handler.component.DebugComponent@12ea1dd
(SearchHandler.java:137) - [main]

and then the log/program stops.






--
Grant Ingersoll
http://www.lucidimagination.com/




Re: Disabling tf (term frequency) during indexing and/or scoring

2009-09-18 Thread Aaron McKee

Hi Yonik,

For my particular needs, IDF considerations are fine and helpful; if a 
user is requesting a rare term/phrase, increasing the score based on 
that makes sense as the match has higher confidence. I simply need to 
compensate for title and category type fields that may contain redundant 
information and disregard length considerations (these fields are 
multi-valued and may be populated from a varying number of sources, and 
I don't want the number of sources and the level of repetitiveness to 
affect the score). Basically, a boolean "does it match" score adjusted
solely based on IDF. Of course, I'm sure there are others who probably
wouldn't need or care about IDF, either, but still want phrase matching.


Cheers,
Aaron


Yonik Seeley wrote:

On Fri, Sep 18, 2009 at 11:05 AM, Aaron McKee ucbmc...@gmail.com wrote:
  

I wonder, though, if it could also make sense to support a
query-time only boolean to optionally disable TF independently, on a
per-field basis?



I guess it could make sense.  But do you still want idf too? length
norm? or do you really want a constant score (match/no-match)?

-Yonik
http://www.lucidimagination.com
  


Re: Latest trunk locks execution thread in SolrCore.getSearcher()

2009-09-18 Thread Chris Harris
No, I'm pretty sure nothing implements SolrInfoMBean.

I applied the new 1K version of SOLR-1427.patch from

https://issues.apache.org/jira/browse/SOLR-1427

(which appears to be a secondary patch, to be applied once the main
SOLR-1427 patch has already been applied) to my problematic Solr
instance, which is based on Solr SVN r815830. This patch did not seem
to solve the hang problem; once I reenabled JMX, then the process
would hang at the same spot, i.e. right after

INFO: Adding  debug
component:org.apache.solr.handler.component.debugcompon...@1d7b222

appeared in the Tomcat log.

When Solr/Tomcat are hung, there are two Solr-related threads that
show up in a thread dump. I'll paste those stack traces below:

"pool-1-thread-1" prio=6 tid=0x0b1ef800 nid=0xdc8 waiting on condition [0x0b68f000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x035f3e60> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.park(Unknown Source)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(Unknown Source)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(Unknown Source)
    at java.util.concurrent.CountDownLatch.await(Unknown Source)
    at org.apache.solr.core.SolrCore$1.call(SolrCore.java:559)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

"Thread-1" prio=6 tid=0x00c92c00 nid=0xf14 in Object.wait() [0x0b19e000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x035d2ab0> (a java.lang.Object)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:994)
    - locked <0x035d2ab0> (a java.lang.Object)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:907)
    at org.apache.solr.handler.ReplicationHandler.getIndexVersion(ReplicationHandler.java:472)
    at org.apache.solr.handler.ReplicationHandler.getStatistics(ReplicationHandler.java:490)
    at org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean.getMBeanInfo(JmxMonitoredMap.java:224)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(Unknown Source)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(Unknown Source)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(Unknown Source)
    at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:137)
    at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:47)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:446)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:578)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
    <snip>
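
Read together, the two traces look like a circular wait: the thread running the SolrCore constructor is blocked in getSearcher() (reached through MBean registration and getStatistics()), while the pool thread is parked on a CountDownLatch. The Python sketch below reproduces that shape in miniature. It assumes the latch is only released once the constructor finishes, which is an interpretation of the traces rather than something stated in the thread; the Core class, the callbacks, and the timeout (added so the demo terminates instead of hanging) are all hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class Core:
    """Toy core whose searcher is only published after __init__ completes."""

    def __init__(self, on_register, wait_timeout=0.2):
        self.init_done = threading.Event()       # plays the role of the latch
        self.searcher_ready = threading.Event()
        self.wait_timeout = wait_timeout
        self.executor = ThreadPoolExecutor(max_workers=1)
        self.executor.submit(self._open_searcher)
        try:
            # Registration happens inside the constructor; the callback may
            # immediately ask for statistics, and thus for a searcher.
            on_register(self)
        finally:
            self.init_done.set()

    def _open_searcher(self):
        self.init_done.wait()        # the pool thread waits for init to finish
        self.searcher_ready.set()

    def get_searcher(self):
        # Without the timeout this wait would never return when called
        # from inside __init__.
        if not self.searcher_ready.wait(self.wait_timeout):
            raise TimeoutError("searcher not available while init is running")
        return "searcher"

    def close(self):
        self.executor.shutdown(wait=True)

def eager_stats(core):
    core.get_searcher()              # asks for a searcher during registration

def lazy_stats(core):
    pass                             # defers statistics until later

try:
    Core(eager_stats)
    hung = False
except TimeoutError:
    hung = True

core = Core(lazy_stats)
searcher = core.get_searcher()
core.close()
print(hung, searcher)
```

The eager callback times out because each side is waiting for the other; deferring the statistics call until after construction avoids the cycle in this model.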


Re: Latest trunk locks execution thread in SolrCore.getSearcher()

2009-09-18 Thread Chris Harris
Forgot to answer this one. Yes, I do have a warming query to get the
sort caches up to speed. I think it takes a while to run; my guess
would be 30 seconds or so.

2009/9/18 Grant Ingersoll gsing...@apache.org:

 Also, do you have warming queries setup?



Re: copyfield at search time?

2009-09-18 Thread Chris Harris
If the reason you're copying from member_of to member_of_facet is
because faceting isn't allowed on multi-valued fields, then that's no
longer true. See

https://issues.apache.org/jira/browse/SOLR-475

which is in the trunk and which will be available in the 1.4 release.
If you're running an earlier version of Solr, maybe something like
this is necessary. (If multi-valued faceting is possible at all in
earlier versions.)

In any case, I'm not sure why the title of your message is "copyfield
at search time". Copyfield stuff happens at indexing time, not search
time. So if your approach is going to work, you're going to need to
reindex for it to take effect.
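
A toy Python model shows why a reindex is needed. It is not Solr code: COPY_FIELDS and apply_copy_fields are hypothetical names, and the document values are made up. The copy is applied to each document as it is added, so documents added before the rule existed never receive the destination field.

```python
# Hypothetical copy rules, mirroring a copyField from member_of to member_of_facet.
COPY_FIELDS = [("member_of", "member_of_facet")]

def apply_copy_fields(doc, copy_fields=COPY_FIELDS):
    """Return the document as it would be indexed, with copies applied."""
    indexed = {
        name: list(value) if isinstance(value, list) else [value]
        for name, value in doc.items()
    }
    for source, dest in copy_fields:
        if source in indexed:
            indexed.setdefault(dest, []).extend(indexed[source])
    return indexed

# Indexed before the rule existed: no facet field is stored for it.
old_doc = apply_copy_fields({"id": "1", "member_of": "staff"}, copy_fields=[])

# Indexed (or reindexed) after the rule was added.
new_doc = apply_copy_fields({"id": "1", "member_of": "staff"})

print(old_doc)
print(new_doc)
```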

2009/9/17 DHast hastings.recurs...@gmail.com:

 is it possible to do something like this:

 Now im wondering how to do something like this:

 <field name="member_of_facet" type="string" indexed="true" stored="true"
        multiValued="true"/>

 <field name="member_of" type="string" indexed="true" stored="true"
        multiValued="false"/>

 <copyField source="member_of" dest="member_of_facet"/>

 if so, i dont seem to be making progress
 thanks




[Job] Solr Search Opportunity - Direct Hire, Not a Recruiter

2009-09-18 Thread Bennett Smith

Hello,

The company I work for is looking to hire a Sr. Software Engineer with  
considerable experience using Solr.  The project we are embarking on  
is relatively new so the person we hire would have a lot of freedom to  
help define the architecture for our e-commerce product and merchant  
indexing and search services.


Below is a copy of the job description. If you are interested, please
send an e-mail directly to me, the hiring manager, at
bsmith at auctiva dot com.


Many thanks

Bennett Smith
Director of Software Engineering
Auctiva Corporation

Auctiva is building an open e-Commerce platform that will power new  
ways to connect buyers and sellers. We have a long history of building  
e-Commerce tools to help buyers and sellers connect in the eBay  
marketplace. Our success is due in large part to the success of our  
customers. We take immense pride in listening to our customers and  
developing best-in-class applications that address their unique needs.


Auctiva is seeking a Senior Software Engineer to join the Platform
Search & Indexing Services team in our expanding San Jose office. This
team is responsible for e-Commerce content search and indexing  
services running on a combination of Windows and Linux platforms.  
Applicants must have significant software development experience on  
both platforms.


The candidate must have solid development skills, the ability to  
properly analyze a problem, and good written and verbal communication  
skills. The candidate must also be a self-starter and ready to hit  
the ground running. The candidate must work well both in a small team  
environment and on their own. The candidate must have the ability to  
multi-task and handle dynamic requirements.


Education / Experience
BS or MS in Computer Science or equivalent work experience
7+ years experience in a similar position
Extensive hands-on development experience
Experience building large scale server applications and systems

Required Skills / Abilities
4+ years OO programming & design using Java and/or C#
4+ years experience with Design Patterns
2+ years experience developing with Solr/Lucene
2+ years server side Linux development in Java
2+ years server side Windows development in C# and .NET
3+ years experience in multi-threaded application development
Unit test development using JUnit and/or NUnit
Proven ability to work through a full development life-cycle from
requirements analysis to deployment.
Familiarity with Agile practices such as Scrum or Extreme Programming

Desired:
Experience developing heterogeneous (Linux,Java / Windows,C#) applications
Linux System Administration Experience
Experience running Apache/Tomcat in a production environment
Familiarity with REST Web Service development
Familiarity with SOAP and RPC protocols
A background in network socket communications
Prior C/C++ development experience




Re: How to leverage the LogMergePolicy calibrateSizeByDeletes patch in Solr ?

2009-09-18 Thread Jason Rutherglen
Over the weekend I may write a patch to allow simple reflection based
injection from within solrconfig.
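
For illustration, reflection-based injection of this kind can be sketched in Python as follows. This is not the proposed patch: the MergePolicy class, its setters, and the inject helper are hypothetical stand-ins, and the settings dictionary plays the role of values read from a config file.

```python
class MergePolicy:
    """Hypothetical stand-in for a configurable component with setters."""

    def __init__(self):
        self.calibrate_size_by_deletes = False
        self.merge_factor = 10

    def set_calibrate_size_by_deletes(self, value):
        self.calibrate_size_by_deletes = bool(value)

    def set_merge_factor(self, value):
        self.merge_factor = int(value)

def inject(target, settings):
    """Call target.set_<name>(value) for every configured name."""
    for name, value in settings.items():
        setter = getattr(target, "set_" + name, None)
        if not callable(setter):
            raise AttributeError(
                f"{type(target).__name__} has no setter for {name!r}"
            )
        setter(value)
    return target

# Values as they might arrive from a config file (names are assumptions).
policy = inject(MergePolicy(), {"calibrate_size_by_deletes": True, "merge_factor": "20"})
print(policy.calibrate_size_by_deletes, policy.merge_factor)
```

The appeal is that a new option on the component needs no extra parsing code; the cost is that a misspelled name is only caught at load time, which the sketch turns into an explicit error.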




RE: Disabling tf (term frequency) during indexing and/or scoring

2009-09-18 Thread Walter Underwood
Constant tf with idf can work well for very short fields, like titles. For
example, the movie New York, New York is not twice as much about New York
as movies that have the string in the title only once.
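
A short Python sketch makes the arithmetic concrete. It loosely follows the classic Lucene tf-idf shape (square-root tf, idf of 1 + ln(numDocs / (docFreq + 1)), idf squared per term) but leaves out norms, boosts, coord, and query normalization. The document frequencies, the collection size, and the second title are made-up values for illustration.

```python
import math

def idf(doc_freq, num_docs):
    """Inverse document frequency in the style of Lucene's DefaultSimilarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def score(query_terms, title, doc_freqs, num_docs, binary_tf):
    """Simplified tf-idf score of a title for the given query terms."""
    tokens = title.lower().replace(",", " ").split()
    total = 0.0
    for term in query_terms:
        freq = tokens.count(term)
        if freq == 0:
            continue
        tf = 1.0 if binary_tf else math.sqrt(freq)
        total += tf * idf(doc_freqs[term], num_docs) ** 2
    return total

query = ["new", "york"]
doc_freqs = {"new": 50, "york": 20}   # assumed document frequencies
num_docs = 1000                        # assumed collection size

for title in ("New York, New York", "New York Stories"):
    print(
        title,
        round(score(query, title, doc_freqs, num_docs, binary_tf=False), 3),
        round(score(query, title, doc_freqs, num_docs, binary_tf=True), 3),
    )
```

With the square-root tf the repeated title scores higher; with constant tf both titles score the same, and the rarer term still contributes more through idf.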

wunder





Free Webinar - Apache Lucene 2.9: Technical Overview of New Features

2009-09-18 Thread Erik Hatcher

Free Webinar: Apache Lucene 2.9: Discover the Powerful New Features
---

Join us for a free and in-depth technical webinar with Grant  
Ingersoll, co-founder of Lucid Imagination and chair of the Apache  
Lucene PMC.


Thursday, September 24th 2009
11:00AM - 12 NOON PDT / 2:00 - 3:00PM EDT
Click on the link below to sign up
http://www.eventsvc.com/lucidimagination/092409?trk=WR-SEP2009B-AP

Lucene 2.9 offers a rich set of new features and performance  
improvements alongside plentiful fixes and optimizations. If you are a  
Java developer building search applications with the Lucene search  
library, this webinar provides the insights you need to harness this  
important update to Apache Lucene.


Grant will present and discuss key technical features and innovations  
including:

o Real time/Per segment searching and caching
o Built in numeric range support with trie structure for speed and
  simplified programming
o Reduced search latency and improved index efficiency

Join us for a free webinar.
Thursday, September 24th 2009
11:00 AM - NOON PDT / 2:00 - 3:00 PM EDT
http://www.eventsvc.com/lucidimagination/092409?trk=WR-SEP2009B-AP


Re: shards and facet_count

2009-09-18 Thread Yonik Seeley
On Fri, Sep 18, 2009 at 5:58 AM, Erik Hatcher erik.hatc...@gmail.com wrote:
 It is strange that you get facet=false calls in there, but maybe this is
 just normal distributed search protocol in one of the phases?

Right, on the second phase of a distrib request, additional faceting
may not be needed.

But it looks like the distributed request is being directed at two
different handlers rather than two different servers or cores?
shards=localhost:8983/solr/resources,localhost:8983/solr/exhibits

I've never tried this, but from the log file, it doesn't look like the
sub-requests are going to those different handlers since the path is
always path=/select

-Yonik
http://www.lucidimagination.com


Re: How to leverage the LogMergePolicy calibrateSizeByDeletes patch in Solr ?

2009-09-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
We can use a simple reflection-based implementation to simplify
reading the many parameters.

What I wish to emphasize is that Solr should be agnostic of XML
altogether. It should only be aware of specific objects and
interfaces. If users wish to plug in something else in some other way,
that should be fine.

There is a huge amount of learning invested in the current
solrconfig.xml. Let us not make people throw that away.






-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com