MoreLikeThis function queries

2010-04-02 Thread Blargy
Are function queries possible using the MLT request handler? How about using the _val_ hack? Thanks for your help -- View this message in context: http://n3.nabble.com/MoreLikeThis-function-queries-tp692377p692377.html Sent from the Solr - User mailing list archive at Nabble.com.

Checking the size of the index using SolrJ APIs

2010-04-02 Thread Na_D
Hi, I need to monitor the index for the following information: 1. Size of the index. 2. Last time the index was updated. Although I did an extensive search of the APIs, I can't find anything that does the same (as mentioned above). Please help.

Re: Checking the size of the index using SolrJ APIs

2010-04-02 Thread Ahmet Arslan
I need to monitor the index for the following information: 1. Size of the index. 2. Last time the index was updated. Although I did an extensive search of the APIs, I can't find anything that does the same (as mentioned above). solr/admin/stats.jsp is actually XML converted to HTML

Experience with indexing billions of documents?

2010-04-02 Thread Burton-West, Tom
We are currently indexing 5 million books in Solr, scaling up over the next few years to 20 million. However, we are using the entire book as a Solr document. We are evaluating the possibility of indexing individual pages, as there are some use cases where users want the most relevant pages

Re: Index db data

2010-04-02 Thread MitchK
Hello trueman, here are some helpful pages from the wiki: DataImportHandler: http://wiki.apache.org/solr/DataImportHandler And if you run into trouble, you may find an answer here: http://wiki.apache.org/solr/DataImportHandlerFaq An example of a data-config.xml can be found at the

Re: Index db data

2010-04-02 Thread MitchK
In addition to my first post: the wiki gives an HTTP request for full-import. I haven't worked with SolrJ yet, but I think you need to copy those parts of the URL that show the directory structure of your Solr instance. For the example I suggested you have a look at, I think it will

Re: Index db data

2010-04-02 Thread MitchK
No HTTP call. That's a misunderstanding. For an HTTP call you need a URL like this: http://host:port/solr/dataimport?command=full-import For the SolrJ client I *think* that your query only needs to look like this: /solr/dataimport?command=full-import However, I have never worked with
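
To make the two forms in this message concrete, here is a minimal Python sketch that only builds the strings. The function names are hypothetical, and the host and port used in the usage lines are placeholders, not values from the thread. It sends no request and does not show how SolrJ is actually called.

```python
from urllib.parse import urlencode, urlunsplit


def dataimport_path(command="full-import", handler_path="/solr/dataimport"):
    """Return the relative form: /solr/dataimport?command=full-import."""
    return f"{handler_path}?{urlencode({'command': command})}"


def dataimport_url(host, port, command="full-import", handler_path="/solr/dataimport"):
    """Return the absolute form: http://host:port/solr/dataimport?command=..."""
    query = urlencode({"command": command})
    return urlunsplit(("http", f"{host}:{port}", handler_path, query, ""))


# Placeholder host and port, chosen only for illustration.
print(dataimport_url("localhost", 8983))
print(dataimport_path())
```

Whether the relative form is enough for a SolrJ client is the poster's guess, and the sketch does not settle it.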

Re: Experience with indexing billions of documents?

2010-04-02 Thread darren
My guess is that you will need to take advantage of Solr 1.5's upcoming cloud/cluster renovations and use multiple indexes to comfortably achieve those numbers. Hypothetically, in that case, you won't be limited by the single-index docid limitations of Lucene. We are currently indexing 5 million

Re: Experience with indexing billions of documents?

2010-04-02 Thread Peter Sturge
You can do this today with multiple indexes, replication and distributed searching. SolrCloud/clustering will certainly make life easier when it comes to managing these, but with distributed searches over multiple indexes, you're limited only by how much hardware you can throw at it. On Fri, Apr
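
As a rough illustration of distributed searching over multiple indexes, the sketch below builds a query URL that carries Solr's `shards` request parameter. The host names, port, and query string are assumptions made up for the example, and `distributed_select_url` is a hypothetical helper, not part of any Solr client.

```python
from urllib.parse import urlencode


def distributed_select_url(base_url, shards, q):
    """Build a /select URL that asks one Solr node to fan out to all shards."""
    params = {"q": q, "shards": ",".join(shards)}
    # Keep ':', '/' and ',' unescaped so the shard list stays readable.
    return f"{base_url}/select?{urlencode(params, safe=':/,')}"


# Hypothetical two-shard layout.
shard_list = ["host1:8983/solr", "host2:8983/solr"]
print(distributed_select_url("http://host1:8983/solr", shard_list, "title:solr"))
```

The node that receives the request merges results from every listed shard, which is why capacity here scales with the hardware behind the shard list.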

Re: Experience with indexing billions of documents?

2010-04-02 Thread Rich Cariens
A colleague of mine is using native Lucene + some home-grown patches/optimizations to index over 13B small documents in a 32-shard environment, which is around 406M docs per shard. If there's a 2B doc id limitation in Lucene then I assume he's patched it himself. On Fri, Apr 2, 2010 at 1:17 PM,
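
The per-shard figure can be checked with a few lines of Python. The sketch assumes the limit under discussion is the usual one that follows from Lucene document ids being Java ints (about 2^31 - 1 per index); the exact ceiling may differ by version.

```python
# Assumption: doc ids are Java ints, so one index holds at most ~2**31 - 1 docs.
LUCENE_MAX_DOCS = 2**31 - 1

total_docs = 13_000_000_000  # "over 13B small documents"
shard_count = 32             # "32-shard environment"

docs_per_shard = total_docs // shard_count
# Ceiling division: fewest shards that keep every shard under the limit.
min_shards = -(-total_docs // LUCENE_MAX_DOCS)

print(docs_per_shard)                    # 406250000
print(docs_per_shard < LUCENE_MAX_DOCS)  # True
print(min_shards)                        # 7
```

At roughly 406M documents per shard, each index sits well below a per-index limit of that size, so under this assumption sharding alone would avoid the limit without a patch.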

Re: MoreLikeThis function queries

2010-04-02 Thread Blargy
Bueller? Anyone? :)

Re: MoreLikeThis function queries

2010-04-02 Thread Darren Govoni
It's Friday, dude. Give it a couple of days. ;) On Fri, 2010-04-02 at 11:50 -0800, Blargy wrote: Bueller? Anyone? :)

highlighter issue

2010-04-02 Thread Joe Calderon
Hello *, I have a field that indexes the string "the ex-girlfriend" as these tokens: [the, exgirlfriend, ex, girlfriend]. They are then passed to the edge n-gram filter, which allows me to match different user spellings and allows for partial highlighting. However, a token like 'ex' would get
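
To show what edge n-gramming does to these four tokens, here is a small Python sketch. It is not Solr's filter; `edge_ngrams` is a hypothetical stand-in, and the gram size bounds are assumed values since the message does not give the configured ones.

```python
def edge_ngrams(token, min_gram=1, max_gram=25):
    """Return the leading substrings of token, from min_gram to max_gram chars."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


tokens = ["the", "exgirlfriend", "ex", "girlfriend"]
for tok in tokens:
    print(tok, edge_ngrams(tok))

# Every gram produced from 'ex' is also a leading gram of 'exgirlfriend'.
overlap = set(edge_ngrams("ex")) & set(edge_ngrams("exgirlfriend"))
print(sorted(overlap))
```

The message is cut off before it describes the actual problem, so the sketch only shows the overlap between the grams of 'ex' and 'exgirlfriend' and does not try to reproduce the highlighting behaviour.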

Re: Search across more than one field (dismax) ignored

2010-04-02 Thread MitchK
Hoss, thank you for responding. This behaviour was caused by unexpected behaviour of the ResourceLoader, triggered by a file encoded as UTF-8 with a BOM. I mentioned this in another thread on the mailing list; sorry for forgetting to say so here as well. Kind regards - Mitch

Re: highlighter issue

2010-04-02 Thread Erik Hatcher
Will adding the RemoveDuplicatesTokenFilter(Factory) do the trick here? Erik On Apr 2, 2010, at 4:13 PM, Joe Calderon wrote: hello *, i have a field that is indexing the string the ex-girlfriend as these tokens: [the, exgirlfriend, ex, girlfriend] then they are passed to the edgengram

Re: highlighter issue

2010-04-02 Thread Joe Calderon
I had tried it earlier with no effect. When I looked at the source, it doesn't look at offsets at all, just position increments, so short of somebody finding a better way I'm going to create a similar filter that compares offsets... On Fri, Apr 2, 2010 at 2:07 PM, Erik Hatcher erik.hatc...@gmail.com
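
One possible reading of "a similar filter that compares offsets" is sketched below in Python. It is purely illustrative: the `Token` type, the `dedupe_by_offsets` function, and the sample offsets are all hypothetical, and the poster's actual filter logic is not shown in the thread.

```python
from typing import NamedTuple


class Token(NamedTuple):
    term: str
    start: int  # start offset in the original text
    end: int    # end offset in the original text


def dedupe_by_offsets(tokens):
    """Keep the first token for each (term, start, end); drop later repeats."""
    seen = set()
    kept = []
    for tok in tokens:
        if tok not in seen:  # NamedTuple equality covers term and both offsets
            seen.add(tok)
            kept.append(tok)
    return kept


# Made-up stream: the gram 'ex' appears twice with identical offsets.
stream = [Token("e", 4, 5), Token("ex", 4, 6), Token("ex", 4, 6), Token("g", 7, 8)]
print(dedupe_by_offsets(stream))
```

Unlike a check based on position increments, this one treats two tokens as duplicates only when both the text and the character span match.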

Unable to load MailEntityProcessor or org.apache.solr.handler.dataimport.MailEntityProcessor

2010-04-02 Thread Andrew McCombe
Hi I am experimenting with Solr to index my gmail and am experiencing an error: 'Unable to load MailEntityProcessor or org.apache.solr.handler.dataimport.MailEntityProcessor' I downloaded a fresh 1.4 tgz, extracted it and added the following to example/solr/config/solrconfig.xml:

Re: MoreLikeThis function queries

2010-04-02 Thread Blargy
Fair enough :)

Related terms/combined terms

2010-04-02 Thread Blargy
Not sure of the exact vocabulary I am looking for, so I'll try to explain myself. Given a search term, is there any way to return a list of related/grouped keywords (based on the current state of the index) for that term? For example, say I have a sports catalog and I search for Callaway. Is