Re: Incremental indexing of database

2008-07-22 Thread anshuljohri
Thanks Paul, this is what I was looking for :) -Anshul Johri Noble Paul നോബിള്‍ नोब्ळ् wrote: > > Did you take a look at DataImportHandler? > > On Wed, Jul 23, 2008 at 1:57 AM, Ravish Bhagdev > <[EMAIL PROTECTED]> wrote: >> Can't you write triggers for your database/tables you want to index?

Re: Incremental indexing of database

2008-07-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
Did you take a look at DataImportHandler? On Wed, Jul 23, 2008 at 1:57 AM, Ravish Bhagdev <[EMAIL PROTECTED]> wrote: > Can't you write triggers for your database/tables you want to index? > That way you can keep track of all kinds of changes and updates and > not just addition of a new record. > >

Re: Seeking Anecdotes: Solr Plugins

2008-07-22 Thread Mike Klaas
On 22-Jul-08, at 4:34 PM, Chris Hostetter wrote: Hey everybody, I'll be giving a talk called "Apache Solr: Beyond the Box" at ApacheCon this year, which will focus on the how/when/why of writing Solr Plugins... http://us.apachecon.com/c/acus2008/sessions/10 I've got several use

Seeking Anecdotes: Solr Plugins

2008-07-22 Thread Chris Hostetter
Hey everybody, I'll be giving a talk called "Apache Solr: Beyond the Box" at ApacheCon this year, which will focus on the how/when/why of writing Solr Plugins... http://us.apachecon.com/c/acus2008/sessions/10 I've got several use cases I can refer to for examples, both from my day j

Re: Vote on a new solr logo

2008-07-22 Thread Chris Harris
How about releasing the preliminary results so we can see if a run-off is in order! On Tue, Jul 22, 2008 at 6:37 AM, Mark Miller <[EMAIL PROTECTED]> wrote: > My opinion: if its already a runaway, we might as well not prolong things. > If not though, we should probably give some time for any possib

RE: Out of memory on Solr sorting

2008-07-22 Thread Fuad Efendi
Yes, it is a cache, it stores "sorted" by "sorted field" array of Document IDs together with sorted fields; query results can intersect with it and reorder accordingly. But memory requirements should be well documented. It uses internally WeakHashMap which is not good(!!!) - a lot of "unde

RE: Out of memory on Solr sorting

2008-07-22 Thread Fuad Efendi
I am hoping [new StringIndex (retArray, mterms)] is called only once per-sort-field and cached somewhere in Lucene; theoretically you need to multiply the number of documents by the size of the field (supposing that the field contains unique text); you need not tokenize this field; you need not store TermVect

RE: Out of memory on Solr sorting

2008-07-22 Thread sundar shankar
I haven't seen the source code before, but I don't know why the sorting isn't done after the fetch is done. Wouldn't that make it faster, at least in the case of field-level sorting? I could be wrong too, and the implementation might well be better. But I don't know why all of the fields have

Re: Out of memory on Solr sorting

2008-07-22 Thread Fuad Efendi
Ok, after some analysis of FieldCacheImpl: - it is supposed that (sorted) Enumeration of "terms" is less than total number of documents (so that SOLR uses specific field type for sorted searches: solr.StrField with omitNorms="true") It creates int[reader.maxDoc()] array, checks (sorted) En

Re: Out of memory on Solr sorting

2008-07-22 Thread Fuad Efendi
Ok, what is confusing me is implicit guess that FieldCache contains "field" and Lucene uses in-memory sort instead of using file-system "index"... Array size: 100Mb (25M x 4 bytes), and it is just pointers (4-byte integers) to documents in index. org.apache.lucene.search.FieldCacheI

RE: Out of memory on Solr sorting

2008-07-22 Thread sundar shankar
Thanks for your help Mark. Lemme explore a little more and see if some one else can help me out too. :) > Date: Tue, 22 Jul 2008 16:53:47 -0400> From: [EMAIL PROTECTED]> To: > solr-user@lucene.apache.org> Subject: Re: Out of memory on Solr sorting> > > Someone else is going to have to take over

Re: Out of memory on Solr sorting

2008-07-22 Thread Mark Miller
Someone else is going to have to take over Sundar - I am new to solr myself. I will say this though - 25 million docs is pushing the limits of a single machine - especially with only 2 gig of RAM, especially with any sort fields. You are at the edge I believe. But perhaps you can get by. Have

RE: Out of memory on Solr sorting

2008-07-22 Thread sundar shankar
Hi Mark, I am still getting an OOM even after increasing the heap to 1024. The docset I have is numDocs : 1138976 maxDoc : 1180554 Not sure how much more I would need. Is there any other way out of this. I noticed another interesting behavior. I have a Solr setup on a personal B

Re: Out of memory on Solr sorting

2008-07-22 Thread Mark Miller
Hmmm...I think its 32bits an integer with an index entry for each doc, so **25 000 000 x 32 bits = 95.3674316 megabytes** Then you have the string array that contains each unique term from your index...you can guess that based on the number of terms in your index and an avg length guess.
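The arithmetic above can be checked with a short sketch. It assumes one 4-byte (32-bit) integer per document and ignores any JVM array header; the function name is made up for illustration and is not part of Solr or Lucene.

```python
def int_sort_array_mib(num_docs: int, bytes_per_entry: int = 4) -> float:
    """Approximate size in MiB of an array holding one integer per document."""
    return num_docs * bytes_per_entry / (1024 * 1024)


# 25 million documents, one 32-bit entry each.
estimate = int_sort_array_mib(25_000_000)
print(f"{estimate:.2f} MiB")
```

This covers only the per-document integer array; the array of unique terms for a string sort comes on top of it.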

Re: Out of memory on Solr sorting

2008-07-22 Thread Fuad Efendi
Thank you very much Mark, it explains a lot to me; I am guessing: for 1,000,000 documents with a [string] field of average size 1024 bytes I need 1Gb for a single IndexSearcher instance; the field-level cache is used internally by Lucene (can Lucene manage the size of it?); we can't have 1G of such

Re: Out of memory on Solr sorting

2008-07-22 Thread Fuad Efendi
Mark, Question: how much memory I need for 25,000,000 docs if I do a sort by field, 256 bytes. 6.4Gb? Quoting Mark Miller <[EMAIL PROTECTED]>: Because to sort efficiently, Solr loads the term to sort on for each doc in the index into an array. For ints,longs, etc its just an array the siz

Re: Incremental indexing of database

2008-07-22 Thread Ravish Bhagdev
Can't you write triggers for your database/tables you want to index? That way you can keep track of all kinds of changes and updates and not just addition of a new record. Ravish On Tue, Jul 22, 2008 at 8:15 PM, anshuljohri <[EMAIL PROTECTED]> wrote: > > Hi, > > In my project i have to index whol

Re: Out of memory on Solr sorting

2008-07-22 Thread Mark Miller
Fuad Efendi wrote: SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 100767936, Num elements: 25191979 I just noticed, this is an exact number of documents in index: 25191979 (http://www.tokenizer.org/, you can sort - click headers Id, [COuntry, Site, Price] in a tab

RE: Out of memory on Solr sorting

2008-07-22 Thread sundar shankar
Sorry, Not 30, but 300 :) From: [EMAIL PROTECTED]: [EMAIL PROTECTED]: RE: Out of memory on Solr sortingDate: Tue, 22 Jul 2008 20:19:49 + Thanks for the explanation mark. The reason I had it as 512 max was cos earlier the data file was just about 30 megs and it increased to this much for of

RE: Out of memory on Solr sorting

2008-07-22 Thread sundar shankar
Thanks for the explanation, Mark. The reason I had it as 512 max was cos earlier the data file was just about 30 megs and it increased to this much because of the usage of EdgeNGramFactoryFilter for 2 fields. That's great to know it just happens for the first search. But this exception has been occur

Re: Out of memory on Solr sorting

2008-07-22 Thread Fuad Efendi
SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 100767936, Num elements: 25191979 I just noticed, this is an exact number of documents in index: 25191979 (http://www.tokenizer.org/, you can sort - click headers Id, [COuntry, Site, Price] in a table; experimental)
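A quick check of the two numbers in that error message, assuming each array element is a 4-byte int. The small remainder is presumably array header or padding in that JVM, which is a guess and not something stated in the thread.

```python
NUM_ELEMENTS = 25_191_979   # "Num elements" from the error message
OBJECT_SIZE = 100_767_936   # "Object size" in bytes from the error message

# One 4-byte int per element, i.e. one entry per document in the index.
payload_bytes = NUM_ELEMENTS * 4
overhead_bytes = OBJECT_SIZE - payload_bytes

print(payload_bytes, overhead_bytes)
```

So the failed allocation is consistent with a single int[] sized to the number of documents.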

Re: Out of memory on Solr sorting

2008-07-22 Thread Fuad Efendi
I've even seen exceptions (posted here) when "sort"-type queries caused Lucene to allocate 100Mb arrays, here is what happened to me: SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 100767936, Num elements: 25191979 at org.apache.lucene.search.FieldCacheImpl$1

Re: Out of memory on Solr sorting

2008-07-22 Thread Mark Miller
Because to sort efficiently, Solr loads the term to sort on for each doc in the index into an array. For ints,longs, etc its just an array the size of the number of docs in your index (i believe deleted or not). For a String its an array to hold each unique string and an array of ints indexing
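As a rough sketch of the layout described above (one int per document plus one copy of each unique string), the memory for a string sort can be bounded from below as follows. The function and parameter names are hypothetical, and real JVM strings carry extra per-object overhead that this ignores.

```python
def string_sort_estimate_bytes(num_docs: int, unique_terms: int, avg_term_bytes: int) -> int:
    """Lower bound: a 4-byte index entry per doc plus storage for each unique term."""
    index_bytes = num_docs * 4
    term_bytes = unique_terms * avg_term_bytes
    return index_bytes + term_bytes


# Worst case: every one of 25M documents has its own 256-byte value.
worst = string_sort_estimate_bytes(25_000_000, 25_000_000, 256)
# Friendlier case: the same documents share only 1,000 distinct values.
shared = string_sort_estimate_bytes(25_000_000, 1_000, 256)

print(worst / 1e9, "GB vs", shared / 1e6, "MB")
```

The gap between the two cases is why sorting on a field with few distinct values is much cheaper than sorting on a field of unique text.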

RE: Out of memory on Solr sorting

2008-07-22 Thread sundar shankar
Thanks Fuad. But why does just sorting provide an OOM. I executed the query without adding the sort clause it executed perfectly. In fact I even tried remove the maxrows=10 and executed. it came out fine. Queries with bigger results seems to come out fine too. But why just sort

RE: Out of memory on Solr sorting

2008-07-22 Thread Fuad Efendi
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403) - this piece of code does not request Array[100M] (as I have seen with Lucene), it asks for only a few bytes / Kb for a field... Probably 128 - 512 is not enough; it is also advisable to use equal sizes -Xms1024M -Xmx1024M (i

Re: pf nixes fl

2008-07-22 Thread Jason Rennie
Doh! I mistakenly changed the request handler from dismax to standard. Ignore me... Jason On Tue, Jul 22, 2008 at 2:59 PM, Jason Rennie <[EMAIL PROTECTED]> wrote: > I'm using solrj and all I did was add a pf entry to solrconfig.xml. I > don't think it could be an ampersand issue... > > Here's

Incremental indexing of database

2008-07-22 Thread anshuljohri
Hi, In my project I have to index a whole database which contains text data only. If I follow an incremental indexing approach, my problem is how to pick the delta data from the database. Is there any utility in Solr to keep track of the last indexed record? Or is there any other approach to solve
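One common way to pick the delta is to keep a last-modified timestamp on each row and remember the time of the last indexing run. The sketch below illustrates only that idea with an in-memory SQLite table; the table name, column names and helper are invented for the example, and it is not how any particular Solr tool is implemented.

```python
import sqlite3


def fetch_delta(conn, last_indexed_at):
    """Return rows changed after the previous indexing run, oldest first."""
    cur = conn.execute(
        "SELECT id, body, updated_at FROM docs "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_indexed_at,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        (1, "old row", "2008-07-20 10:00:00"),
        (2, "changed row", "2008-07-22 09:30:00"),
    ],
)

last_indexed_at = "2008-07-21 00:00:00"  # stored after the previous run
delta = fetch_delta(conn, last_indexed_at)
if delta:
    # Send the delta rows to the indexer here, then advance the marker.
    last_indexed_at = delta[-1][2]
```

The marker has to be stored somewhere durable between runs, and deletions need separate handling since a deleted row no longer has a timestamp to find.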

RE: Out of memory on Solr sorting

2008-07-22 Thread sundar shankar
> From: [EMAIL PROTECTED] > To: solr-user@lucene.apache.org > Subject: Out of memory on Solr sorting > Date: Tue, 22 Jul 2008 19:11:02 + > > > Hi, > Sorry again fellos. I am not sure whats happening. The day with solr is bad > for me I guess. EZMLM didnt let me send any mails this morning

Out of memory on Solr sorting

2008-07-22 Thread sundar shankar
Hi, Sorry again, fellows. I am not sure what's happening. The day with solr is bad for me I guess. EZMLM didn't let me send any mails this morning. Asked me to confirm subscription and when I did, it said I was already a member. Now my mails are all coming out bad. Sorry for troubling y'all this ba

Re: pf nixes fl

2008-07-22 Thread Jason Rennie
I'm using solrj and all I did was add a pf entry to solrconfig.xml. I don't think it could be an ampersand issue... Here's an example query: wt=xml&rows=10&start=0&q=urban+outfitters&qt=recsKeyword&version=2.2 Here's qt config: 0.06 name^1.5 tags description^0

RE: OOM on Solr Sort

2008-07-22 Thread sundar shankar
Sorry for that. I didnt realise how my had finally arrived. Sorry!!! From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: OOM on Solr Sort Date: Tue, 22 Jul 2008 18:33:43 + Hi, We are developing a product in a agile manner and the current implementation has a data of size ju

Re: pf nixes fl

2008-07-22 Thread Mike Klaas
On 22-Jul-08, at 11:53 AM, Jason Rennie wrote: Just tried adding a pf field to my request handler. When I did this, solr returned all document fields for each doc (no "score") instead of returning the fields specified in fl. Bug? Feature? Anyone know what the reason for this behavior

pf nixes fl

2008-07-22 Thread Jason Rennie
Just tried adding a pf field to my request handler. When I did this, solr returned all document fields for each doc (no "score") instead of returning the fields specified in fl. Bug? Feature? Anyone know what the reason for this behavior is? I'm using solr 1.2. Thanks, Jason

OOM on Solr Sort

2008-07-22 Thread sundar shankar
Hi, We are developing a product in an agile manner and the current implementation has data of size just about 800 megs in dev. The memory allocated to solr on dev (Dual core Linux box) is 128-512. My config= trueMy Field===

Re: Vote on a new solr logo

2008-07-22 Thread Mark Miller
Chris Hostetter wrote: : http://people.apache.org/~shalin/poll.html Except the existing Solr logo isn't on that list. i smell election tampering :) I had put it in my poll :) I actually considered bringing that up to Shalin as well, but couldn't bring myself to be so fair I suppose Serious

Re: Vote on a new solr logo

2008-07-22 Thread Chris Hostetter
: http://people.apache.org/~shalin/poll.html Except the existing Solr logo isn't on that list. i smell election tampering :) Seriously though: I realized a long time ago that there was too much email to reply too, too many features to work on, too many patches to review, and too few hours in

Re: Specifying explicit FacetQuery w/ a normal query?

2008-07-22 Thread Mike Klaas
I'm somewhat perplexed, under what circumstances would you be able to send one query to Solr but not two? -Mike On 21-Jul-08, at 8:37 PM, Jon Baer wrote: Well that's my problem ... I can't :-) When you put a fq=doctype:news in there your can't get an explicit facet.query, it will only let

Re: Solr cache statistics explanation

2008-07-22 Thread Koji Sekiguchi
lookups : how many times the cache is referenced hits : how many times the cache hits hitratio : hits/lookups and for other items, see my previous mail at: http://www.nabble.com/about-cache-to10192953.html Koji Marshall Gunter wrote: Can someone point me to an in depth explanation of the Solr c
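The relationship between those three numbers is simple enough to write down. A minimal sketch, with an invented function name, that also guards against a cache that has not been used yet:

```python
def hit_ratio(hits: int, lookups: int) -> float:
    """hitratio as described above: hits divided by lookups (0.0 if unused)."""
    return hits / lookups if lookups else 0.0


print(hit_ratio(75, 100))
```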

Re: Query for an exact match

2008-07-22 Thread Ian Connor
Indeed - one of my shards had it listed as "text" doh! thanks for the assurance that led me to find my bug On Tue, Jul 22, 2008 at 11:43 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On Tue, Jul 22, 2008 at 11:39 AM, Ian Connor <[EMAIL PROTECTED]> wrote: >> > omitNorms="true"/> > > This will giv

Re: java.io.IOException: read past EOF

2008-07-22 Thread Fuad Efendi
Lucene index corrupted... which harddrive do you use? Quoting Rohan <[EMAIL PROTECTED]>: Hi Guys, This is my first post. We are running solr with multiple Indexes, 20 Indexes. I'm facing problem with 5 one. I'm not able to run optimized on that index. I'm getting following error. Your help is

Re: maximum length of string that Solr can index

2008-07-22 Thread Yonik Seeley
Lucene has a maxFieldLength (the number of tokens to index for a given field name). It can be configured via solrconfig.xml: 1 -Yonik On Tue, Jul 22, 2008 at 11:38 AM, Tom Lord <[EMAIL PROTECTED]> wrote: > Hi, we've looked for info about this issue online and in the code and am > none the wis

Re: Query for an exact match

2008-07-22 Thread Yonik Seeley
On Tue, Jul 22, 2008 at 11:39 AM, Ian Connor <[EMAIL PROTECTED]> wrote: > omitNorms="true"/> This will give you an exact match. As I said, if it's not, then you didn't restart and reindex, or you are querying the wrong field. -Yonik

Re: Query for an exact match

2008-07-22 Thread Ian Connor
At the moment for "string", I have: is there an example type so that it will do exact matches? Would "alphaOnlySort" do the trick? It looks like it might. On Tue, Jul 22, 2008 at 11:20 AM, Yonik Se

maximum length of string that Solr can index

2008-07-22 Thread Tom Lord
Hi, we've looked for info about this issue online and in the code and am none the wiser - help would be much appreciated. We are indexing the full text of journals using Solr. We currently pass in the journal text, up to maybe 130 pages, and index it in one go. We are seeing Solr stop indexing af

java.io.IOException: read past EOF

2008-07-22 Thread Rohan
Hi Guys, This is my first post. We are running solr with multiple Indexes, 20 Indexes. I'm facing problem with 5 one. I'm not able to run optimized on that index. I'm getting following error. Your help is really appreciated. java.io.IOException: read past EOF at org.apache.lucene.store.B

Re: Query for an exact match

2008-07-22 Thread Yonik Seeley
On Tue, Jul 22, 2008 at 11:08 AM, Ian Connor <[EMAIL PROTECTED]> wrote: > How can I require an exact field match in a query. For instance, if a > title field contains "Nature" or "Nature Cell Biology", when I search > title:Nature I only want "Nature" and not "Nature Cell Biology". Is > that someth

Re: spellchecker problems (bugs)

2008-07-22 Thread Shalin Shekhar Mangar
On Tue, Jul 22, 2008 at 8:37 PM, Geoffrey Young <[EMAIL PROTECTED]> wrote: > > > Shalin Shekhar Mangar wrote: > >> The problems you described in the spellchecker are noted in >> https://issues.apache.org/jira/browse/SOLR-622 -- I shall create an issue >> to >> synchronize spellcheck.build so that

Re: spellchecker problems (bugs)

2008-07-22 Thread Yonik Seeley
On Tue, Jul 22, 2008 at 11:07 AM, Geoffrey Young <[EMAIL PROTECTED]> wrote: > Shalin Shekhar Mangar wrote: >> >> The problems you described in the spellchecker are noted in >> https://issues.apache.org/jira/browse/SOLR-622 -- I shall create an issue >> to >> synchronize spellcheck.build so that the

Query for an exact match

2008-07-22 Thread Ian Connor
How can I require an exact field match in a query. For instance, if a title field contains "Nature" or "Nature Cell Biology", when I search title:Nature I only want "Nature" and not "Nature Cell Biology". Is that something I do as a query or do I need to re index it with the field defined in a cert

Re: spellchecker problems (bugs)

2008-07-22 Thread Geoffrey Young
Shalin Shekhar Mangar wrote: The problems you described in the spellchecker are noted in https://issues.apache.org/jira/browse/SOLR-622 -- I shall create an issue to synchronize spellcheck.build so that the index is not corrupted. I'd like to discuss this a little... I'm not sure that I want

Solr cache statistics explanation

2008-07-22 Thread Marshall Gunter
Can someone point me to an in depth explanation of the Solr cache statistics? I'm having a hard time finding it online. Specifically, I'm interested in these fields that are listed on the Solr admin statistics pages in the cache section: lookups hits hitratio inserts evictions size cumulative_

Restricting spellchecker for certain words

2008-07-22 Thread Jon Baer
It seems that spellchecker works great except all the "7 words you can't say on TV" resolve to very important people, is there a way to contain just certain words so they don't resolve? Thanks. - Jon

Re: facets and filter query

2008-07-22 Thread Erik Hatcher
All facet counts currently returned are _within_ the set of documents constrained by query (q) and filter query (fq) parameters - just to clarify what it does. Why? That's the general use case. Returning back counts from differently constrained sets requires some custom coding - perhaps
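Without custom coding, one workaround is to send two requests: one with the filter query for the result list, and one without it purely for the facet counts. The sketch below only builds the two query strings; the field name "category" and the helper are assumptions for illustration, while q, fq, rows, facet and facet.field are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_queries(user_query, category, facet_field="category"):
    """Return (results_qs, counts_qs): filtered results and unfiltered facet counts."""
    results = [
        ("q", user_query),
        ("fq", f"{facet_field}:{category}"),
        ("rows", "10"),
    ]
    counts = [
        ("q", user_query),
        ("rows", "0"),            # no documents needed, only counts
        ("facet", "true"),
        ("facet.field", facet_field),
    ]
    return urlencode(results), urlencode(counts)


results_qs, counts_qs = build_queries("content:foo", "news")
print(results_qs)
print(counts_qs)
```

The cost is a second round trip, though the counts request returns no documents and should be cheap.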

Re: facets and filter query

2008-07-22 Thread Jon Baer
This is *exactly* my issue ... very nicely worded :-) I would have thought facet.query=*:* would have been the solution but it does not seem to work. Im interested in getting these *total* counts for UI display. - Jon On Jul 22, 2008, at 6:05 AM, Stefan Oestreicher wrote: Hi, I have a

Re: Vote on a new solr logo

2008-07-22 Thread Mark Miller
My opinion: if its already a runaway, we might as well not prolong things. If not though, we should probably give some time for any possible laggards. The 'admin look' poll received its first 19-20 votes in the first night / morning, and has only gotten 2 or 3 since then, so probably no use goi

Re: Vote on a new solr logo

2008-07-22 Thread Shalin Shekhar Mangar
28 votes so far and counting! When should we close this poll? On Tue, Jul 22, 2008 at 1:18 AM, Mark Miller <[EMAIL PROTECTED]> wrote: > Perfect! Thank you Shalin. Much appreciated, and a dead simple system. My > vote is in. > > - Mark > > > Shalin Shekhar Mangar wrote: > >> Will this do? A 1-5 f

facets and filter query

2008-07-22 Thread Stefan Oestreicher
Hi, I have a category field in my index which I'd like to use as a facet. However my search frontend only allows you to search in one category at a time for which I'm using a filter query. Unfortunately the filter query restricts the facets as well. My query looks like this: ?q=content:foo&fq=cat

Re: Solr/Lucene search term stats

2008-07-22 Thread Preetam Rao
hi, try using faceted search, http://wiki.apache.org/solr/SimpleFacetParameters something like facet=true&facet.query=title:("web2.0" OR "ajax") facet.query - gives the number of matching documents for a query. You can run the examples in the above link and see how it works.. You can also try u
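To get a separate count per field and per term, one facet.query can be sent for each combination. A small sketch of building such a request follows; the helper name is made up. Each facet.query returns the number of matching documents, not the number of times the term occurs.

```python
from urllib.parse import urlencode


def facet_query_params(main_query, fields, terms):
    """Build a query string with one facet.query per (field, term) pair."""
    params = [("q", main_query), ("rows", "0"), ("facet", "true")]
    for field in fields:
        for term in terms:
            params.append(("facet.query", f'{field}:"{term}"'))
    return urlencode(params)


qs = facet_query_params(
    'title:("web2.0" OR "ajax") OR description:("web2.0" OR "ajax")',
    ["title", "description"],
    ["web2.0", "ajax"],
)
print(qs)
```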

Re: Alphabetical search on solr

2008-07-22 Thread Erik Hatcher
On Jul 22, 2008, at 5:08 AM, Adrian M Bell wrote: We have a catalogue of documents that we have a solr index on. We need to provide an alphabetical search, so that a user can list all documents with a title beginning A, B and so on... So how do we do this? Currently we have built up the f

Alphabetical search on solr

2008-07-22 Thread Adrian M Bell
Ok this might be a simple one, or more likely, my understanding of solr is shot to bits We have a catalogue of documents that we have a solr index on. We need to provide an alphabetical search, so that a user can list all documents with a title beginning A, B and so on... So how do we do th

Solr/Lucene search term stats

2008-07-22 Thread Sunil
Hi All, I am working on a module using Solr, where I want to get the stats of each keyword found in each field. If my search term is: (title:("web2.0" OR "ajax") OR description:("web2.0" OR "ajax")) Then I want to know how many times web2.0/ajax were found in title or description. Any suggestio