Re: Large index question

2006-10-13 Thread Artem
AM. I use just FSDir. Best regards, Artem. SS> Suppose I want to index 500,000 documents (average document size is SS> 4 KB). Let's assume I create a single index and that the index is SS> static (I'm not going to add any new documents to it). I would guess SS> the index wo

Re[2]: Out of memory exception for big indexes

2007-04-08 Thread Artem
dSortFactory.create(sortFieldName, sortDescending) to get Sort object for sorting query. StoredFieldSortFactory source file can be extracted from LUCENE-769 patch or from sharehound sources: http://sharehound.cvs.sourceforge.net/*checkout*/sharehound/jNetCrawler/src/java/org/apache/lucene/search/S

Fwd: Re[2]: Out of memory exception for big indexes

2007-04-09 Thread Artem
e(sortFieldName, sortDescending) to get Sort object for sorting query. StoredFieldSortFactory source file can be extracted from LUCENE-769 patch or from sharehound sources: http://sharehound.cvs.sourceforge.net/*checkout*/sharehound/jNetCrawler/src/java/org/apache/lucene/search/StoredFieldSortFactory.java Regard

Re[4]: Out of memory exception for big indexes

2007-04-09 Thread Artem
n you please explain what sorted NB> queries mean? Is simple keyword search a sorted query? That's simple - if the results are presented on screen sorted by that keyword, it's a sorted query :) Another test is your system's code. The sorted queries I mean are calls to IndexSearcher.search(q

Re[2]: Out of memory exception for big indexes

2007-04-25 Thread Artem
uld be successfully applied to multiple fields, but I'm not going to implement it yet. Maybe you? :) Regards, Artem IV> Hi Artem, IV> Thank you very much for your mails :) IV> So first I have to tell you that your patch works perfectly even with IV> very big indexes - 40 GB (you c

FieldValueFilter and non-DocValues fields

2015-02-27 Thread Artem Redkin
, Integer.MIN_VALUE, Integer.MAX_VALUE, true, true) for now) to find documents with the field present? 2. Can one use DocValues effectively instead of Stored Fields to show found documents? Or should I use UninvertingReader for fields that are not in DocValues? Thanks! -- Artem Redkin artemred

Fwd: Re[2]: Fwd: Re[2]: 30 million+ docs on a single server

2006-10-04 Thread Artem Vasiliev
ourceforge.net/*checkout*/sharehound/jNetCrawler/src/java/org/sourceforge/sharehound/lucene/FilesSearchCommandImpl.java Best regards, Artem. OG> Artem & Co., OG> Have you benchmarked this approach against the typical non-caching OG> Sort? Both performance and memory benchmark? O

Re: Large index question

2006-10-13 Thread Artem Vasiliev
AM. I use just FSDir. Best regards, Artem. SS> Suppose I want to index 500,000 documents (average document size is SS> 4 KB). Let's assume I create a single index and that the index is SS> static (I'm not going to add any new documents to it). I would guess SS> the index wo

Re: Out of memory exception for big indexes

2007-04-24 Thread Artem Vasiliev
string of maybe 50 symbols average). The machine looks quite beefy to me - Intel core duo with 500M given to the application. Regards, Artem On 4/23/07, Ivan Vasilev <[EMAIL PROTECTED]> wrote: Hi All, THANK YOU FOR YOUR HELP :) I put this problem in the forum but I had no chance to

Re: Out of memory exception for big indexes

2007-04-24 Thread Artem Vasiliev
. Regards, Artem On 4/24/07, Artem Vasiliev <[EMAIL PROTECTED]> wrote: Hello Ivan! It's so sad to me that you had bad results with that patch. :) The discussion in the ticket is out-of-date - the patch was initially in several classes, used WeakHashMap but then it evolved to what it

Re: Out of memory exception for big indexes

2007-04-24 Thread Artem Vasiliev
Hi Ivan! btw, maybe forbidding the sorted search in case of too many results is an option? I did it this way in my case. Regards, Artem. On 4/24/07, Artem Vasiliev <[EMAIL PROTECTED]> wrote: Ahhh, you said in your original post that your search matches _all_ the results.. Yup my patch wi
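The "forbid sorted search when there are too many results" idea mentioned above could look roughly like the following. This is only a sketch of the decision logic in plain Java: the class name, the method name, and the threshold value are hypothetical and not taken from the thread, and how the hit count is obtained depends on the Lucene version in use.

```java
// Hypothetical guard, not part of Lucene or of the LUCENE-769 patch.
final class SortGuard {
    // Assumed limit; the thread does not say which value was used.
    static final int MAX_SORTABLE_HITS = 100_000;

    /** Returns true only if a sort was requested and the result set is small enough. */
    static boolean allowSort(int totalHits, boolean sortRequested) {
        return sortRequested && totalHits <= MAX_SORTABLE_HITS;
    }

    public static void main(String[] args) {
        // A large result set falls back to the default (relevance) ordering.
        System.out.println(allowSort(5_000, true));
        System.out.println(allowSort(2_000_000, true));
    }
}
```

An application would typically run the query unsorted first, check the hit count, and only then decide whether to issue the sorted search.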

another lucene-based application

2006-03-17 Thread Artem Vasiliev
now search SMB file shares in LANs by their paths and names. It tracks changes in directories so it even knows about deleted files. The application is in alpha now but it's working, it has Web UI and RSS subscription for query results (added today :), so I'll be glad if it helps somebody h

Re[2]: another lucene-based application

2006-03-17 Thread Artem Vasiliev
now XD> search SMB file shares in LANs by their paths and names. It tracks XD> changes in directories so it even knows about deleted files. The XD> application is in alpha now but it's working, it has Web UI and RSS XD> subscription for query results (added today

Re[2]: OutOfMemory with search(Query, Sort)

2006-04-01 Thread Artem Vasiliev
ou sort on a field, a FieldCache entry is populated, YS> enabling random access to that field value. A single int field for a YS> 4M index == int[4000000] == 16MB memory. -- Best regards, Artem mailto:[EMAIL PROTECTED]
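The arithmetic quoted above (one int per document, 4 million documents, about 16 MB) can be checked with a small sketch. It only multiplies document count by the size of an int; the class and method names are made up for illustration, and real FieldCache overhead (array headers, and the String[] used for string sorts) is not modeled.

```java
// Back-of-the-envelope estimate only; names are hypothetical.
final class FieldCacheEstimate {
    /** Bytes for one int slot per document (4 bytes each), ignoring the array header. */
    static long intCacheBytes(long maxDoc) {
        return maxDoc * Integer.BYTES;
    }

    public static void main(String[] args) {
        long bytes = intCacheBytes(4_000_000L);
        System.out.println(bytes + " bytes, about " + (bytes / 1_000_000) + " MB");
    }
}
```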

Re[4]: OutOfMemory with search(Query, Sort)

2006-04-04 Thread Artem Vasiliev
, CH> filename" .. this should reduce the size quite a bit if the number of -- Best regards, Artem http://sharehound.sourceforge.net sharehound, the open source filesystems indexer

Re[4]: OutOfMemory with search(Query, Sort)

2006-04-04 Thread Artem Vasiliev
create your own YS> FieldCache that doesn't create/store that String[]. The int[] array here contains references to String[], and to populate it all the field values still need to be loaded and compared/sorted, which is what I want to avoid. I guess my option is not to us

Re[2]: 30 million+ docs on a single server

2006-08-21 Thread Artem Vasiliev
ly search resultsets it's almost as fast as the default implementation. Note that this solution is ready only for single-field sorting currently. Best regards, Artem OG> This is unlikely to work well/fast. It will depend on the OG> size of the index (not in terms of the number of docs

maxDoc/numDocs int fields

2014-03-21 Thread Artem Gayardo-Matrosov
way to restore the original docID. -- Thanks in advance, Artem.

Re: maxDoc/numDocs int fields

2014-03-21 Thread Artem Gayardo-Matrosov
siteReader... Is there a better way? Thanks, Artem. On Fri, Mar 21, 2014 at 6:33 PM, Oliver Christ wrote: > Can you split your corpus across multiple Lucene instances? > > Cheers, Oli > > -Original Message- > From: Artem Gayardo-Matrosov [mailto:ar...@gayardo.com] >
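The suggestion above to split the corpus across multiple Lucene instances implies some mapping between a corpus-wide identifier and a per-index docID. The following is a minimal sketch of that arithmetic only, assuming fixed-capacity shards filled in order; the class, the method names, and the capacity value are hypothetical, and nothing here is Lucene API.

```java
// Hypothetical shard arithmetic for a corpus larger than one int-addressed index.
final class ShardMath {
    /** Number of shards needed to hold totalDocs with at most perShard docs each. */
    static long shardsNeeded(long totalDocs, int perShard) {
        return (totalDocs + perShard - 1) / perShard; // ceiling division
    }

    /** Shard that holds the document with this corpus-wide id. */
    static long shardOf(long globalId, int perShard) {
        return globalId / perShard;
    }

    /** Position of the document inside its shard; always fits in an int. */
    static int localId(long globalId, int perShard) {
        return (int) (globalId % perShard);
    }

    public static void main(String[] args) {
        int perShard = 2_000_000_000; // assumed capacity, chosen below Integer.MAX_VALUE
        long globalId = 4_500_000_000L;
        System.out.println("shards: " + shardsNeeded(5_000_000_000L, perShard));
        System.out.println("doc " + globalId + " -> shard " + shardOf(globalId, perShard)
                + ", local " + localId(globalId, perShard));
    }
}
```

The original identifier can be restored as `shard * perShard + local`, provided the shard layout never changes after indexing.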

Re: maxDoc/numDocs int fields

2014-03-21 Thread Artem Gayardo-Matrosov
and only solution to this problem. Artem. On Fri, Mar 21, 2014 at 7:29 PM, Jack Krupansky wrote: > Every word occurrence or every unique word? I mean Integer.MAX_VALUE like > 2 billion. Even the OED only has 600,000 words defined. The former doesn't > sound like a good use case m