Indexing large documents

2007-08-20 Thread Fouad Mardini
Hello, I am using Solr to index text extracted from Word documents, and it is working really well. Recently I started noticing that some documents are not indexed; that is, I know that the word foobar is in a document, but when I search for foobar the id of that document is not returned. I suspect

RE: Indexing large documents

2007-08-20 Thread praveen jain
Hi, I want to know how to update my .xml file which has fields other than the default ones, so which file do I have to modify, and how? pRAVEEN jAIN +919890599250 -Original Message- From: Fouad Mardini [mailto:[EMAIL PROTECTED] Sent: Monday, August 20, 2007 4:00 PM To:

Re: Indexing large documents

2007-08-20 Thread Peter Manis
Fouad, I would check the error log or console for any possible errors first. They may not show up; it really depends on how you are processing the Word document (custom Solr, feeding the text to it, etc.). We are using a custom version of Solr with PDF, DOC, XLS, etc. text extraction and I have

Re: Indexing large documents

2007-08-20 Thread Fouad Mardini
Well, I am using the Java textmining library to extract text from documents, then I do a POST to Solr. I do not have an error log, I only have *.request.log files in the logs directory. Thanks On 8/20/07, Peter Manis [EMAIL PROTECTED] wrote: Fouad, I would check the error log or console for

Re: Indexing large documents

2007-08-20 Thread Peter Manis
That should show some errors if something goes wrong; if not, the console usually will. The errors will look like a Java stack trace. Did increasing the heap do anything for you? Changing mine to 256MB max worked fine for all of our files. On 8/20/07, Fouad Mardini [EMAIL PROTECTED]

Re: Indexing large documents

2007-08-20 Thread Pieter Berkel
You will probably need to increase the value of maxFieldLength in your solrconfig.xml. The default value is 10000, which might explain why your documents are not being completely indexed. Piete On 20/08/07, Peter Manis [EMAIL PROTECTED] wrote: That should show some errors if something
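For reference, Pieter's suggestion is a one-line change in solrconfig.xml; a minimal sketch (the large value here is just an illustrative "effectively unlimited" choice, not a recommended setting):

```xml
<!-- solrconfig.xml: maximum number of tokens indexed per field.
     Tokens beyond this limit are silently dropped, which makes long
     documents appear only partially indexed. -->
<maxFieldLength>2147483647</maxFieldLength>
```

Note that documents indexed before the change was made still carry the truncated field, so affected documents need to be re-indexed after restarting Solr.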

Re: how to retrieve all the documents in an index?

2007-08-20 Thread Erik Hatcher
Yes - they come back in the order indexed. Erik On Aug 19, 2007, at 7:20 PM, Yu-Hui Jin wrote: BTW, Hoss, is there a default order for the documents returned by running this query? thanks, -Hui On 8/16/07, Chris Hostetter [EMAIL PROTECTED] wrote: : Any of you know whether

RE: Structured Lucene documents

2007-08-20 Thread Pierre-Yves LANDRON
Hello! At last, I've had the opportunity to test your solution, Pieter, which was to use a dynamic field: <dynamicField name="page*" type="text" indexed="true" stored="true"/> Store each page in a separate field (e.g. page1, page2, page3 .. pageN), then at query time, use the highlighting
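The dynamic field declaration described above belongs in schema.xml; a minimal sketch, assuming the stock "text" field type from the example schema:

```xml
<!-- schema.xml: one pattern matches page1, page2, ... pageN
     without declaring each page field individually -->
<dynamicField name="page*" type="text" indexed="true" stored="true"/>
```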

problem with quering solr after indexing UTF-8 encoded CSV files

2007-08-20 Thread Ben Shlomo, Yatir
Hi! I have UTF-8 encoded data inside a CSV file (actually it's a tab-separated file - attached). I can index it with no apparent errors. I did not forget to set this in my Tomcat configuration: <Server ...> <Service ...> <Connector ... URIEncoding="UTF-8"/> When I query a document
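The Tomcat setting mentioned above lives on the HTTP Connector element in conf/server.xml; a sketch of the relevant line (the port value is illustrative):

```xml
<!-- server.xml: decode URI/query-string bytes as UTF-8 instead of
     the ISO-8859-1 default, so non-ASCII query terms reach Solr intact -->
<Connector port="8080" URIEncoding="UTF-8"/>
```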

Enquiry on Search Results counting

2007-08-20 Thread Jeffrey Tiong
Hi, I am trying to do some counting on certain fields of the search results. Currently I am using PHP to do the counting, but it is impossible to do this when the result sets reach a few hundred thousand. Does anyone here have any idea on how to do this? Example of scenario: 1. The solr

RE: Solr 1.1. vs. 1.2.

2007-08-20 Thread Lance Norskog
While we're on the topic, there appear to be a ton of new features in 1.3, and they are getting debugged. When do you plan to do an official 1.3 release? -Original Message- From: Yu-Hui Jin [mailto:[EMAIL PROTECTED] Sent: Friday, August 17, 2007 11:53 PM To: solr-user@lucene.apache.org

RE: How to read values of a field efficiently

2007-08-20 Thread Chris Hostetter
: TermEnum terms = searcher.getReader().terms(new Term(field, "")); : while (terms.term() != null && terms.term().field() == field) { : // do things : terms.next(); : } : while (te.next()) { : final Term term = te.term(); you're missing the key piece

Re: Custom Sorting

2007-08-20 Thread Chris Hostetter
: Sort sort = new Sort(new SortField[] : { SortField.FIELD_SCORE, new SortField(customValue, SortField.FLOAT, : true) }); : indexSearcher.search(q, sort) that appears to just be a sort on score with a secondary reversed float sort on whatever field name is in the variable customValue

RE: solr + carrot2

2007-08-20 Thread Lance Norskog
No, this is about the Carrot2 clustering tool, specifically the Swing application. To make this app use a Solr service you have to code a custom searcher for your Solr. I'm requesting a generic UI for Carrot2 that works against any Solr. -Original Message- From: Mike Klaas [mailto:[EMAIL

clear index

2007-08-20 Thread Sundling, Paul
What is the best approach to clearing an index? The use case is that I'm doing some performance testing with various index sizes. In between indexing runs (embedded and soon HTTP/XML) I need to clear the index so I have a fresh start. What's the best approach: close the index and delete the files?

Re: clear index

2007-08-20 Thread Pieter Berkel
If you are using Solr 1.2, the following command (followed by a commit / optimize) should do the trick: <delete><query>*:*</query></delete> cheers, Piete On 21/08/07, Sundling, Paul [EMAIL PROTECTED] wrote: what is the best approach to clearing an index? The use case is that I'm doing some
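Spelled out, the full clear-the-index sequence is three XML messages posted to the update handler (the localhost:8983 URL assumes the stock example setup):

```xml
<!-- POST each of these bodies to http://localhost:8983/solr/update -->
<delete><query>*:*</query></delete>  <!-- mark every document deleted -->
<commit/>                            <!-- make the deletes visible -->
<optimize/>                          <!-- merge segments, reclaim disk -->
```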

Commit performance

2007-08-20 Thread Lance Norskog
How long should a commit take? I've got about 9.8G of data for 9M records. (Yes, I'm indexing too much data.) My commits are taking 20-30 seconds. Since other people set the autocommit to 1 second, I'm guessing we have a major mistake somewhere in our configuration. We have a lot of

Re: clear index

2007-08-20 Thread Charles Hornberger
IIRC you can also simply stop the servlet container, delete the contents of the data directory by hand, then restart the container. -Charlie On 8/20/07, Pieter Berkel [EMAIL PROTECTED] wrote: If you are using solr 1.2 the following command (followed by a commit / optimize) should do the