Re: indexing pdf documents

2008-05-13 Thread Cam Bazz
yes, I have seen the documentation on RichDocumentRequestHandler at the http://wiki.apache.org/solr/UpdateRichDocuments page. However, from what I understand this just feeds documents to Solr. How can I construct something like document_id, document_name, document_text and feed it in? (i.e. my
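If the metadata and extracted text are already in hand, one way to feed them is an ordinary XML update message rather than a rich-document handler; a minimal sketch, assuming the schema defines these three fields (the field names here are just the ones proposed above):

  <add>
    <doc>
      <field name="document_id">42</field>
      <field name="document_name">report.pdf</field>
      <field name="document_text">extracted text of the PDF goes here</field>
    </doc>
  </add>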

how to clean an index ?

2008-05-13 Thread Pierre-Yves LANDRON
Hello, I want to clean an index (i.e. delete all documents), but I cannot delete the index directory. Is it possible with the REST interface? Thanks, Pierre-Yves Landron

phrase query with DismaxHandler

2008-05-13 Thread KhushbooLohia
Hi All, I am using EnglishPorterFilterFactory in the text field for stemming words. I am also using DisMaxRequestHandler for handling requests. When a phrase query such as "windows installation" is passed to Solr, sometimes the results obtained are correct, but sometimes the results occur with only

Duplicates results when using a non optimized index

2008-05-13 Thread Tim Mahy
Hi all, is this expected behavior when having an index like this: numDocs: 9479963, maxDoc: 12622942, readerImpl: MultiReader, which is in the process of optimizing, that when we search through the index we get this: <doc><long name="id">15257559</long></doc> <doc><long name="id">15257559</long></doc> <doc

RE: how to clean an index ?

2008-05-13 Thread Tim Mahy
Hi, you can create a delete query matching all your documents, like the query *:* greetings, Tim From: Pierre-Yves LANDRON [EMAIL PROTECTED] Sent: Tuesday, 13 May 2008 11:53 To: solr-user@lucene.apache.org Subject: how to clean an index ? Hello, I
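Concretely, deleting everything amounts to posting a delete-by-query followed by a commit to the update handler; a minimal sketch (the handler URL depends on your deployment, typically something like /solr/update):

  <delete><query>*:*</query></delete>
  <commit/>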

Re: help for preprocessing the query

2008-05-13 Thread Umar Shah
On Mon, May 12, 2008 at 10:30 PM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: You'll *not* write a servlet. You'll implement the Filter interface http://java.sun.com/j2ee/sdk_1.3/techdocs/api/javax/servlet/Filter.html In the doFilter method, you'll create a ServletRequestWrapper
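A minimal sketch of such a filter, assuming the javax.servlet API; the class name and the rewriting logic are hypothetical, not anything Solr ships with:

  import java.io.IOException;
  import javax.servlet.*;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletRequestWrapper;

  // Wraps incoming requests so the "q" parameter is preprocessed before Solr sees it.
  public class QueryRewriteFilter implements Filter {
    public void init(FilterConfig config) throws ServletException {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
        throws IOException, ServletException {
      HttpServletRequestWrapper wrapped =
          new HttpServletRequestWrapper((HttpServletRequest) req) {
        @Override
        public String getParameter(String name) {
          String value = super.getParameter(name);
          if ("q".equals(name) && value != null) {
            return preprocess(value); // custom query rewriting goes here
          }
          return value;
        }
      };
      chain.doFilter(wrapped, res);
    }

    private String preprocess(String q) {
      return q.trim(); // placeholder for the real preprocessing
    }
  }

In practice Solr may read parameters through getParameterValues or getParameterMap as well, so a real wrapper would override those consistently too.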

Differences between nightly builds

2008-05-13 Thread Lucas F. A. Teixeira
Hello, Here we use a nightly build from Aug '07. It's what we need, with some bugs that we've worked on. I want to change this to a newer nightly build, but as this one is 'stable', people are afraid of changing to an 'unknown' build. Is there some place where I can find all changes between

Re: help for preprocessing the query

2008-05-13 Thread Shalin Shekhar Mangar
Did you put a filter-mapping in web.xml? On Tue, May 13, 2008 at 4:20 PM, Umar Shah [EMAIL PROTECTED] wrote: On Mon, May 12, 2008 at 10:30 PM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: You'll *not* write a servlet. You'll implement the Filter interface

Re: help for preprocessing the query

2008-05-13 Thread Umar Shah
On Tue, May 13, 2008 at 4:39 PM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: Did you put a filter-mapping in web.xml? No, I just did that and it seems to be working... what is the filter-mapping required for? On Tue, May 13, 2008 at 4:20 PM, Umar Shah [EMAIL PROTECTED] wrote: On Mon,
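For reference, the filter-mapping is what tells the servlet container which URLs the filter should intercept; without it the filter class is never invoked. A sketch, with hypothetical names and a URL pattern that depends on how Solr is deployed:

  <filter>
    <filter-name>queryRewriteFilter</filter-name>
    <filter-class>com.example.QueryRewriteFilter</filter-class>
  </filter>
  <filter-mapping>
    <filter-name>queryRewriteFilter</filter-name>
    <url-pattern>/select/*</url-pattern>
  </filter-mapping>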

Warning: latest Tomcat 6 release is broken (was Re: Weird problems with document size)

2008-05-13 Thread Andrew Savory
Hi, Here's a warning for anyone trying to use Solr in the latest release of Tomcat, 6.0.16. Previously I was having problems successfully posting updates to a Solr instance running in Tomcat: 2008/5/9 Andrew Savory [EMAIL PROTECTED]: Meanwhile it seems that these documents can successfully

RE: how to clean an index ?

2008-05-13 Thread Pierre-Yves LANDRON
Thanks! I should have known! Anyway, it works fine. From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Date: Tue, 13 May 2008 11:58:16 +0200 Subject: RE: how to clean an index ? Hi, you can create a delete query matching all your documents, like the query *:* greetings,

Commit problems on Solr 1.2 with Tomcat

2008-05-13 Thread William Pierce
Hi, I am having problems with Solr 1.2 running Tomcat version 6.0.16 (I also tried 6.0.14, but the same problems exist). Here is the situation: I have an ASP.net application where I am trying to add and commit a single document to an index. After I add the document and issue the <commit/> I can

Re: JMX monitoring

2008-05-13 Thread Marshall Weir
Thank you, Shalin! It works great. Marshall On May 13, 2008, at 1:57 AM, Shalin Shekhar Mangar wrote: Hi Marshall, I've uploaded a new patch which works off the current trunk. Let me know if you run into any problems with this. On Tue, May 13, 2008 at 2:36 AM, Marshall Weir [EMAIL

Re: indexing pdf documents

2008-05-13 Thread Bess Sadler
C.B., are you saying you have metadata about your PDF files (i.e., title, author, etc) separate from the PDF file itself, or are you saying you want to extract that information from the PDF file? The first of these is pretty easy, the second of these can be difficult or impossible,

Re: Commit problems on Solr 1.2 with Tomcat

2008-05-13 Thread Alexander Ramos Jardim
Maybe a delay in commit? How much time elapses between commits? 2008/5/13 William Pierce [EMAIL PROTECTED]: Hi, I am having problems with Solr 1.2 running tomcat version 6.0.16 (I also tried 6.0.14 but same problems exist). Here is the situation: I have an ASP.net application where I am

Re: Commit problems on Solr 1.2 with Tomcat

2008-05-13 Thread Yonik Seeley
By default, a commit won't return until a new searcher has been opened and the results are visible. So just make sure you wait for the commit command to return before querying. Also, if you are committing every add, you can avoid a separate commit command by putting ?commit=true in the URL of the
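As a sketch (host, port and core path are placeholders), the add and the commit can then travel in a single request:

  POST http://localhost:8983/solr/update?commit=true
  Content-Type: text/xml

  <add>
    <doc>
      <field name="id">12345</field>
      <field name="name">example document</field>
    </doc>
  </add>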

Re: ERROR:unknown field, but what document was it?

2008-05-13 Thread Alexander Ramos Jardim
Well, Keep-Alive is part of the HTTP/1.1 standard; it is not a Java standard. 2008/5/8 Chris Hostetter [EMAIL PROTECTED]: : My tests showed that it was a big difference. It took about 1.2 seconds to : index 500 separate adds in separate xml files (with a single commit : afterwards), compared to

Re: help for preprocessing the query

2008-05-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
http://java.sun.com/products/servlet/Filters.html This is a servlet container feature. BTW, this may not be the right forum for this topic. --Noble On Tue, May 13, 2008 at 5:04 PM, Umar Shah [EMAIL PROTECTED] wrote: On Tue, May 13, 2008 at 4:39 PM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote:

Re: ERROR:unknown field, but what document was it?

2008-05-13 Thread Yonik Seeley
On Thu, May 8, 2008 at 4:59 PM, [EMAIL PROTECTED] wrote: My tests showed that it was a big difference. It took about 1.2 seconds to index 500 separate adds in separate xml files (with a single commit afterwards), compared to about 200 milliseconds when sending a single xml with 500 adds.
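The single-file form is simply one <add> element wrapping many <doc> elements; the field names below are illustrative:

  <add>
    <doc>
      <field name="id">1</field>
      <field name="text">first document</field>
    </doc>
    <doc>
      <field name="id">2</field>
      <field name="text">second document</field>
    </doc>
    <!-- ...and so on, up to the full batch... -->
  </add>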

Re: Commit problems on Solr 1.2 with Tomcat

2008-05-13 Thread William Pierce
Thanks for the comments. The reason I am just adding one document followed by a commit is for this particular test --- in actuality, I will be loading documents from a DB. But thanks for the pointer on the ?commit=true on the add command. Now on the <commit/> problem itself, I am still

Re: Commit problems on Solr 1.2 with Tomcat

2008-05-13 Thread Erik Hatcher
I'm not sure if you are issuing a separate <commit/> _request_ after your add, or putting a <commit/> into the same request. Solr only supports one command (add or commit, but not both) per request. Erik On May 13, 2008, at 10:36 AM, William Pierce wrote: Thanks for the comments

Re: Commit problems on Solr 1.2 with Tomcat

2008-05-13 Thread William Pierce
Erik: I am indeed issuing multiple Solr requests. Here is my code snippet (deletexml and addxml are the strings that contain the add and delete strings for the items to be added or deleted). For our simple example, nothing is being deleted so stufftodelete is always false.

Re: Commit problems on Solr 1.2 with Tomcat

2008-05-13 Thread Yonik Seeley
Is SendSolrIndexingRequest synchronous or asynchronous? If the call to SendSolrIndexingRequest() can return before the response from the add is received, then the commit could sneak in and finish *before* the add is done (in which case, you won't see it before the next commit). -Yonik On Tue,
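A minimal illustration of the synchronous pattern being described, sketched in Java with java.net.HttpURLConnection (the URL and payloads are placeholders; SendSolrIndexingRequest is the poster's own method, not a Solr API):

  import java.io.OutputStream;
  import java.net.HttpURLConnection;
  import java.net.URL;

  public class SyncUpdate {
    // Posts an XML update message and blocks until Solr's response arrives.
    static int post(String urlStr, String xml) throws Exception {
      HttpURLConnection conn = (HttpURLConnection) new URL(urlStr).openConnection();
      conn.setRequestMethod("POST");
      conn.setDoOutput(true);
      conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
      OutputStream out = conn.getOutputStream();
      out.write(xml.getBytes("UTF-8"));
      out.close();
      return conn.getResponseCode(); // reading the response is what makes this synchronous
    }

    public static void main(String[] args) throws Exception {
      String update = "http://localhost:8983/solr/update";
      // The add must have completed before the commit is sent; if the commit
      // overtakes the add, the new document stays invisible until the next commit.
      post(update, "<add><doc><field name=\"id\">1</field></doc></add>");
      post(update, "<commit/>");
    }
  }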

Re: How Special Character '' used in indexing

2008-05-13 Thread Walter Underwood
ASAP means As Soon As Possible, not As Soon As Convenient. Please don't say that if you don't mean it. --wunder On 5/12/08 6:48 AM, Ricky [EMAIL PROTECTED] wrote: Hi Mike, Thanks for your reply. I have got the answer to the question posted. I know people are donating time here. ASAP doesn't

Re: Extending XmlRequestHandler

2008-05-13 Thread Walter Underwood
There is one huge advantage of talking to Solr with SolrJ (or any other client that uses the REST API), and that is that you can put an HTTP cache between that and Solr. We get a 75% hit rate on that cache. SOAP is not cacheable in any useful sense. I designed and implemented the SOAP interface

Re: single character terms in index - why?

2008-05-13 Thread Walter Underwood
We have some useful single character terms in the rating field, like G and R, alongside PG and others. wunder On 5/12/08 1:33 PM, Yonik Seeley [EMAIL PROTECTED] wrote: On Mon, May 12, 2008 at 4:13 PM, Naomi Dushay [EMAIL PROTECTED] wrote: So I'm now asking: why would SOLR want single

Re: JMX monitoring

2008-05-13 Thread Chris Hostetter
: Thank you, Shalin! : : It works great. Please post feedback like that in the Jira issue (and ideally vote for the issue as well). Comments on issues from people saying that they tried out patches and found them useful help committers assess the utility of features and the effectiveness of

Re: Field Grouping

2008-05-13 Thread oleg_gnatovskiy
There is an XSLT example here: http://wiki.apache.org/solr/XsltResponseWriter , but it doesn't seem like that would work either... This example would only do a group by for the current page. If I use Solr for pagination, this would not work for me. oleg_gnatovskiy wrote: But I don't want the

Re: Unlimited number of return documents?

2008-05-13 Thread Marc Bechler
Hi Walter, thanks for your advice and, indeed, that is correct, too (and I will likely implement the cleaning mechanism this way). (Btw: what would the query look like to get rows 101-200 in the second chunk?) However, using chunks is not atomic, so you may not get consistent results.
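For the record, chunking through results is done with the start and rows parameters; rows 101-200 would be a request along these lines (the query and handler path are placeholders):

  http://localhost:8983/solr/select?q=*:*&start=100&rows=100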

Re: Unlimited number of return documents?

2008-05-13 Thread Alexander Ramos Jardim
I think that keeping a transaction log is the best approach for your use case. 2008/5/13 Marc Bechler [EMAIL PROTECTED]: Hi Walter, thanks for your advice and, indeed, that is correct, too (and I will likely implement the cleaning mechanism this way). (Btw: what would the query look like to get

Re: Field Grouping

2008-05-13 Thread Ryan McKinley
You may want to check field collapsing https://issues.apache.org/jira/browse/SOLR-236 There is a patch that works against 1.2, but the one for trunk needs some work before it can work... ryan On May 13, 2008, at 2:46 PM, oleg_gnatovskiy wrote: There is an XSLT example here:

Re: Differences between nightly builds

2008-05-13 Thread Otis Gospodnetic
Lucas, Look at the Solr svn repository's root and you will see a file named CHANGES.txt. That contains all major Solr changes back to January 2006. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Lucas F. A. Teixeira [EMAIL

Re: phrase query with DismaxHandler

2008-05-13 Thread Otis Gospodnetic
Hi, I don't think what you said makes 100% sense. Both words, windows and installation, will be different when stemmed. Also, the word combination will not get stemmed to combine (that's not what the Porter stemmer would chop it down to). Go to the Solr admin page, enter windows installation, then

Re: the time factor

2008-05-13 Thread Otis Gospodnetic
Jack, The answer is: function queries! :) You can easily use function queries with DisMaxRequestHandler. For example, this is what you can add to the dismax config section in solrconfig.xml: <str name="bf">recip(rord(addDate),1,1000,1000)^2.5</str> Assuming you have an addDate
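In context, that line sits inside the dismax handler's defaults in solrconfig.xml; a sketch, with the other defaults left as whatever your configuration already has:

  <requestHandler name="dismax" class="solr.DisMaxRequestHandler">
    <lst name="defaults">
      <!-- existing qf, pf, etc. stay as they are -->
      <str name="bf">recip(rord(addDate),1,1000,1000)^2.5</str>
    </lst>
  </requestHandler>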

Re: Duplicates results when using a non optimized index

2008-05-13 Thread Otis Gospodnetic
Hm, not sure why that is happening, but here is some info regarding other stuff from your email - there should be no duplicates even if you are searching an index that is being optimized - why are you searching an index that is being optimized? It's doable, but people typically perform