Re: indexing pdf documents
yes, I have seen the documentation on RichDocumentRequestHandler at the http://wiki.apache.org/solr/UpdateRichDocuments page. However, from what I understand this just feeds documents to solr. How can I construct something like: document_id, document_name, document_text and feed it in. (i.e. my documents have labels) Best. -C.B. On Tue, May 13, 2008 at 1:30 AM, Chris Harris [EMAIL PROTECTED] wrote: Solr does not have this support built in, but there's a patch for it: https://issues.apache.org/jira/browse/SOLR-284 On Mon, May 12, 2008 at 2:02 PM, Cam Bazz [EMAIL PROTECTED] wrote: Hello, Before making a little program to extract the txt from my pdfs and feed it into solr with xml, I just wanted to check if solr has capability to digest pdf files apart from xml? Best Regards, -C.B.
how to clean an index ?
Hello, I want to clean an index (i.e. delete all documents), but I cannot delete the index directory. Is it possible with the REST interface? Thanks, Pierre-Yves Landron
phrase query with DismaxHandler
Hi All, I am using EnglishPorterFilterFactory on a text field for stemming the words. I am also using DisMaxRequestHandler for handling requests. When a phrase query is passed to Solr, e.g. "windows installation", sometimes the results obtained are correct, but sometimes the results match only the word install, or just windows, or just installation. I've observed that if the phrase doesn't contain anything to be stemmed, like windows or company, the results are returned as expected. But phrases with words like combination or conclusion get stemmed to combine or conclude and bring weird results. Please advise. Thanks Khushboo -- View this message in context: http://www.nabble.com/phrase-query-with-DismaxHandler-tp17204921p17204921.html Sent from the Solr - User mailing list archive at Nabble.com.
Duplicates results when using a non optimized index
Hi all, is this expected behavior when having an index like this: numDocs: 9479963, maxDoc: 12622942, readerImpl: MultiReader, which is in the process of optimizing, that when we search through the index we get this: <doc><long name="id">15257559</long></doc> <doc><long name="id">15257559</long></doc> <doc><long name="id">17177888</long></doc> <doc><long name="id">11825631</long></doc> <doc><long name="id">11825631</long></doc> The id field is declared like this: <field name="id" type="long" indexed="true" stored="true" required="true"/> and is set as the unique key like this in the schema.xml: <uniqueKey>id</uniqueKey> So the question: is this expected behavior, and if so, is there a way to let Solr only return unique documents? greetings and thanks in advance, Tim Please see our disclaimer, http://www.infosupport.be/Pages/Disclaimer.aspx
RE: how to clean an index ?
Hi, you can issue a delete query matching all your documents, using the query *:* greetings, Tim From: Pierre-Yves LANDRON [EMAIL PROTECTED] Sent: Tuesday, May 13, 2008 11:53 To: solr-user@lucene.apache.org Subject: how to clean an index ? Hello, I want to clean an index (i.e. delete all documents), but I cannot delete the index directory. Is it possible with the REST interface? Thanks, Pierre-Yves Landron
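A minimal sketch of the delete-all approach Tim describes, using Solr's XML update interface. The endpoint URL and the helper names are assumptions (a default local Solr install), not anything from the thread:

```python
# Sketch: delete all documents via Solr's XML update interface,
# then commit so the deletion becomes visible to searchers.
# The endpoint below is an assumed default; adjust for your deployment.
import urllib.request

SOLR_UPDATE_URL = "http://localhost:8983/solr/update"  # assumed endpoint

def build_delete_all():
    """Build the XML delete command matching every document."""
    return '<delete><query>*:*</query></delete>'

def post_update(xml_body, url=SOLR_UPDATE_URL):
    """POST an update command to Solr (network call, not exercised here)."""
    req = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    # Against a running Solr you would do:
    # post_update(build_delete_all())
    # post_update("<commit/>")
    print(build_delete_all())
```

Note the `<commit/>` afterwards: without it the deletes are not visible to searchers.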
Re: help for preprocessing the query
On Mon, May 12, 2008 at 10:30 PM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: You'll *not* write a servlet. You'll implement the Filter interface http://java.sun.com/j2ee/sdk_1.3/techdocs/api/javax/servlet/Filter.html In the doFilter method, you'll create a ServletRequestWrapper which changes the incoming param. Then you'll call chain.doFilter with the new request object. You'll need to add this filter before the SolrRequestFilter in Solr's web.xml I created a CustomFilter that would dump the request contents to a file, created the jar and added it to the solr.war in the WEB-INF/lib folder, and edited the web.xml in the same folder to include the following lines: <filter> <filter-name>CustomFilter</filter-name> <filter-class>(packagename).CustomFilter</filter-class> </filter> where CustomFilter is the name of the class implementing javax.servlet.Filter. I don't see anything in the contents of the file.. thanks for your help -umar Look at http://www.onjava.com/pub/a/onjava/2001/05/10/servlet_filters.html?page=1 for more details. On Mon, May 12, 2008 at 8:51 PM, Umar Shah [EMAIL PROTECTED] wrote: On Mon, May 12, 2008 at 8:42 PM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: ServletRequest and ServletRequestWrapper are part of the Java servlet-api (not Solr). Basically, Koji is hinting at writing a ServletFilter implementation (again using servlet-api) and creating a wrapper ServletRequest which modifies the underlying request params, which can then be used by Solr. sorry for the silly question, basically I am new to servlets. Now if my understanding is right, I will need to create a servlet/wrapper that would listen to the user-facing queries and then pass the processed text to the Solr request handler, and I need to pack this servlet class file into the Solr war file. But how would I ensure that my servlet is called instead of the Solr request handler?
On Mon, May 12, 2008 at 8:36 PM, Umar Shah [EMAIL PROTECTED] wrote: On Mon, May 12, 2008 at 2:50 PM, Koji Sekiguchi [EMAIL PROTECTED] wrote: Hi Umar, You may be able to preprocess your request parameter in your servlet filter. In the doFilter() method, you do: ServletRequest myRequest = new MyServletRequestWrapper( request ); Thanks for your response. Where is the ServletRequest class? I am using Solr 1.3 trunk code; I found SolrServlet, but it is deprecated. Which class can I use instead of SolrRequest in the 1.3 codebase? I also tried overloading the standard request handler; how do I rewrite the query params there? Can you point me to some documentation? : chain.doFilter( myRequest, response ); And you have MyServletRequestWrapper that extends ServletRequestWrapper. Then you can get/set q* parameters through the getParameter() method. Hope this helps, Koji Umar Shah wrote: Hi, Due to some requirement I need to transform the user queries before passing them to the standard handler in Solr; can anyone suggest the best way to do this? I will need to use a transformation class that provides functions to process the input query 'qIn' and transform it to the resultant query 'qOut', and then pass it to the Solr handler as if qOut were the original user query. thanks in anticipation, -umar -- Regards, Shalin Shekhar Mangar. -- Regards, Shalin Shekhar Mangar.
Differences between nightly builds
Hello, Here we use a nightly build from Aug '07. It's what we need, with some bugs that we've worked around. I want to change this to a newer nightly build, but as this one is 'stable', people are afraid of changing to an 'unknown' build. Is there some place where I can find all the changes between some date (my Aug '07 build) and nowadays? Maybe with this I can change their minds! Thank you. []s, -- Lucas Frare A. Teixeira [EMAIL PROTECTED] Tel: +55 11 3660.1622 - R3018
Re: help for preprocessing the query
Did you put a filter-mapping in web.xml? On Tue, May 13, 2008 at 4:20 PM, Umar Shah [EMAIL PROTECTED] wrote: I created a CustomFilter that would dump the request contents to a file, created the jar and added it to the solr.war in the WEB-INF/lib folder, and edited the web.xml in the same folder to include the following lines: <filter> <filter-name>CustomFilter</filter-name> <filter-class>(packagename).CustomFilter</filter-class> </filter> where CustomFilter is the name of the class implementing javax.servlet.Filter. I don't see anything in the contents of the file.. thanks for your help -umar [...] -- Regards, Shalin Shekhar Mangar.
Re: help for preprocessing the query
On Tue, May 13, 2008 at 4:39 PM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: Did you put a filter-mapping in web.xml? no, I just did that and it seems to be working... what is the filter-mapping required for? [...]
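For reference, the filter-mapping Shalin asked about looks like this in web.xml. The filter-name and filter-class come from Umar's own example; the url-pattern is an assumption — it must cover the requests you want rewritten, and the mapping should appear before Solr's own filter mapping so your filter runs first:

```xml
<!-- Declare the custom filter (names from Umar's example) -->
<filter>
  <filter-name>CustomFilter</filter-name>
  <filter-class>(packagename).CustomFilter</filter-class>
</filter>
<!-- Without a filter-mapping the container never invokes the filter.
     /* (an assumed pattern) maps it to every request. -->
<filter-mapping>
  <filter-name>CustomFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>
```

The `<filter>` element only declares the class; it is the `<filter-mapping>` that tells the servlet container which requests to route through it, which is why the filter appeared to do nothing until the mapping was added.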
Warning: latest Tomcat 6 release is broken (was Re: Weird problems with document size)
Hi, Here's a warning for anyone trying to use solr in the latest release of tomcat, 6.0.16. Previously I was having problems successfully posting updates to a solr instance running in tomcat: 2008/5/9 Andrew Savory [EMAIL PROTECTED]: Meanwhile it seems that these documents can successfully be added to solr when it is running in jetty, so I'm now trying to find out what Tomcat is doing to break things. A colleague (thanks, Alexis!) has just unearthed a regression bug in tomcat dating back to February that causes posts of more than 8k to be truncated: https://issues.apache.org/bugzilla/show_bug.cgi?id=44494 So if you're using Tomcat, aim for 6.0.14 instead. Andrew. -- [EMAIL PROTECTED] / [EMAIL PROTECTED] http://www.andrewsavory.com/
RE: how to clean an index ?
Thanks! I should have known! Anyway, it works fine. From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Date: Tue, 13 May 2008 11:58:16 +0200 Subject: RE: how to clean an index ? Hi, you can issue a delete query matching all your documents, using the query *:* greetings, Tim [...]
Commit problems on Solr 1.2 with Tomcat
Hi, I am having problems with Solr 1.2 running on Tomcat version 6.0.16 (I also tried 6.0.14 but the same problems exist). Here is the situation: I have an ASP.net application where I am trying to add and commit a single document to an index. After I add the document and issue the <commit/>, I can see (in the Solr stats page) that the commit count has been incremented, but docsPending is 1 and my document is still not visible from a search perspective. When I issue another <commit/>, the commit counter increments, docsPending is now zero, and my document is visible and searchable. I saw that someone was observing problems with Tomcat 6.0.16, so I reverted back to 6.0.14. Same problem. Can anyone help? -- Bill
Re: JMX monitoring
Thank you, Shalin! It works great. Marshall On May 13, 2008, at 1:57 AM, Shalin Shekhar Mangar wrote: Hi Marshall, I've uploaded a new patch which works off the current trunk. Let me know if you run into any problems with this. On Tue, May 13, 2008 at 2:36 AM, Marshall Weir [EMAIL PROTECTED] wrote: Hi, I'm new to Solr and I've been attempting to get JMX monitoring working. I can get simple information by using the -Dcom.sun.management.jmxremote command line switch, but I'd like to get more useful statistics. I've been working on applying the SOLR-256 and jmx patch, but the original revisions are pretty old and I'm having to spend a lot of time wandering through the source. Is there a better solution to getting this working, or a newer version of the patch? Thank you, Marshall -- Regards, Shalin Shekhar Mangar.
Re: indexing pdf documents
C.B., are you saying you have metadata about your PDF files (i.e., title, author, etc) separate from the PDF file itself, or are you saying you want to extract that information from the PDF file? The first of these is pretty easy, the second of these can be difficult or impossible, depending on how your PDF file was generated and how consistent your files are. It's a bit of a hack, but I've had great success in the past with using XTF (http://www.cdlib.org/inside/projects/xtf/) to index my PDF files, and then pointing Solr at the resulting Lucene index. It's worth checking to see if this would do the trick for you. Bess Elizabeth (Bess) Sadler Research and Development Librarian Digital Scholarship Services Box 400129 Alderman Library University of Virginia Charlottesville, VA 22904 On May 13, 2008, at 3:58 AM, Cam Bazz wrote: [...]
Re: Commit problems on Solr 1.2 with Tomcat
Maybe a delay in commit? How much time elapsed between the commits? 2008/5/13 William Pierce [EMAIL PROTECTED]: Hi, I am having problems with Solr 1.2 running on Tomcat version 6.0.16 [...] -- Alexander Ramos Jardim
Re: Commit problems on Solr 1.2 with Tomcat
By default, a commit won't return until a new searcher has been opened and the results are visible. So just make sure you wait for the commit command to return before querying. Also, if you are committing after every add, you can avoid a separate commit command by putting ?commit=true in the URL of the add command. -Yonik On Tue, May 13, 2008 at 9:31 AM, Alexander Ramos Jardim [EMAIL PROTECTED] wrote: Maybe a delay in commit? How much time elapsed between the commits? [...]
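A sketch of Yonik's suggestion of folding the commit into the add request via ?commit=true. The endpoint and the field names are assumptions for illustration, not from the thread:

```python
# Sketch: fold the commit into the add request (per the suggestion above)
# by appending ?commit=true to the update URL, so one HTTP request both
# adds the document and commits. Endpoint and fields are assumed.
from urllib.parse import urlencode

SOLR_UPDATE_URL = "http://localhost:8983/solr/update"  # assumed endpoint

def add_url_with_commit(base=SOLR_UPDATE_URL):
    """URL for an add that also commits, avoiding a second request."""
    return base + "?" + urlencode({"commit": "true"})

def build_add(doc_id, name, text):
    """Build a single-document <add> body (hypothetical field names)."""
    return ('<add><doc>'
            f'<field name="id">{doc_id}</field>'
            f'<field name="name">{name}</field>'
            f'<field name="text">{text}</field>'
            '</doc></add>')

# POST build_add(...) to add_url_with_commit()
# with Content-Type: text/xml; charset=utf-8.
```

Keeping the add and the commit in the same request also sidesteps the one-command-per-request issue raised later in the thread.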
Re: ERROR:unknown field, but what document was it?
Well, Keep-Alive is part of the HTTP/1.1 standard; it is not a Java standard. 2008/5/8 Chris Hostetter [EMAIL PROTECTED]: : My tests showed that it was a big difference. It took about 1.2 seconds to : index 500 separate adds in separate xml files (with a single commit : afterwards), compared to about 200 milliseconds when sending a single xml with : 500 adds. And according to the documentation java automatically uses : keep-alive (I found no way to force it myself). I'm not sure what you mean by java automatically uses keep-alive ... you mean you wrote your client code using java? but how do you initiate your connections to Solr? Nothing I know of in the way Solr handles updates should make adding multiple docs in one request faster than adding one doc per request -- any added overhead should be in the servlet container (and keep-alive should minimize that) ... if you have a simple reproducible test that demonstrates otherwise, I would consider that a performance bug. : i thought we added something like this ... but i guess not. : : feel free to file a feature request in Jira. : : ah, but I guess it is only available in a nightly build? Do you know a jira : issue number I can look at? I didn't find anything related to this. no, i mean: i thought we added it, but when i tried on the trunk i see the same thing you see ... please file a feature request. -Hoss -- Alexander Ramos Jardim
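The batching being compared above — one request carrying all 500 docs instead of one request per doc — can be sketched like this. The field names and the escaping helper are illustrative, not from the thread:

```python
# Sketch: batch many documents into a single <add> request body,
# which is the "single xml with 500 adds" variant from the timing test.
from xml.sax.saxutils import escape

def build_batch_add(docs):
    """docs: list of dicts mapping field name -> value (illustrative)."""
    parts = ['<add>']
    for doc in docs:
        parts.append('<doc>')
        for field, value in doc.items():
            parts.append(f'<field name="{field}">{escape(str(value))}</field>')
        parts.append('</doc>')
    parts.append('</add>')
    return ''.join(parts)

# One HTTP POST of build_batch_add(docs), followed by one <commit/>,
# replaces 500 separate add requests.
```

Whether the speedup comes from Solr itself or from per-request servlet-container overhead is exactly what Hoss questions above; the batching sketch is the same either way.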
Re: help for preprocessing the query
http://java.sun.com/products/servlet/Filters.html this is a servlet container feature. BTW, this may not be the right forum for this topic. --Noble On Tue, May 13, 2008 at 5:04 PM, Umar Shah [EMAIL PROTECTED] wrote: no, I just did that and it seems to be working... what is the filter-mapping required for? [...] -- --Noble Paul
Re: ERROR:unknown field, but what document was it?
On Thu, May 8, 2008 at 4:59 PM, [EMAIL PROTECTED] wrote: My tests showed that it was a big difference. It took about 1.2 seconds to index 500 separate adds in separate xml files (with a single commit afterwards), compared to about 200 milliseconds when sending a single xml with 500 adds. Did you overlap the adds (use multiple threads)? -Yonik
Re: Commit problems on Solr 1.2 with Tomcat
Thanks for the comments. The reason I am just adding one document followed by a commit is for this particular test --- in actuality, I will be loading documents from a db. But thanks for the pointer on the ?commit=true on the add command. Now on the <commit/> problem itself, I am still confused: doesn't the commit count of 1 indicate that the commit is completed? In any event, just for testing purposes, I started everything from scratch (deleted all documents, stopped/restarted Tomcat). I noticed that the only files in my index folder were segments.gen and segments_1. Then I did the add followed by <commit/> and noticed that there were now three files: segments.gen, segments_1 and write.lock. Now it is 7 minutes later, and when I query the index using the http://localhost:59575/splus1/admin/ URL, I still do not see the document. Again, when I issue another <commit/> command, everything seems to work. Why are TWO commit commands apparently required? Thanks, Sridhar -- From: Yonik Seeley [EMAIL PROTECTED] Sent: Tuesday, May 13, 2008 6:42 AM To: solr-user@lucene.apache.org Subject: Re: Commit problems on Solr 1.2 with Tomcat By default, a commit won't return until a new searcher has been opened and the results are visible. So just make sure you wait for the commit command to return before querying. Also, if you are committing after every add, you can avoid a separate commit command by putting ?commit=true in the URL of the add command. -Yonik [...]
Re: Commit problems on Solr 1.2 with Tomcat
I'm not sure if you are issuing a separate <commit/> _request_ after your add, or putting a <commit/> into the same request. Solr only supports one command (add or commit, but not both) per request. Erik On May 13, 2008, at 10:36 AM, William Pierce wrote: Thanks for the comments. The reason I am just adding one document followed by a commit is for this particular test --- in actuality, I will be loading documents from a db. But thanks for the pointer on the ?commit=true on the add command. Now on the <commit/> problem itself, I am still confused: doesn't the commit count of 1 indicate that the commit is completed? [...]
Re: Commit problems on Solr 1.2 with Tomcat
Erik: I am indeed issuing multiple Solr requests. Here is my code snippet (deletexml and addxml are the strings that contain the add and delete commands for the items to be added or deleted). For our simple example, nothing is being deleted, so stufftodelete is always false. //we are done...we now need to post the requests... if (stufftodelete) { SendSolrIndexingRequest(deletexml); } if (stufftoadd) { SendSolrIndexingRequest(addxml); } if (stufftodelete || stufftoadd) { SendSolrIndexingRequest("<commit waitFlush=\"true\" waitSearcher=\"true\"/>"); } I am using the full form of the commit here just to see if the <commit/> was somehow not working. SendSolrIndexingRequest is the routine that takes the string argument and issues the POST request to the update URL. Thanks, Bill
Re: Commit problems on Solr 1.2 with Tomcat
Is SendSolrIndexingRequest synchronous or asynchronous? If the call to SendSolrIndexingRequest() can return before the response from the add is received, then the commit could sneak in and finish *before* the add is done (in which case, you won't see it before the next commit). -Yonik
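Yonik's race condition is exactly why the ?commit=true shortcut mentioned earlier in the thread is attractive: folding the commit into the add request removes the ordering problem entirely. A minimal sketch of the two URL styles (the host, port, and update path are assumptions, not taken from the thread):

```python
from urllib.parse import urlencode

# assumed Solr location -- adjust to your deployment
SOLR_UPDATE = "http://localhost:8983/solr/update"

def update_url(commit=False):
    # With commit=true on the add request, Solr commits in the same
    # request, so there is no separate commit that could race ahead
    # of an asynchronous add.
    return SOLR_UPDATE + ("?" + urlencode({"commit": "true"}) if commit else "")

two_step = update_url()             # POST <add>...</add> here, then POST <commit/>
one_shot = update_url(commit=True)  # POST <add>...</add> here; commit included
```

The two-step flow is only safe if each POST blocks until Solr's response has been fully read; the one-shot flow sidesteps the question.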
Re: How Special Character '&' used in indexing
ASAP means As Soon As Possible, not As Soon As Convenient. Please don't say that if you don't mean it. --wunder On 5/12/08 6:48 AM, Ricky [EMAIL PROTECTED] wrote: Hi Mike, Thanks for your reply. I have got the answer to the question posted. I know people are donating time here. ASAP doesn't mean that I am demanding them to reply fast. Please read the lines before you comment on something (*Please kindly* reply ASAP). I am a newbie, and out of curiosity I requested an answer. I don't know if it has hurt you (I am sorry for that). Thanks, Ricky. On Fri, May 9, 2008 at 3:30 PM, Mike Klaas [EMAIL PROTECTED] wrote: On 9-May-08, at 6:26 AM, Ricky wrote: I have tried sending '&amp;' instead of '&', like the following: <field name="company">A &amp; K Inc</field>. But I still get the same error: entity reference name can not contain character, position: START_TAG seen ...<field name="company">A &amp;... Please use a library for doing XML encoding--there is absolutely no reason to do this yourself. Please kindly reply ASAP. Please also realize that people responding here are donating their time and that it is inappropriate to ask for an expedited response. -Mike
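Mike's advice to use a library rather than hand-encoding entities can be illustrated with Python's standard library (the field name and value here are hypothetical, reconstructed from the garbled thread):

```python
from xml.sax.saxutils import escape

# escape() converts &, < and > into entity references -- the substitution
# the hand-written 'amp' approach in the thread was attempting
value = "A & K Inc"  # hypothetical company name
field = '<field name="company">%s</field>' % escape(value)
# field is now: <field name="company">A &amp; K Inc</field>
```

Any XML library in the client's own language (here it would be .NET's XML APIs) does the same job; the point is simply never to splice raw user text into an XML message.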
Re: Extending XmlRequestHandler
There is one huge advantage of talking to Solr with SolrJ (or any other client that uses the REST API), and that is that you can put an HTTP cache between that and Solr. We get a 75% hit rate on that cache. SOAP is not cacheable in any useful sense. I designed and implemented the SOAP interface for all the search engines at Verity, so I'm not just guessing about this. wunder On 5/12/08 7:02 AM, Erik Hatcher [EMAIL PROTECTED] wrote: On May 12, 2008, at 9:52 AM, Alexander Ramos Jardim wrote: I understood what you said about putting the SOAP at Solr. I agree. That's not smart. Now, I am thinking about the web service talking with an embedded Solr server. Is that you were talking about? Quite pleasantly you don't even really have to code in that level of detail in any hardcoded way. You can use SolrJ behind a SOAP interface, and use it with a SolrServer. The implementation of that can switch between embedded (which I'm not even really sure what that means exactly) or via HTTP the good ol' fashioned way. Erik
Re: single character terms in index - why?
We have some useful single character terms in the rating field, like G and R, alongside PG and others. wunder On 5/12/08 1:33 PM, Yonik Seeley [EMAIL PROTECTED] wrote: On Mon, May 12, 2008 at 4:13 PM, Naomi Dushay [EMAIL PROTECTED] wrote: So I'm now asking: why would SOLR want single character terms? Solr, like Lucene, can be configured however you want. The example schema is just that - an example. But, there are many field types that might be interested in keeping single letter terms. One can even think of examples where single letter terms would be useful for normal full-text fields, depending on the domain or on the analysis. One simple example: d-day might be alternately indexed as d day so it would be found with a query of d day -Yonik
Re: JMX monitoring
: Thank you, Shalin! : : It works great. Please post feedback like that in the Jira issue (and ideally: vote for the issue as well). Comments on issues from people saying that they tried out patches and found them useful help committers assess the utility of features and the effectiveness of the patch. -Hoss
Re: Field Grouping
There is an XSLT example here: http://wiki.apache.org/solr/XsltResponseWriter , but it doesn't seem like that would work either... This example would only do a group by for the current page. If I use Solr for pagination, this would not work for me. oleg_gnatovskiy wrote: But I don't want the search results to be ranked based on that field. I only want all the documents with the same value grouped together... The way my system is set up, most documents will have that field empty. Thus, if I sort by it, those documents that have a value will bubble to the top... Yonik Seeley wrote: On Mon, May 12, 2008 at 9:58 PM, oleg_gnatovskiy [EMAIL PROTECTED] wrote: Hello. I was wondering if there is a way to get solr to return fields with the same value for a particular field together. For example I might want to have all the documents with exactly the same name field all returned next to each other. Is this possible? Thanks! Sort by that field. Since you can only sort by fields with a single term at most (this rules out full-text fields), you might want to do a copyField of the name field to something like a name_s field which is of type string (which can be sorted on). -Yonik -- View this message in context: http://www.nabble.com/Field-Grouping-tp17199592p17215641.html Sent from the Solr - User mailing list archive at Nabble.com.
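Yonik's copyField suggestion would look something like this in schema.xml (the field names and type names here are illustrative assumptions, not taken from an actual schema in the thread):

```xml
<!-- tokenized field used for full-text search -->
<field name="name"   type="text"   indexed="true" stored="true"/>
<!-- untokenized copy used only for sorting -->
<field name="name_s" type="string" indexed="true" stored="false"/>
<copyField source="name" dest="name_s"/>
```

A query can then sort on the untokenized copy with sort=name_s asc, which keeps documents with identical names adjacent in the results.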
Re: Unlimited number of return documents?
Hi Walter, thanks for your advice and, indeed, that is correct, too (and I will likely implement the cleaning mechanism this way). (Btw: what would the query look like to get rows 101-200 in the second chunk?) However, fetching in chunks is not atomic, so you may not get results with integrity. Regards, marc Walter Underwood schrieb: Nope. You should fetch all the rows in 100 row chunks. Much, much better than getting them all in one request. I do that to load the auto-complete table. I really cannot think of a good reason to fetch all the rows in one request. That is more like a denial of service attack than like a useful engineering solution. wunder On 5/9/08 11:11 AM, Marc Bechler [EMAIL PROTECTED] wrote: Hi all, one possible use case could be to synchronize the index against a given database. E.g., assume that you have a filesystem that is indexed periodically. If files are deleted on this filesystem, they will not be deleted in the index. This way, you can get (e.g.) the complete content from your index in order to check for consistency. Btw: I also played around with the rows parameter in order to get the overall index; but I got exceptions (insufficient heap space) when setting rows above some higher thresholds. Regards, marc Erik Hatcher schrieb: Or make two requests... one with rows=0 to see how many documents match without retrieving any, then another with that amount specified. Erik On May 9, 2008, at 8:54 AM, Francisco Sanmartin wrote: Yeah, I understand the possible problems of changing this value. It's just a very particular case and there won't be a lot of documents to return. I guess I'll have to use a very high int number, I just wanted to know if there was any proper configuration for this situation. Thanks for the answer! Pako Otis Gospodnetic wrote: Will something a la rows=max int here work? ;) But are you sure you want to do that? It could be sloow. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Francisco Sanmartin [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, May 8, 2008 4:18:46 PM Subject: Unlimited number of return documents? What is the value to set for rows in solrconfig.xml in order not to have any limitation on the number of returned documents? I've tried with -1 and 0 but no luck... <int name="rows">10</int> I want solr to return all available documents by default. Thanks! Pako
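To answer Marc's side question about the second chunk: Solr pages with the start (zero-based offset) and rows parameters, so rows 101-200 come back with start=100&rows=100. A small sketch of the arithmetic:

```python
def chunk_params(chunk_index, rows=100):
    # chunk_index 0 -> rows 1-100 (start=0),
    # chunk_index 1 -> rows 101-200 (start=100), and so on
    return {"start": chunk_index * rows, "rows": rows}

second_chunk = chunk_params(1)  # {'start': 100, 'rows': 100}
```

Looping chunk_index until fewer than rows documents come back walks the whole result set without ever requesting it in one response.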
Re: Unlimited number of return documents?
I think that keeping a transaction log is the best approach for your use case. -- Alexander Ramos Jardim
Re: Field Grouping
You may want to check field collapsing https://issues.apache.org/jira/browse/SOLR-236 There is a patch that works against 1.2, but the one for trunk needs some work before it can work... ryan
Re: Differences between nightly builds
Lucas, Look at the Solr svn repository's root and you will see a file named CHANGES.txt. That contains all major Solr changes back to January 2006. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Lucas F. A. Teixeira [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Tuesday, May 13, 2008 6:59:55 AM Subject: Differences between nightly builds Hello, Here we use a nightly build from aug '07. It's what we need, with some bugs that we've worked on. I want to change this to a newer nightly build, but as this one is 'stable', people are afraid of changing to an 'unknown' build. Is there some place where I can find all changes between some date (my aug '07) and nowadays? Maybe with this I can change their minds! Thank you. []s, -- Lucas Frare A. Teixeira [EMAIL PROTECTED] Tel: +55 11 3660.1622 - R3018
Re: phrase query with DismaxHandler
Hi, I don't think what you said makes 100% sense. Both words windows and installation will be different when stemmed. Also, the word combination will not get stemmed to combine (that's not what the Porter stemmer would chop it down to). Go to the Solr admin page, enter windows installation, then modify the URL to add qt=dismax&debugQuery=true and have a look at the XML. It will contain the query string rewritten by DisMax, which will tell you what's going on. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: KhushbooLohia [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Tuesday, May 13, 2008 5:50:30 AM Subject: phrase query with DismaxHandler Hi All, I am using EnglishPorterFilterFactory in the text field for stemming words. Also I am using DisMaxRequestHandler for handling requests. When a phrase query is passed to Solr, e.g. windows installation, sometimes the results obtained are correct, but sometimes the results occur with only the word install, or just windows, or just installation. It's observed that if the phrase doesn't have anything to be stemmed, like windows or company, the results are returned as expected. But phrases with words like combination or conclusion get stemmed to combine or conclude and bring weird results. Please revert back. Thanks Khushboo -- View this message in context: http://www.nabble.com/phrase-query-with-DismaxHandler-tp17204921p17204921.html Sent from the Solr - User mailing list archive at Nabble.com.
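Otis's debugging suggestion, written out as a concrete request URL (host, port, and core path are assumptions):

```python
from urllib.parse import urlencode

# debugQuery=true makes Solr return the rewritten query and per-document
# scoring explanations alongside the normal results
params = {"q": "windows installation", "qt": "dismax", "debugQuery": "true"}
url = "http://localhost:8983/solr/select?" + urlencode(params)
```

The parsedquery section of the debug output shows exactly which stemmed terms DisMax is actually searching for, which settles questions like the one above about combination vs. combine.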
Re: the time factor
Jack, The answer is: function queries! :) You can easily use function queries with DisMaxRequestHandler. For example, this is what you can add to the dismax config section in solrconfig.xml: <str name="bf">recip(rord(addDate),1,1000,1000)^2.5</str> Assuming you have an addDate field, this will give fresher documents some boost. Look for this on the Wiki; it's all there. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: JLIST [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Tuesday, May 13, 2008 5:42:38 AM Subject: the time factor Hi, I'm indexing news articles from a few news feeds. With news, there's the factor of relevance and also the factor of freshness. Relevance-only results are not satisfactory. Sorting on feed update time is not satisfactory, either, because one source may update more frequently than the others and it tends to occupy the first rows most of the time. I wonder what is the best way of combining the time factor in news search? Thanks, Jack
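Solr's recip(x,m,a,b) function evaluates to a/(m*x+b), and rord(addDate) is the reverse ordinal of the date (1 for the newest document, growing for older ones). A quick sketch of the decay curve the suggested boost produces:

```python
def recip(x, m, a, b):
    # Solr function query: recip(x,m,a,b) = a / (m*x + b)
    return a / (m * x + b)

# With the bf above (m=1, a=1000, b=1000):
freshest = recip(1, 1, 1000, 1000)      # ~0.9990 -- near the maximum boost
older    = recip(10000, 1, 1000, 1000)  # ~0.0909 -- boost has decayed away
```

Because the function approaches a/b for the newest documents and falls off smoothly, freshness nudges the ranking without drowning out relevance, which is exactly the blend the question asks for.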
Re: Duplicates results when using a non optimized index
Hm, not sure why that is happening, but here is some info regarding other stuff from your email: - there should be no duplicates even if you are searching an index that is being optimized - why are you searching an index that is being optimized? It's doable, but people typically perform index-modifying operations on a Solr master and read-only operations on Solr query slave(s) - do duplicates go away after optimization is done? - are the duplicate IDs that you are seeing IDs of previously deleted documents? - which Solr version are you using, and can you try a recent nightly? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Tim Mahy [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Tuesday, May 13, 2008 5:59:28 AM Subject: Duplicates results when using a non optimized index Hi all, is this expected behavior when having an index like this: numDocs : 9479963 maxDoc : 12622942 readerImpl : MultiReader which is in the process of optimizing, that when we search through the index we get duplicate ids back, e.g.: 15257559, 15257559, 17177888, 11825631, 11825631. The id field is declared in the schema and set as the unique key in schema.xml: <uniqueKey>id</uniqueKey> So the question: is this expected behavior, and if so, is there a way to let Solr only return unique documents? greetings and thanks in advance, Tim Please see our disclaimer, http://www.infosupport.be/Pages/Disclaimer.aspx
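Until the underlying cause is found, one client-side workaround is to drop repeated unique keys while preserving result order. This is a sketch of that workaround only; it masks the symptom rather than fixing the index:

```python
def dedupe_by_id(docs):
    # keep only the first occurrence of each unique key, in result order
    seen = set()
    unique = []
    for doc in docs:
        if doc["id"] not in seen:
            seen.add(doc["id"])
            unique.append(doc)
    return unique

# the duplicated ids reported in the thread
ids = [15257559, 15257559, 17177888, 11825631, 11825631]
result = dedupe_by_id([{"id": i} for i in ids])
# result ids: [15257559, 17177888, 11825631]
```

Note this only deduplicates within one response page; duplicates split across pages would still need the server-side fix.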