Re: Autocomplete(terms) middle of words
I already use Nutch trunk 4.0. I have a problem with spaces.
Re: fq parameter with partial value
I'm a bit confused here. What is the difference between CATEGORY and CATEGORY_TOKENIZED if I just do a copyField from one field to another? And how can I search only for Restaurant (fq=CATEGORY_TOKENIZED:Restaurant)? Shouldn't I have something like <field name="CATEGORY_TOKENIZED">Hotel</field> if I want this to work? And from what I understand, this means I should do more than just copy <field name="CATEGORY">Restaurant Hotel</field> to CATEGORY_TOKENIZED. Thanks, Elisabeth

2011/4/28 Erick Erickson erickerick...@gmail.com: See below:

On Thu, Apr 28, 2011 at 9:03 AM, elisabeth benoit elisaelisael...@gmail.com wrote: yes, the multivalued field is not broken up into tokens. So, if I understand what you mean, I could have a field CATEGORY with multiValued="true", a field CATEGORY_TOKENIZED with multiValued="true", and then some POI:

<field name="NAME">POI_Name</field>
...
<field name="CATEGORY">Restaurant Hotel</field>
<field name="CATEGORY_TOKENIZED">Restaurant</field>
<field name="CATEGORY_TOKENIZED">Hotel</field>

[EOE] If the above is the document you're sending, then no. The document would be indexed with

<field name="CATEGORY">Restaurant Hotel</field>
<field name="CATEGORY_TOKENIZED">Restaurant Hotel</field>

Or even just:

<field name="CATEGORY">Restaurant Hotel</field>

and set up a copyField to copy the value from CATEGORY to CATEGORY_TOKENIZED. The multiValued part comes from "And a single POI might have different categories", so your document could have

<field name="CATEGORY">Restaurant Hotel</field>
<field name="CATEGORY">Health Spa</field>
<field name="CATEGORY">Dance Hall</field>

and your document would be counted for each of those entries, while searches against CATEGORY_TOKENIZED would match things like "dance", "spa", etc. But do notice that if you did NOT want a search for restaurant hall (no quotes) to match, you could do proximity searches for less than your increment gap, e.g. (this time with the quotes) "restaurant hall"~50, which would then NOT match if your increment gap were 100. Best, Erick

do faceting on CATEGORY and fq on CATEGORY_TOKENIZED. But then, wouldn't it be possible to do faceting on CATEGORY_TOKENIZED? Best regards, Elisabeth

2011/4/28 Erick Erickson erickerick...@gmail.com: So, I assume your CATEGORY field is multiValued but each value is not broken up into tokens, right? If that's the case, would it work to have a second field CATEGORY_TOKENIZED and run your fq against that field instead? You could have this be a multiValued field with an increment gap if you wanted to prevent matches across separate entries, and have your fq do a proximity search where the proximity was less than the increment gap. Best, Erick

On Thu, Apr 28, 2011 at 6:03 AM, elisabeth benoit elisaelisael...@gmail.com wrote: Hi Stefan, Thanks for answering. In more detail, my problem is the following. I'm working on searching points of interest (POIs), which can be hotels, restaurants, plumbers, psychologists, etc. Those POIs can be identified, among other things, by categories or by brand. And a single POI might have several categories (no maximum number). A user might enter a query like "McDonald's Paris" or "Restaurant Paris" or many other possible queries. First I want to do a facet search on brand and categories, to find out which case is the current case:
http://localhost:8080/solr/select?q=restaurant paris&facet=true&facet.field=BRAND&facet.field=CATEGORY

and get an answer like

<lst name="facet_fields">
  <lst name="CATEGORY">
    <int name="Restaurant">598</int>
    <int name="Restaurant Hotel">451</int>

Then I want to send a request with fq=CATEGORY:Restaurant and still get answers with CATEGORY = Restaurant Hotel. One solution would be to modify the data to add a new document every time we have a new category, so a POI with three different categories would be indexed three times, each time with a different category. But I was wondering if there was another way around it. Thanks again, Elisabeth

2011/4/28 Stefan Matheis matheis.ste...@googlemail.com: Hi Elisabeth, that's not what FilterQueries are made for :) What speaks against using that criteria in the query? Perhaps you want to describe your use case and we'll see if there's another way to solve it? Regards, Stefan

On Thu, Apr 28, 2011 at 9:09 AM, elisabeth benoit elisaelisael...@gmail.com wrote: Hello, I would like to know if there is a way to use the fq parameter with a partial value. For instance, if I have a request with fq=NAME:Joe, I would like to retrieve all answers where NAME contains Joe, including those with NAME = Joe Smith. Thanks, Elisabeth
Re: Problems of deleting documents from Solr
Hi Jeff, You can delete either by unique id or by a query. It seems that you want to delete all documents having a category of monitor:

<delete><query>cat:monitor</query></delete>

http://wiki.apache.org/solr/UpdateXmlMessages#A.22delete.22_by_ID_and_by_Query

- Original Message - From: Jeff Zhang zjf...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, May 2, 2011 5:04 AM Subject: Problems of deleting documents from Solr

Hi all, I want to update some documents, so first I delete these documents by invoking the command

java -Ddata=args -Dcommit=yes -jar post.jar "<delete><cat>monitor</cat></delete>"

The result is that I cannot search the deleted documents, but I can still see the terms of these documents in http://localhost:8983/solr/admin/schema.jsp Even when I restart Solr, it's still there. I notice that numDocs: 0 while maxDoc: 1. Why's that? How can I delete the documents correctly? -- Best Regards Jeff Zhang
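A hedged sketch of how that delete-by-query plus a commit can be posted over HTTP (assuming the default example port and /update handler):

# delete every document whose cat field matches "monitor"
curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>cat:monitor</query></delete>'
# make the deletion visible to searchers
curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
  --data-binary '<commit/>'

As for still seeing the terms: deleted documents are only marked as deleted (hence numDocs: 0 but maxDoc: 1), and their terms remain visible until the segments holding them are merged away or the index is optimized.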
Re: solr sorting problem
Hi, Thanks for your reply. I'm using commit=true while indexing, and it does index the records and show the number of records indexed. The problem is that search yields 0 records (numFound=0), e.g.

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="q">appl</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

There are some entries for spell checking in my schema too, e.g.

<field name="f_spell_en" type="textSpell_en"/>
<copyField source="foodDescUS" dest="f_spell_en"/>

The search URL is something like:

http://localhost:8983/solr/select/?q=apple&indent=on
http://localhost:8983/solr/select/?q=apple&version=2.2&start=0&rows=10&indent=on

Cache could not be a problem, as it did not fetch any records from the very beginning. So, basically, it does not fetch any documents/records, whereas it does index them. Thanks Pratik
OutOfMemoryError with DIH loading 140.000 documents
Hi, I receive an OutOfMemoryError with Solr 3.1 when loading around 140,000 documents with DIH. I have set -Xmx to 1536M. It's weird that I cannot give it more heap memory: with -Xmx2G the process doesn't start. Do you know why I can't set a 2G -Xmx for Solr 3.1? With Solr 4.0 trunk I don't receive the OutOfMemoryError, and I can set -Xmx to 2G. Thank you. Cheers, Zoltan
Bulk update via filter query
Is there an efficient way to update multiple documents with common values (e.g. color = white)? An example would be to mark all white-colored items as sold-out. - Rih
Re: Bulk update via filter query
Is there an efficient way to update multiple documents with common values (e.g. color = white)? An example would be to mark all white-colored items as sold-out. http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html can be an option.
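A hedged sketch of how that could look for the sold-out example (all names here are illustrative; keyField must be your schema's unique key). In schema.xml:

<fieldType name="soldOutType" class="solr.ExternalFileField" keyField="id" defVal="0" valType="pfloat"/>
<field name="sold_out" type="soldOutType"/>

Then a plain-text file named external_sold_out in the index data directory, one key=value line per affected document:

SKU001=1
SKU002=1

Updating the file and committing changes the values without reindexing the documents, but the values are only usable in function queries (e.g. for boosting, or filtering with frange), not as ordinary stored/indexed fields.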
Re: solr sorting problem
Hi, Were you able to sort the results using alphaOnlySort? If yes, what changes were made to the schema and data-config? Thanks
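For reference, the stock example schema defines alphaOnlySort like this, and sorting works by copying the display field into a field of that type (the name/name_sort field names are illustrative):

<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- keep the whole value as one token, lowercase it, trim it, strip non-letters -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
  </analyzer>
</fieldType>

<field name="name_sort" type="alphaOnlySort" indexed="true" stored="false"/>
<copyField source="name" dest="name_sort"/>

Queries then sort with sort=name_sort asc; no data-config change is needed, since the copyField happens at index time.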
Re: fq parameter with partial value
See more below :).

On Mon, May 2, 2011 at 2:43 AM, elisabeth benoit elisaelisael...@gmail.com wrote: I'm a bit confused here. What is the difference between CATEGORY and CATEGORY_TOKENIZED if I just do a copyField from one field to another?

[EOE] copyField is done with the original data, not the processed data. So it's as though you added both fields in the input document.

And how can I search only for Restaurant (fq=CATEGORY_TOKENIZED:Restaurant)? Shouldn't I have something like <field name="CATEGORY_TOKENIZED">Hotel</field> if I want this to work?

[EOE] That's what copyField does for you.

And from what I understand, this means I should do more than just copy <field name="CATEGORY">Restaurant Hotel</field> to CATEGORY_TOKENIZED.

[EOE] I don't understand your question. Here's what I'd suggest: just try it. Then use the admin page to look at your fields to see what the indexed values are. Also, try using the admin page to run some test queries with debugging on. I always get more out of a few experiments than I do out of documentation... Best, Erick

Thanks, Elisabeth

2011/4/28 Erick Erickson erickerick...@gmail.com: See below: ...
Re: OutOfMemoryError with DIH loading 140.000 documents
What do you have your commit parameters set to in solrconfig.xml? I suspect you can make this all work by reducing the RAM threshold in the config file. Best, Erick

On Mon, May 2, 2011 at 4:55 AM, Zoltán Altfatter altfatt...@gmail.com wrote: Hi, I receive an OutOfMemoryError with Solr 3.1 when loading around 140,000 documents with DIH. I have set -Xmx to 1536M. It's weird that I cannot give it more heap memory: with -Xmx2G the process doesn't start. Do you know why I can't set a 2G -Xmx for Solr 3.1? With Solr 4.0 trunk I don't receive the OutOfMemoryError, and I can set -Xmx to 2G. Thank you. Cheers, Zoltan
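For reference, the knobs Erick is referring to look roughly like this in solrconfig.xml (the values are illustrative, not recommendations): ramBufferSizeMB caps how much the indexer buffers in memory before flushing, and autoCommit bounds how much uncommitted data can pile up during a long DIH run.

<indexDefaults>
  <!-- flush the in-memory indexing buffer once it reaches 32 MB -->
  <ramBufferSizeMB>32</ramBufferSizeMB>
</indexDefaults>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit after 10000 buffered docs or 60 seconds, whichever comes first -->
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>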
Re: solr sorting problem
Your query is going against the default field (defined in schema.xml). Have you tried a fielded search? And it would be best to start a new thread for new questions; see: http://people.apache.org/~hossman/#threadhijack Best, Erick

On Mon, May 2, 2011 at 3:46 AM, Pratik pratik_dwiv...@yahoo.com wrote: Hi, Thanks for your reply. I'm using commit=true while indexing, and it does index the records and show the number of records indexed. The problem is that search yields 0 records (numFound=0). ...
updates not reflected in solr admin
This is in 1.4 - we push updates via SolrJ; our application sees the updates, but when we use the solr admin screens to run test queries, or use Luke to view the schema and field values, it sees the database in its state prior to the commit. I think eventually this seems to propagate, but I'm not clear how often since we generally restart the (tomcat) server in order to get the new commit to be visible. I saw a comment recently (from Lance) that there is (annoying) HTTP caching enabled by default in solrconfig.xml. Does this sound like something that would be caused by that cache? If so, I'd probably want to disable it. Does that affect performance of queries run via SolrJ? Also: why isn't that cache flushed by a commit? Seems weird... -- Michael Sokolov Engineering Director www.ifactory.com
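If that HTTP cache does turn out to be the culprit, it can be disabled in solrconfig.xml; a minimal sketch, based on the stock requestDispatcher section:

<requestDispatcher handleSelect="true">
  <!-- never emit cache validators / 304 responses for search requests -->
  <httpCaching never304="true"/>
</requestDispatcher>

This only affects HTTP-level caches (browsers and proxies sitting between a client and Solr); Solr's internal query caches are separate and are already invalidated by a commit, so stale admin-page results usually point at a missing commit rather than at this setting.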
Negative boost
Hi all, I understand that the only way to simulate a negative boost is to positively boost the inverse. I have looked at http://wiki.apache.org/solr/SolrRelevancyFAQ but I think I am missing something in the formatting of my query. I am using:

http://localhost:8983/solr/search?q=dog&bq=(*:* -species:Sheltie)^1

In this case, I am trying to search for records about dog but to put records containing Sheltie closer to the bottom, as I am not really interested in those. However, the following queries:

http://localhost:8983/solr/search?q=dog
http://localhost:8983/solr/search?q=dog&bq=(*:* -species:Sheltie)^1

return the exact same set of results, with a record about a Sheltie as the top result each time. What am I doing incorrectly? Thanks, Brian Lamb
Re: Highlighting words with non-ascii chars
Does your servlet container have the URI encoding set correctly, e.g. URIEncoding="UTF-8" for Tomcat 6? http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config Older versions of Jetty use ISO-8859-1 as the default URI encoding, but Jetty 6 should use UTF-8 as default: http://docs.codehaus.org/display/JETTY/International+Characters+and+Character+Encodings -Peter

On Sat, Apr 30, 2011 at 6:31 AM, Pavel Kukačka pavel.kuka...@seznam.cz wrote: Hello, I've hit a (probably trivial) roadblock I don't know how to overcome with Solr 3.1: I have a document with common fields (title, keywords, content) and I'm trying to use highlighting. With queries using ASCII characters there is no problem; it works smoothly. However, when I search using a Czech word including non-ASCII chars (like slovíčko, for example - http://localhost:8983/solr/select/?q=slov%C3%AD%C4%8Dko&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*), the document is found, but the response doesn't contain the highlighted snippet in the highlighting node - there is only an empty node, like this:

<lst name="highlighting">
  <lst name="2009"/>
</lst>

When searching for the other keyword (http://localhost:8983/solr/select/?q=slovo&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*), the resulting response is fine, like this:

<lst name="highlighting">
  <lst name="2009">
    <arr name="user_keywords">
      <str>slov&#237;&#269;ko &lt;em id="highlighting"&gt;slovo&lt;/em&gt;</str>
    </arr>
  </lst>
</lst>

Did anyone come across this problem? Cheers, Pavel
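For Tomcat, that setting lives on the HTTP connector in server.xml; a minimal sketch (the port and other attributes are whatever your installation already uses):

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000" redirectPort="8443"
           URIEncoding="UTF-8"/>

Without URIEncoding="UTF-8", Tomcat decodes the query string as ISO-8859-1, so a term like slovíčko can reach Solr corrupted, which can make matching and highlighting disagree in exactly this way.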
Re: updates not reflected in solr admin
This is in 1.4 - we push updates via SolrJ; our application sees the updates, but when we use the solr admin screens to run test queries, or use Luke to view the schema and field values, it sees the database in its state prior to the commit. I think eventually this seems to propagate, but I'm not clear how often, since we generally restart the (tomcat) server in order to get the new commit to be visible.

You need to issue a commit from the HTTP interface to see the changes made by the embedded Solr server: solr/update?commit=true
Re: Negative boost
I understand that the only way to simulate a negative boost is to positively boost the inverse. [...] I am using: http://localhost:8983/solr/search?q=dog&bq=(*:* -species:Sheltie)^1 [...] What am I doing incorrectly?

The bq parameter is specific to the dismax query parser. If you want to benefit from bq, you need to use defType=dismax as well as dismax's other parameters: http://wiki.apache.org/solr/DisMaxQParserPlugin
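To make that concrete, a hedged version of the failing query under dismax might look like this (qf=species is an assumption - qf must list whichever fields the keyword should actually search):

http://localhost:8983/solr/search?defType=dismax&q=dog&qf=species&bq=(*:* -species:Sheltie)^10

A boost factor of 1 also barely separates the results; something larger, like ^10, makes the demotion of Sheltie documents much easier to observe.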
Re: updates not reflected in solr admin
Thanks - we are issuing a commit via SolrJ; I think that's the same thing, right? Or are you saying really we need to do a separate commit (via HTTP) to update the admin console's view? -Mike

On 05/02/2011 11:49 AM, Ahmet Arslan wrote: You need to issue a commit from the HTTP interface to see the changes made by the embedded Solr server: solr/update?commit=true
Re: updates not reflected in solr admin
Thanks - we are issuing a commit via SolrJ; I think that's the same thing, right? Or are you saying really we need to do a separate commit (via HTTP) to update the admin console's view?

Yes, a separate commit is needed.
RE: querying in Java
This worked. Thank you. What if I want to query for two or more fields' values? For example:

Field: color, dayOfWeek
Value: blue, Tuesday

I have tried a query string of blueTuesday, with no success.

-Original Message- From: Anuj Kumar [mailto:anujs...@gmail.com] Sent: Friday, April 29, 2011 2:10 PM To: solr-user@lucene.apache.org Subject: Re: querying in Java

Hi Jeff, In that case, you can create a new index field (set indexed to true and stored to false) and copy all your fields to it using copyField. Also make this new field your default search field. This will handle your case. Regards, Anuj

On Fri, Apr 29, 2011 at 11:36 PM, Saler, Jeff jsa...@ball.com wrote: Thanks for the reply. What I want is for the query to search all fields for the specified value.

-Original Message- From: Anuj Kumar [mailto:anujs...@gmail.com] Sent: Friday, April 29, 2011 1:51 PM To: solr-user@lucene.apache.org Subject: Re: querying in Java

Hi Jeff, In that case, it will query w.r.t. the default field. What is your default search field in the schema? Regards, Anuj

On Fri, Apr 29, 2011 at 11:10 PM, Saler, Jeff jsa...@ball.com wrote: Is there any way to query for data that is in any field, i.e. not using a specific field name? For example, when I use the following statements:

SolrQuery query = new SolrQuery();
query.setQuery("ANALYST:John Schummers");
QueryResponse rsp = server.query(query);

I get the documents I'm looking for. But I would like to get the same set of documents without using the specific ANALYST field name. I have tried using just Schummers as the query, but no documents are returned. The ANALYST field is an indexed field.
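A hedged sketch of the catch-all field Anuj describes, in schema.xml (the field name "text" and the wildcard source are illustrative):

<!-- one searchable field that aggregates every other field -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="*" dest="text"/>
<defaultSearchField>text</defaultSearchField>

With this in place, a bare query like Schummers goes against the text field and matches values copied from any field, including ANALYST.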
Question concerning the updating of my solr index
Hello all, I have integrated Solr into my project with success. I use a DataImportHandler to first import the data, mapping the fields to my schema.xml. I use SolrJ to query the data and also use faceting. Works great. The question I have now is a general one on updating the index and how it works. Right now, I have a thread which runs a couple of times a day to update the index. My index is composed of about 20,000 documents, and when this thread runs it takes the data of the 20,000 documents in the db, creates a Solr document for each, and then uses this code to update the index:

SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/apache-solr-1.4.1/");
Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
for (Iterator iterator = documents.iterator(); iterator.hasNext();) {
    Document document = (Document) iterator.next();
    SolrInputDocument solrDoc = SolrUtils.createDocsSolrDocument(document);
    docs.add(solrDoc);
}
UpdateRequest req = new UpdateRequest();
req.setAction(UpdateRequest.ACTION.COMMIT, false, false);
req.add(docs);
UpdateResponse rsp = req.process(server);
server.optimize();

This process takes 19 seconds, which is 10 seconds faster than my older solution using Compass (another open source search project we used). Is this the best way to update the index? If I understand correctly, an update is actually a delete in the index then an add. During the 19 seconds, will my index be locked only on the document being updated, or could the whole index be locked? I am not in production yet with this solution, so I want to make sure my update process makes sense. Thanks Greg
Re: Question concerning the updating of my solr index
Greg, You could use StreamingUpdateSolrServer instead of that UpdateRequest class - http://search-lucene.com/?q=StreamingUpdateSolrServer+fc_project=Solr Your index won't be locked, in the sense that you could have multiple apps or threads adding docs to the same index simultaneously and that searches can be executed against the index concurrently. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

- Original Message - From: Greg Georges greg.geor...@biztree.com To: solr-user@lucene.apache.org Sent: Mon, May 2, 2011 1:33:30 PM Subject: Question concerning the updating of my solr index

Hello all, I have integrated Solr into my project with success. ...
Re: OutOfMemoryError with DIH loading 140.000 documents
Zoltan - Solr is not preventing you from giving your JVM a 2GB heap; something else is. If you paste the error we may be able to help. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

- Original Message - From: Zoltán Altfatter altfatt...@gmail.com To: solr-user@lucene.apache.org Sent: Mon, May 2, 2011 4:55:06 AM Subject: OutOfMemoryError with DIH loading 140.000 documents

Hi, I receive an OutOfMemoryError with Solr 3.1 when loading around 140,000 documents with DIH. ...
Re: querying in Java
Jeff, If I understand what you need, then: yourFieldNameHere:(blue OR Tuesday) Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

- Original Message - From: Saler, Jeff jsa...@ball.com To: solr-user@lucene.apache.org Sent: Mon, May 2, 2011 1:24:52 PM Subject: RE: querying in Java

This worked. Thank you. What if I want to query for two or more fields' values? For example: color = blue and dayOfWeek = Tuesday. I have tried a query string of blueTuesday, with no success. ...
Re: querying in Java
Hi Jeff, Either you can use a filter query or specify the fields explicitly, like Field:Value - for example color:blue OR dayOfWeek:Tuesday - or use AND in between; it depends on what you want. Also, if you don't want to specify AND/OR each time and would rather decide on a global default, set it as the default operator in your schema.xml, for example:

<solrQueryParser defaultOperator="OR"/>

Hope it helps. Regards, Anuj

On Mon, May 2, 2011 at 10:54 PM, Saler, Jeff jsa...@ball.com wrote: This worked. Thank you. What if I want to query for two or more fields' values? For example: color = blue and dayOfWeek = Tuesday. I have tried a query string of blueTuesday, with no success. ...
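Tying Anuj's answer back to the SolrJ code earlier in the thread, a minimal sketch might be (the field names color and dayOfWeek come from Jeff's example; server is an existing SolrServer instance):

SolrQuery query = new SolrQuery();
// require both fields to match; use OR instead to match either one
query.setQuery("color:blue AND dayOfWeek:Tuesday");
QueryResponse rsp = server.query(query);

The original attempt, blueTuesday, is parsed as a single term against the default field, which is why it matched nothing.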
Re: updates not reflected in solr admin
Ah - I didn't expect that. Thank you! On 05/02/2011 12:07 PM, Ahmet Arslan wrote: Thanks - we are issuing a commit via SolrJ; I think that's the same thing, right? Or are you saying really we need to do a separate commit (via HTTP) to update the admin console's view? Yes separate commit is needed.
Re: How to combine Deduplication and Elevation
: Hi I have a question. How to combine the Deduplication and Elevation : implementations in Solr. Currently, I managed to implement either one only.

can you elaborate a bit more on what exactly you've tried and what problem you are facing? the SignatureUpdateProcessorFactory (which is used for deduplication) and the QueryElevation component should work just fine together -- in fact, one is used at index time and the other at query time, so there shouldn't be any interaction at all...

http://wiki.apache.org/solr/Deduplication http://wiki.apache.org/solr/QueryElevationComponent -Hoss
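For reference, deduplication is wired in as an update processor chain in solrconfig.xml along these lines (per the Deduplication wiki page; the fields list is illustrative and must name your own fields):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The QueryElevationComponent is configured separately (as a searchComponent plus elevate.xml), which is why the two shouldn't conflict: one runs while adding documents, the other while answering queries.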
Re: Too many open files exception related to solrj getServer too often?
Off the top of my head, i don't know the answers to some of your questions, but as to the core cause of the exception...

: 3. server.query(solrQuery) throws SolrServerException. How can concurrent : solr queries trigger a Too many open files exception?

...bear in mind that (as i understand it) the limit on open files is actually a limit on open file *descriptors*, which includes network sockets. a google search for java.net.SocketException: Too many open files will give you loads of results -- it's not specific to solr. -Hoss
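On Linux, a quick way to check whether the descriptor limit is the bottleneck (<solr-pid> is a placeholder for the Solr JVM's process id):

ulimit -n                      # per-process file descriptor limit for this shell
lsof -p <solr-pid> | wc -l     # descriptors (files + sockets) the JVM currently holds
ulimit -n 8192                 # raise the limit before starting Solr

If the count climbs steadily under concurrent queries, a common SolrJ-side cause is constructing a new CommonsHttpSolrServer per request instead of sharing one instance, so connections never get reused.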
RE: Question concerning the updating of my solr index
Ok, I had seen this in the wiki; performance has gone from 19 seconds to 13. I have configured it like this, and I wonder what the best settings would be with 20,000 docs to update: a higher or lower queue size? More or fewer threads? Thanks Greg

-Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: 2 mai 2011 13:59 To: solr-user@lucene.apache.org Subject: Re: Question concerning the updating of my solr index

Greg, You could use StreamingUpdateSolrServer instead of that UpdateRequest class - http://search-lucene.com/?q=StreamingUpdateSolrServer+fc_project=Solr Your index won't be locked, in the sense that you could have multiple apps or threads adding docs to the same index simultaneously and that searches can be executed against the index concurrently. Otis ...
RE: Question concerning the updating of my solr index
Oops, here is the code:

SolrServer server = new StreamingUpdateSolrServer("http://localhost:8080/apache-solr-1.4.1/", 1000, 4);
Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
for (Iterator iterator = documents.iterator(); iterator.hasNext();) {
    Document document = (Document) iterator.next();
    SolrInputDocument solrDoc = SolrUtils.createDocsSolrDocument(document);
    docs.add(solrDoc);
}
server.add(docs);
server.commit();
server.optimize();

Greg

-Original Message- From: Greg Georges [mailto:greg.geor...@biztree.com] Sent: 2 mai 2011 14:44 To: solr-user@lucene.apache.org Subject: RE: Question concerning the updating of my solr index

Ok, I had seen this in the wiki; performance has gone from 19 seconds to 13. ...
Re: when to change rows param?
: I thought that injecting the rows param in the query-component would : have been enough (from the limits param my client is giving). But it : seems not to be the case.

As i tried to explain before: the details matter. exactly where in the code you tried to do this and how you went about it is important to understanding why it might not have affected the results in the way you expect. SearchComponents are ordered, and multiple passes are made over each component in order, and each component has the opportunity to access the request params in a variety of ways, etc... So w/o knowing exactly what you changed, we can't really speculate why some other code isn't using the new value (particularly since i don't think you ever actually told us *which* other code isn't getting the new value)

: paul
:
: On 12 Apr 2011, at 02:07, Chris Hostetter wrote:
:
: : Paul: can you elaborate a little bit on what exactly your problem is?
: : - what is the full component list you are using?
: : - how are you changing the param value (ie: what does the code look like)
: : - what isn't working the way you expect?
:
: : I've been using my own QueryComponent (that extends the search one) successfully to rewrite web-received parameters that are sent from the (ExtJS-based) javascript client. This allows an amount of query-rewriting, that's good. I tried to change the rows parameter there (which is "limit" in the query, as per the underpinnings of ExtJS) but it seems that this is not enough.
: : Which component should I subclass to change the rows parameter?

-Hoss
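A hedged sketch of the kind of rewrite Paul describes, done in prepare() so the changed rows value is visible to every later pass and component (the "limit" parameter name comes from his ExtJS client; the class name is hypothetical and error handling is omitted):

import java.io.IOException;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.QueryComponent;
import org.apache.solr.handler.component.ResponseBuilder;

public class LimitAwareQueryComponent extends QueryComponent {
  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    String limit = rb.req.getParams().get("limit");
    if (limit != null) {
      // replace the request's params so all later components see the new rows value
      ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
      params.set(CommonParams.ROWS, limit);
      rb.req.setParams(params);
    }
    super.prepare(rb);
  }
}

If the override happens later (e.g. in process()), components that already read rows during their own prepare() pass will still see the old value, which matches the symptom Hoss is probing for.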
Re: Question concerning the updating of my solr index
Greg, I believe the point of SUSS is that you can just add docs to it one by one, so that SUSS can asynchronously send them to the backend Solr instead of you batching the docs. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

- Original Message - From: Greg Georges greg.geor...@biztree.com To: solr-user@lucene.apache.org Sent: Mon, May 2, 2011 2:45:40 PM Subject: RE: Question concerning the updating of my solr index

Oops, here is the code: SolrServer server = new StreamingUpdateSolrServer("http://localhost:8080/apache-solr-1.4.1/", 1000, 4); ...
Re: Why are they different?
: The above code, if I start the server (tomcat) inside eclipse, it throws : SolrException: Internal Server Error; but if I start the server : outside eclipse, for instance by running startup.bat in tomcat's bin : directory, it runs successfully. I really don't understand why they are : different.

Have you checked the logs? My guess is that when you use eclipse, the server is not starting up properly at all. Possibly not finding the Solr home directory? -Hoss
RE: Question concerning the updating of my solr index
Yeah, you are right; I have changed that to add a document at a time rather than a list of documents. Still works pretty fast; I will continue to test settings to see if I can tweak it further. Thanks Greg

-Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: 2 mai 2011 14:56 To: solr-user@lucene.apache.org Subject: Re: Question concerning the updating of my solr index

Greg, I believe the point of SUSS is that you can just add docs to it one by one, so that SUSS can asynchronously send them to the backend Solr instead of you batching the docs. Otis ...
DataImportHandler on 2 tables
Hello all, I have a system where I have a DataImportHandler defined for one table in my database. I need to also index data from another table, so I will therefore need another index to search on. Does this mean I must configure another Solr instance (another schema.xml file, DataImportHandler SQL file, etc.)? Do I need another Solr core for this? Thanks Greg
Re: DataImportHandler on 2 tables
Do you want to search the data from the two tables together or separately? Is there a join between the two tables? Ludovic.

2011/5/2 Greg Georges: Hello all, I have a system where I have a DataImportHandler defined for one table in my database. I need to also index data from another table, so I will therefore need another index to search on. Does this mean I must configure another Solr instance (another schema.xml file, DataImportHandler SQL file, etc.)? Do I need another Solr core for this? Thanks Greg
RE: DataImportHandler on 2 tables
No, the data has no relationship; the two tables are independent, with no joins. I want to search them separately. Greg

-Original Message- From: lboutros [mailto:boutr...@gmail.com] Sent: 2 mai 2011 16:29 To: solr-user@lucene.apache.org Subject: Re: DataImportHandler on 2 tables

Do you want to search the data from the two tables together or separately? Is there a join between the two tables? Ludovic.
Re: DataImportHandler on 2 tables
Ok, so it seems you should create a new index and core, as you said. See here for the management: http://wiki.apache.org/solr/CoreAdmin But it seems that this is a problem for you. Is it? Ludovic.

2011/5/2 Greg Georges: No, the data has no relationship; the two tables are independent, with no joins. I want to search them separately. Greg
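For reference, two independent cores are declared in solr.xml, each with its own instanceDir holding its own conf/ (schema.xml, solrconfig.xml, DIH config); the core names below are illustrative:

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- each core gets its own schema, solrconfig and data-import config -->
    <core name="table1" instanceDir="table1"/>
    <core name="table2" instanceDir="table2"/>
  </cores>
</solr>

Queries then go to per-core URLs such as /solr/table1/select and /solr/table2/select, so the two indexes stay fully separate.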
RE: DataImportHandler on 2 tables
No, it is not a problem; I just wanted to confirm my question before looking into Solr cores more closely. Thanks for your advice and confirmation. Greg
Re: fq parameter with partial value
So if you have a field that IS tokenized, regardless of what it's called, then when you send My Great Restaurant to it for _indexing_, it gets _tokenized upon indexing_ into separate tokens: My, Great, Restaurant. Depending on what other analysis you have, it may get further analyzed, perhaps to: my, great, restaurant. You don't need to separate it into tokens yourself before sending it to Solr; if you define the field with a tokenizer, Solr does that when you index. This is a VERY common thing to do with Solr: pretty much any field that you want to be effectively searchable, you have Solr tokenize like this, because Solr pretty much always matches on individual tokens; that's the fundamental way Solr works. Those separate tokens are what allow you to SEARCH on the field and get a match on my or on restaurant. If the field were non-tokenized, you'd ONLY get a hit if the user entered My Great Restaurant (and really not even then unless you take other actions: because of the way the Solr query parsers work, you'll have trouble getting ANY hits on a user-entered search with the 'lucene' or 'dismax' query parsers if you don't tokenize). That tokenized field won't facet very well, though. If you faceted on a tokenized field with that example value in it, you'd get a facet my with that item in it, another facet great with that item in it, and another facet restaurant with that item in it. Which is why you likely want to use a separate _untokenized_ field for faceting, and why you end up wanting/needing two separate fields: one that is tokenized for searching, and one that is not tokenized (and usually not analyzed at all) for faceting. Hope this helps.
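As a sketch of the two-field setup described above, reusing the field names from this thread (the string and text types are the stock example-schema types; adjust to your own):

<field name="CATEGORY" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="CATEGORY_TOKENIZED" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="CATEGORY" dest="CATEGORY_TOKENIZED"/>

You then facet on the raw field (facet.field=CATEGORY) and filter or search on the analyzed copy (fq=CATEGORY_TOKENIZED:restaurant); the copyField feeds the raw value into the tokenized field's analyzer at index time, so there is no need to split values yourself.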
Indexing multiple languages
I have categories facets, currently only in 1 language, but since my site is multilanguage, I need to index them in multiple languages. My table looks like this:

[music_categories]
  id        int
  title     nvarchar(50)
  title_en  nvarchar(50)
  title_nl  nvarchar(50)

In my data-config.xml I have this, only for 1 language:

<entity name="artist_category" query="select categoryid from artist_categories where objectid=${artist.id}">
  <entity name="category" query="select title from music_categories where id = '${artist_category.categoryid}'">
    <field name="categories" column="title"/>
  </entity>
</entity>

Now, the only way I can imagine indexing multiple languages is by duplicating these lines:

<entity name="artist_category_en" query="select categoryid from artist_categories where objectid=${artist.id}">
  <entity name="category_en" query="select title_en from music_categories where id = '${artist_category_en.categoryid}'">
    <field name="categories_en" column="title_en"/>
  </entity>
</entity>
<entity name="artist_category_nl" query="select categoryid from artist_categories where objectid=${artist.id}">
  <entity name="category_nl" query="select title_nl from music_categories where id = '${artist_category_nl.categoryid}'">
    <field name="categories_nl" column="title_nl"/>
  </entity>
</entity>

Is there a better way, e.g. where I can do some sort of parameterizing like {lang} or something?
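One possible (untested) trim, based on DIH's documented ${dataimporter.request.*} variables: pass the language on the import URL (e.g. /dataimport?command=full-import&lang=en) and reference it in the entity. Whether the substitution also resolves inside the field name attribute is an assumption worth verifying:

<entity name="artist_category" query="select categoryid from artist_categories where objectid=${artist.id}">
  <entity name="category"
          query="select title_${dataimporter.request.lang} from music_categories
                 where id = '${artist_category.categoryid}'">
    <field name="categories_${dataimporter.request.lang}" column="title_${dataimporter.request.lang}"/>
  </entity>
</entity>

The import would then be run once per language. If the field-name substitution does not resolve, the per-language <field> lines would still need duplicating, but the queries at least collapse to one.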
Re: Formatted date/time in long field and javabinRW exception
: Any thoughts on this one? Why does Solr output a string in a long field with : XMLResponseWriter but fails doing so (as it should) with the javabin format? Performance. The XML response writer doesn't make any attempt to validate data from the index on the way out; the stored value in the index is a string, and the stream it's writing to accepts strings, so it writes the value as a string wrapped in long tags. The binary response writer, on the other hand, streams strings differently than longs, so it does care when it encounters the incorrect type, and it errors. Bottom line: if your schema.xml is not consistent with your index, all bets are off as to what the behavior will be. -Hoss
Re: Formatted date/time in long field and javabinRW exception
: Bottom line: if your schema.xml is not consistent with your index, all : bets are off as to what the behavior will be. That sounds quite reasonable indeed. But I don't understand why Solr doesn't throw an exception when I actually index a string in a long fieldType, while I do remember getting a number-formatting exception when pushing strings to an integer fieldType. With the current setup I can send a properly formatted date to a long fieldType, which should, in my opinion, punish me with an exception.
Re: Avoiding corrupted index
: First, I tried the scripts provided in the Solr distribution without success ... : And that's true: there is no /opt/apache-solr-1.4.1/src/bin/scripts-util : but a /opt/apache-solr-1.4.1/src/scripts/scripts-util : Is this normal to distribute the scripts with a bad path? It looks like you are trying to run the scripts from the src directory of the distro ... they are meant to be installed in a bin directory in your solr home dir (so they can locate the default data dir, etc.). If you haven't seen them already: http://wiki.apache.org/solr/CollectionDistribution http://wiki.apache.org/solr/SolrCollectionDistributionScripts : Then I discovered that these utility scripts were not distributed anymore : with the version 3.1.0: were they not reliable? Can we get corrupted : backups with these scripts? No, as far as I know they work great. They were not included in the *binary* distribution of Solr, but they were most certainly included in the *source* distribution ... I think that was actually an oversight ... 3.1 is the first time we had a binary distribution, and there's no reason I know of why they shouldn't have been in both. (In general, these scripts have fallen out of favor because they aren't as portable or as easy to test as the Java-based replication, so they are easy to forget.) -Hoss
Re: DataImportHandler on 2 tables
Greg, 1 instance with 2 cores, each with their own schema, solrconfig, etc. (the conf dir stuff). Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
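For reference, a minimal solr.xml sketch of that layout (the core names here are invented):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="table1" instanceDir="table1"/>
    <core name="table2" instanceDir="table2"/>
  </cores>
</solr>

Each instanceDir then gets its own conf/ directory holding that core's schema.xml, solrconfig.xml, and DIH data-config.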
Re: Dismax Minimum Match/Stopwords Bug
: However, is there an actual fix in the 3.1 eDisMax parser which solves : the problem for real? Cannot find a JIRA issue for it. edismax uses the same query structure as dismax, which means it's not possible to fix anything here ... it's how the query parsers work. Each word from the query string is analyzed by each field in the qf, and the result is used as a query on the word in the field. The individual clauses for each word are aggregated into a DisjunctionMaxQuery, and the set of DisjunctionMaxQueries are then combined into a BooleanQuery (with the appropriate minNrShouldMatch set). If a word from the input produces no output from the analyzers of *any* of the qf fields, then the resulting DisjunctionMaxQuery is empty and dropped from the final BooleanQuery ... so if a word in the query string is a stop word for *every* field in the qf, there is no clause. But if *any* field in the qf produces a term for it, then there is a DisjunctionMaxQuery for that word added to the main BooleanQuery. As I've said many times: this isn't a bug, it's a fundamental point of the parser and the structure of the query. The best solution for people who get bitten by this (in my opinion) is not to give up on stop words -- if you want to use stop words, by all means use stop words. BUT! You must use them in all the fields of your qf ... even fields where you think why in god's name would I need stopwords on this field, those terms will never exist in this field! ... You may know that, and it may be true, but it doesn't change the fact that people will be *querying* for stop words against those fields, and you want to ignore them when they do. -Hoss
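As a concrete sketch of that advice (the type name here is made up), every field type referenced by qf carries the same stop filter, so a stop word produces no term in *any* field and its clause is dropped consistently:

<fieldType name="text_with_stops" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>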
Re: Indexing relations for sorting
: Every product-to-category relation has its own sorting order which we would : like to index in solr. ... : We want all products of subcat1 (no matter what the parent category is) : ordered by their sorting order : : We want all products of cat2_subcat1 ordered by their sorting order The best suggestion I can think of is to create a field per category and use it to index the sort order for that category. You haven't said what kind of cardinality you are dealing with in your categorization, so if it's relatively small this should work well ... if it's unbounded it will have some serious problems, however. : Our solr version is 1.3.0 Please, please, please consider upgrading when you are working on this project of yours. There have been too many bug fixes and performance enhancements since then to even remotely dream of listing them all in this email (that's what the CHANGES.txt file is for). -Hoss
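A sketch of the field-per-category idea, using a dynamic field so each category's sort order gets its own slot (the names are invented; sint is the sortable int type of the 1.3-era example schema):

<dynamicField name="sortorder_*" type="sint" indexed="true" stored="false"/>

A product in cat2_subcat1 would then be indexed with, say, <field name="sortorder_cat2_subcat1">17</field>, and the category listing query sorts with &fq=category:cat2_subcat1&sort=sortorder_cat2_subcat1 asc.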
How to debug if termsComponent is used
Hi, I defined a searchHandler just for the sake of autosuggest, using TermsComponent:

<searchComponent name="terms" class="org.apache.solr.handler.component.TermsComponent"/>

<requestHandler name="/terms" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
  <arr name="components">
    <str>terms</str>
    <str>debug</str>
  </arr>
</requestHandler>

This configuration might not even make sense, configuring the terms and debug components together. Must the debug component be wired up with the query component? I just need a requestHandler where I can run TermsComponent and debug on it. How do I achieve that? Thanks, cy
Re: updates not reflected in solr admin
: Thanks - we are issuing a commit via SolrJ; I think that's the same : thing, right? Or are you saying really we need to do a separate commit : (via HTTP) to update the admin console's view? ... : Yes separate commit is needed. Huh? No ... that's not true at all. A commit using SolrJ is no different than a commit via HTTP ... especially since that's all SolrJ is doing when you ask it to commit. -Hoss
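For reference, the explicit HTTP form of the same operation is just a POST of an update message to the update handler (the URL is hypothetical):

POST http://localhost:8983/solr/update  (Content-Type: text/xml)
<commit/>

SolrJ's commit() builds and sends the equivalent update request for you.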
Re: updates not reflected in solr admin
: I saw a comment recently (from Lance) that there is (annoying) HTTP caching : enabled by default in solrconfig.xml. Does this sound like something that : would be caused by that cache? If so, I'd probably want to disable it. The HTTP caching that tends to bite people in the ass is actually your *browser* caching the responses from Solr, based on the headers Solr sets in the response: http://wiki.apache.org/solr/SolrConfigXml#HTTP_Caching In most browsers a Shift-Reload tells it to ignore its cache and force a new request. : that affect performance of queries run via SolrJ? Also: why isn't that cache : flushed by a commit? Seems weird... If you use the example configs that came with Solr 1.4.1, then Solr would generate Last-Modified and ETag headers that *would* tell your browser that the results had changed after a commit. If you use the example configs that came with Solr 3.1, then Solr sets the headers in such a way that the browser shouldn't cache at all. -Hoss
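For reference, the solrconfig.xml element being discussed, in its two stock flavors (worth verifying against your exact config):

<!-- 1.4-era example: emit Last-Modified and ETag so clients can revalidate -->
<httpCaching lastModifiedFrom="openTime" etagSeed="Solr"/>

<!-- 3.1-era example: send headers telling clients never to cache -->
<httpCaching never304="true"/>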
Re: XS DateTime format
: negative-signed numeral that represents the year. Is it intentional that : Solr strips leading zeros for the first four digits? No, it's a really stupid bug, due to some really stupid date formatting I haven't had a chance to refactor out of existence... https://issues.apache.org/jira/browse/SOLR-1899 -Hoss
Re: updates not reflected in solr admin
On 5/2/2011 8:02 PM, Chris Hostetter wrote: Huh? No ... that's not true at all. A commit using SolrJ is no different than a commit via HTTP ... especially since that's all SolrJ is doing when you ask it to commit. Unless you're using the 'embedded' Solr server? Wonder if the OP is. Jonathan
Re: Has NRT been abandoned?
Thanks Andy! Everything should work as before, so faceting, function queries, and query boosting should still work. For example: q=name:efghij^2.2 name:abcd^3.2 returns all docs with names efghij and abcd, but ranks documents named abcd above efghij. Regards, - NN On 5/1/2011 7:15 PM, Andy wrote: Nagendra, This looks interesting. Does Solr-RA support: 1) facet 2) Boost query such as {!boost b=log(popularity)}foo Thanks Andy --- On Sun, 5/1/11, Nagendra Nagarajayya nagaraja...@transaxtions.com wrote: Hi Andy: I have a solution for NRT with Solr 1.4.1. The solution uses the RankingAlgorithm as the search library. The NRT functionality allows you to add documents without the IndexSearchers being closed or caches being cleared. A commit is not needed with the document update. Searches can run concurrently with document updates. No changes are needed except for enabling the NRT through solrconfig.xml. The performance is about 262 TPS (document adds) on a dual-core Intel system with a 2GB heap, with searches running in parallel. The performance at the moment is limited by how fast IndexWriter.getReader() performs. I have a white paper that describes NRT in detail and lets you download the tweets, schema and solrconfig.xml files. You can access the white paper here: http://solr-ra.tgels.com/papers/solr-ra_real_time_search.pdf You can download Solr with RankingAlgorithm (Solr-RA) here: http://solr-ra.tgels.com I have not yet integrated the NRT with Solr 3.1 (the new release) and plan to do so very soon. Please let me know if you need any more info. Regards, - Nagendra Nagarajayya http://solr-ra.tgels.com On 5/1/2011 8:28 AM, Andy wrote: Hi, I read on this mailing list previously that NRT was implemented in 4.0, it just wasn't ready for production yet. Then I looked at the wiki (http://wiki.apache.org/solr/NearRealtimeSearch). It lists 2 JIRA issues related to NRT: SOLR-1308 and SOLR-1278. Both issues recently had their resolutions set to Won't Fix. Does that mean NRT is no longer going to happen? What's the state of NRT in Solr? Thanks Andy
Re: updates not reflected in solr admin
No - this is all running against an external Tomcat-based Solr. I'm back to being mystified now. Maybe I'll see if I can isolate this a bit more. I'll post back if I do, although I'm beginning to wonder if we should just move to 3.1 and not worry about it. -Mike
Re: How to debug if termsComponent is used
Hi, That looks about right, but I don't know without checking whether the debug component really needs the query component, or if it can work with just the terms component. Have you tried it? Did it not work? You may save yourself a lot of work and get something better than TermsComponent with http://sematext.com/products/autocomplete/index.html btw. Or, if you are using Solr trunk, with the Suggester. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
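If you try the Suggester route Otis mentions, a rough solrconfig.xml sketch along the lines of the Solr wiki's Suggester page (the class names and the source field name are assumptions to verify against your version):

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">name</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Queries would then go to /suggest?q=resta and return completions from the configured field.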
Re: updates not reflected in solr admin
Right, I read those comments in the config, and it all sounds reasonable - presumably a new Searcher is opened when (or shortly after) we commit, from whatever source. That was my operating assumption, and the reason I was so confused when I saw different results in two different clients. I don't want to pursue this probable user error beyond eliminating the obvious for the moment. I'll post back if I get more info. Thanks again everyone. -Mike
How to take differential backup of Solr Index
Hi, Is there any way to take a differential backup of the Solr index? Thanks, Gaurav
Re: How to take differential backup of Solr Index
The Replication feature does this. If you configure a query server as a 'backup' server, it downloads changes but does not read them. -- Lance Norskog goks...@gmail.com
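For reference, a sketch of the slave-side solrconfig.xml that does this (the masterUrl is hypothetical; see http://wiki.apache.org/solr/SolrReplication):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

A master configured with the ReplicationHandler can also take a snapshot on demand via http://host:port/solr/replication?command=backup.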
Re: Has NRT been abandoned?
Thanks Nagendra, but I wasn't talking about field boosts. The kind of boosting I need, {!boost b=log(popularity)}foo, requires BoostQParserPlugin (http://search-lucene.com/jd/solr/org/apache/solr/search/BoostQParserPlugin.html). Does Solr-RA come with BoostQParserPlugin? Thanks.