Re: Multiple Process of the SAME solr instance
Shalin, I understand that :-) My problem is: if 1 Solr instance processes (saves) 100 documents one by one, it will not be very efficient. I want to create 10 clones (processes/threads/cores) of the same Solr instance, so that 10 documents get processed (saved to Solr) simultaneously.

Thanks and regards,
Mohit Ranka

Shalin Shekhar Mangar wrote:
> On Thu, Sep 18, 2008 at 11:03 AM, mohitranka [EMAIL PROTECTED] wrote:
>> Otis, I understand that 1 Solr instance can store n documents (one by one). My query was how to create m such instances/processes/threads so that m documents get stored at a time, instead of 1 at a time. All the instances should listen on the same port.
>
> You can send a batch of m documents at a time in the same XML.
>
> --
> Regards,
> Shalin Shekhar Mangar.
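A minimal SolrJ sketch of the batching Shalin suggests - one request carrying all the documents rather than one request per document. The server URL and field names here are illustrative assumptions, not taken from the thread:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexer {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < 100; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);        // illustrative field names
                doc.addField("name", "document " + i);
                batch.add(doc);
            }
            server.add(batch);  // one HTTP request for all 100 documents
            server.commit();    // make them searchable
        }
    }

Several client threads can each send such batches to the same instance in parallel; no extra Solr processes are needed for that.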
Re: Solr vs Autonomy
Geoff,

Perhaps you can find out the list of features/functionalities that your project requires and we can give you a quick yes/no. Or perhaps you can get those others to list the Autonomy features that they think they really need, and we can tell you how Solr compares.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Geoff Hopson [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Thursday, September 18, 2008 1:46:33 AM
Subject: Solr vs Autonomy

Hi,

I'm under pressure to justify the use of Solr on my project, and others are suggesting that Autonomy be used instead. Apart from price, does anyone have a list of pros/cons around Autonomy compared to Solr?

Thanks,
Geoff
Re: Setting request method to post on SolrQuery causes ClassCastException
A quick workaround is, I think, to tell Solr to use the non-binary response, e.g. wt=xml (I think that's the syntax).

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: syoung [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, September 17, 2008 7:27:30 PM
Subject: Setting request method to post on SolrQuery causes ClassCastException

Hi,

I need to have queries over a certain length done as a POST instead of a GET. However, when I set the method to POST, I get a ClassCastException. Here is the code:

    public QueryResponse query(SolrQuery solrQuery) {
        QueryResponse response = null;
        try {
            if (solrQuery.toString().length() > MAX_URL_LENGTH)
                response = server.query(solrQuery, SolrRequest.METHOD.POST);
            else
                response = server.query(solrQuery, SolrRequest.METHOD.GET);
        } catch (SolrServerException e) {
            throw new DataAccessResourceFailureException(e.getMessage(), e);
        }
        return response;
    }

And the stack trace:

    java.lang.ClassCastException: java.lang.String
        at org.apache.solr.common.util.NamedListCodec.unmarshal(NamedListCodec.java:89)
        at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:385)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
        at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
        at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:113)
        at com.localmatters.guidespot.util.SolrTemplate.query(SolrTemplate.java:33)

Thanks,
Susan
Re: Multiple Process of the SAME solr instance
Mohit,

I think you are thinking too hard - trying to optimize something that doesn't sound like it needs optimizing at this point in your project. I suggest you start with 1 Solr instance and then see if anything needs to be faster after you've pushed that to its limits.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: mohitranka [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Thursday, September 18, 2008 2:15:25 AM
Subject: Re: Multiple Process of the SAME solr instance

Shalin, I understand that :-) My problem is: if 1 Solr instance processes (saves) 100 documents one by one, it will not be very efficient. I want to create 10 clones (processes/threads/cores) of the same Solr instance, so that 10 documents get processed (saved to Solr) simultaneously.

Thanks and regards,
Mohit Ranka

[...]
Re: Field level security
Hi,

I don't understand all the details, but I'll inline a few comments.

----- Original Message -----
From: Geoff Hopson [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Thursday, September 18, 2008 1:44:33 AM
Subject: Field level security

> Hi,
>
> First post/question, so please be gentle :-) I am trying to put together a security model around fields in my index. My requirement is that a user may not have permission to view certain fields in the index when he does a search. For example, he may have permission to see the name and address, but not the occupation. Whereas a different user with different permissions will be able to search all 3 fields.

What exactly is restricted? Viewing of specific fields in results, or searching in specific fields? If it's the former, you could tell Solr which fields to return using &fl=field1,field2... If it's the latter, you could always write a custom SearchComponent that takes your custom userType or allowedFields parameter and constructs a query based on that.

> What is the best way to model this? My current stab at this has a document-level security level set (I have a field called security_default), and all fields have this default. If there are exceptions, I have a multiValued field called 'security_exceptions' where I comma-delimit the field name and the different access permission for that field. E.g. I might have 'occupation=Restricted' in that field. This falls over when I copyField fields into a text field for easier searching.

Searching across multiple fields is pretty easy, too. I'd stick to that, as that also lets you assign different weight to different fields.

Otis

> Has anyone else attempted to do this and are willing to share their ideas?
>
> Thanks in advance,
> Geoff
Re: Special character matching 'x' ?
On Thu, 18 Sep 2008 10:53:39 +0530 Sanjay Suri [EMAIL PROTECTED] wrote:

> One of my field values has the name Räikkönen, which contains special characters. Strangely, as I see it anyway, it matches on the search query 'x'? Can someone explain or point me to the solution/documentation?

Hi Sanjay,

Akshay should have given you an answer for this. In a more general way, if you want to know WHY something is matching the way it is, run the query with debugQuery=true. There are a few pages in the wiki which explain other debugging techniques.

b
_
{Beto|Norberto|Numard} Meijome

"Ask not what's inside your head, but what your head's inside of." J. J. Gibson

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Field level security
Hi Otis,

Thanks for the response. I'll try and inline some clarity...

2008/9/18 Otis Gospodnetic [EMAIL PROTECTED]:
>> I am trying to put together a security model around fields in my index. My requirement is that a user may not have permission to view certain fields in the index when he does a search. For example, he may have permission to see the name and address, but not the occupation. Whereas a different user with different permissions will be able to search all 3 fields.
>
> What exactly is restricted? Viewing of specific fields in results, or searching in specific fields?

I am restricting the results - the user can search everything, but I was planning (as you mention) to apply a field-list qualifier to the query. In my head (i.e. not tried it yet) I was hoping I could write a 'SecurityRequestHandler' that would take an incoming security 'token' and construct an &fl qualifier. Some other thoughts in my head are around developing my own fieldType, where I could tokenise the value against the field (e.g. store <field name="occupation">candlestick maker=Restricted</field> or something similar). Thoughts on that?

> If it's the former, you could tell Solr which fields to return using &fl=field1,field2... If it's the latter, you could always write a custom SearchComponent that takes your custom userType or allowedFields parameter and constructs a query based on that.
>
> Searching across multiple fields is pretty easy, too. I'd stick to that, as that also lets you assign different weight to different fields.

My requirement is to offer a Google-type search, so the user can type in "john smith ford green" and get results where ford may be a last name or a car manufacturer, or green is the colour of the car, a last name, or part of a town name. If I tokenised the field values as above and copyField-ed them into a single text field, would my tokeniser pick those out? Dunno - I guess I need to roll my sleeves up and do some coding, try some of this out.

Thanks again for any insights,
Geoff
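A rough sketch of the fl-based restriction discussed above, assuming SolrJ on the client side; the helper and its arguments are hypothetical, for illustration only:

    import java.util.Set;
    import org.apache.solr.client.solrj.SolrQuery;

    // Hypothetical helper: limit returned fields to what this user may see.
    public SolrQuery restrictFields(SolrQuery query, Set<String> allowedFields) {
        // Equivalent to appending &fl=field1,field2,... to the request URL.
        query.setFields(allowedFields.toArray(new String[0]));
        return query;
    }

Note this only hides fields in the response; it does not stop the user from searching against a restricted field, which is why Otis mentions a custom SearchComponent for the query side.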
Re: Solr vs Autonomy
As per the other thread:

1) security down to field level

Otherwise I am mostly happy that Solr gives me everything that Autonomy does.

2008/9/18 Otis Gospodnetic [EMAIL PROTECTED]:
> Geoff,
>
> Perhaps you can find out the list of features/functionalities that your project requires and we can give you a quick yes/no. Or perhaps you can get those others to list the Autonomy features that they think they really need, and we can tell you how Solr compares.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

[...]

--
Light travels faster than sound. This is why some people appear bright until you hear them speak………

Mario Kart Wii: 2320 6406 5974
Re: Date field mystery
Hi Chris,

It was a long night for our Solr server today, because we rebuilt the complete index using well-formed date strings. And the date field is stored now, so that we can see if something went wrong :-) But our problems are not solved completely. Now I can give you a very exact description of what the problem is now (and what the reason was that we used malformed date values).

Let's imagine we have 3 records with the following date values:

1. 2006-03-04T12:23:19Z
2. 2007-08-12T19:07:03Z
3. 2008-09-16T12:56:19Z

And now I will give you some queries and the results we get back:

- date:[2005-01-01T00:00:00Z TO NOW] or date:[2005-01-01T00:00:00Z TO 2008-09-18T09:45:00Z]: 1 and 2 (incorrect)
- date:[2005-01-01T00:00:00Z TO 20080918T09:45:00Z]: 1, 2, 3 (correct)
- date:[2005-01-01T00:00:00Z TO 2007-12-31T23:59:59Z]: only 1 (incorrect)
- date:[2005-01-01T00:00:00Z TO 20071231T23:59:59Z]: 1 and 2 (correct)

So as you can see, using "-" in the second parameter of the range query for the date field causes an error and doesn't find the record that should be found, while using a malformed date value without "-" returns the correct records. When using "-" for the second parameter, all records that are from the year contained in the parameter aren't found any more. This behavior is reproducible on different systems, either CentOS or Debian. It must be a problem of Solr or of Lucene (the query parser) itself.

Our next steps are to test our scenario using Solr 1.3, and if the problem isn't fixed we will use timestamps instead of the date format. But maybe this is a general problem of Solr and should be fixed, because in other cases and for other users it's not possible to make a workaround, and they get wrong (incomplete) results for their queries.

Best regards,
Christian
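For reference, a small sketch of producing the canonical form Solr expects for date fields and range endpoints (yyyy-MM-dd'T'HH:mm:ss'Z', always UTC), using only JDK classes; variable names are illustrative:

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;

    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"));  // Solr dates are UTC
    String end = fmt.format(new Date());           // e.g. 2008-09-18T09:45:00Z
    String query = "date:[2005-01-01T00:00:00Z TO " + end + "]";

Endpoints like this are well formed; the puzzle in Christian's report is that exactly this dashed form misbehaves while the dash-less form works.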
Re: cron job update index
OK, thanks, that's very clear. But do you know why my cron job doesn't work?

    # m h dom mon dow command
    */5 * * * * /usr/bin/wget http://solr-test.books.com:8080/solr/books/dataimport?command=delta-import

When I go to check the date in conf/dataimport.properties, the date and hour don't change. So yesterday:

    #Wed Sep 17 18:07:14 CEST 2008
    last_index_time=2008-09-17 17\:24\:07

And weirdly, if I run wget by hand:

    [EMAIL PROTECTED]:/# wget http://solr-test.adm.books.com:8180/solr/books/dataimport?command=delta-import
    --09:26:24-- http://solr-test.adm.books.com:8180/solr/books/dataimport?command=delta-import
               => `dataimport?command=delta-import.2'
    Resolving solr-test.adm.books.com... 10.97.1.151
    Connecting to solr-test.adm.books.com|10.97.1.151|:8180... connected.
    HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers. Retrying.

    --09:27:51-- http://solr-test.adm.books.com:8180/solr/video/dataimport?command=delta-import (try: 2)
               => `dataimport?command=delta-import.2'
    Connecting to solr-test.adm.books.com|10.97.1.151|:8180... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 807 [text/xml]
    100%[==>] 807 --.--K/s
    09:27:51 (174.91 MB/s) - `dataimport?command=delta-import.2' saved [807/807]

my dataimport.properties doesn't change either. But if I go through my browser, it does change my dataimport.properties.

Shalin Shekhar Mangar wrote:
> On Wed, Sep 17, 2008 at 9:42 PM, sunnyfr wrote:
>> Sorry, but silly question about "Then the main query is executed for each primary key identified by the deltaQuery. This main query is used to create the documents and index them." I don't see in the code the link between the deltaQuery and the main query - how does it get back the ids which have been modified?
>
> After the delta query is executed, the changed IDs (PK) are collected. It modifies the query using the pk attribute with a value in the delta list and runs it to create the document. For example:
>
>     <entity pk="id" query="select * from books"
>             deltaQuery="select id from books where modified > '${dataimport.last_index_time}'">
>
> Suppose the changed pks given by deltaQuery are [1,5,7]; then the following queries are executed:
>
>     select * from books where id = '1';
>     select * from books where id = '5';
>     select * from books where id = '7';
>
>> Oh, it just finished - it actually took a long time, so maybe my cron job has to run less often?
>>
>>     <str name="">Indexing completed. Added/Updated: 390796 documents. Deleted 0 documents.</str>
>>     <str name="Committed">2008-09-17 18:07:47</str>
>>     <str name="Time taken ">0:43:40.465</str>
>
> Vary the cron job depending on how frequently and by how many documents the DB is updated. If an existing import is running, additional calls to start an import operation are ignored.
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: Setting request method to post on SolrQuery causes ClassCastException
I guess the POST is not sending the correct 'wt' parameter. Try setting wt=javabin explicitly. wt=xml may not work because the parser is still binary. Check this: http://wiki.apache.org/solr/Solrj#xmlparser

On Thu, Sep 18, 2008 at 11:49 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:
> A quick workaround is, I think, to tell Solr to use the non-binary response, e.g. wt=xml (I think that's the syntax).
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

[...]

--
--Noble Paul
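A minimal sketch of both suggestions, assuming SolrJ 1.3's CommonsHttpSolrServer (variable names are illustrative):

    // Option 1 (Noble's suggestion): request the binary format explicitly,
    // so the POSTed request and the client's binary parser agree.
    solrQuery.set("wt", "javabin");

    // Option 2 (Otis's workaround): switch both sides to XML.
    server.setParser(new org.apache.solr.client.solrj.impl.XMLResponseParser());

As the wiki page linked above notes, changing wt alone is not enough: the response format and the client-side ResponseParser have to match.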
Re: recip(myfield,m,a,b)
I don't think it can work at index time, because when somebody looks for a book I want to boost the search in relation to the user's language... so I don't think it can work, unless I didn't get it.

Thanks for your answer,

hossman wrote:
: Is there a way to convert to integer to check if a = b ... like
: recip(myfield,m,language,lang)
: But I would like to boost (scoring) fields which have the same user language
: and book language ...
:
: But for that I need to know convert.int(language)

There is an OrdFieldSource that can be used with single-valued string fields to get a numeric value for where they are in the order of all values for that field ... it is in fact what gets used by default when you include a string fieldname in the function query syntax.

But off the top of my head I don't think there are any Functions provided by default that let you compare two ValueSources and return one number if they are equal and another number if they aren't.

Frankly: the best way to approach a problem like this is to set a boolean field at index time if the other two fields are the same.

-Hoss
Re: Special character matching 'x' ?
Thanks Akshay and Norberto,

I am still trying to make it work. I know the solution is what you pointed me to; it's just taking me some time to make it work.

Thanks,
-Sanjay

On Thu, Sep 18, 2008 at 12:34 PM, Norberto Meijome [EMAIL PROTECTED] wrote:
> On Thu, 18 Sep 2008 10:53:39 +0530 Sanjay Suri [EMAIL PROTECTED] wrote:
>> One of my field values has the name Räikkönen, which contains special characters. Strangely, as I see it anyway, it matches on the search query 'x'? Can someone explain or point me to the solution/documentation?
>
> Hi Sanjay,
>
> Akshay should have given you an answer for this. In a more general way, if you want to know WHY something is matching the way it is, run the query with debugQuery=true. There are a few pages in the wiki which explain other debugging techniques.
>
> b

[...]

--
Sanjay Suri
Videocrux Inc.
http://videocrux.com
+91 99102 66626
Unable to filter fq param on a dynamic field
Hi,

I have a fairly simple Solr setup with several predefined fields that are indexed and stored. Depending on the type of product, I also add various dynamic fields of type string to a record. I should mention that I am using the solr.DisMaxRequestHandler request handler, called /IvolutionSearch in my example requests.

My schema is as follows:

    <field name="CampaignCode" type="string" indexed="true" stored="true" required="true" />
    <field name="CategoryName" type="string" indexed="true" stored="true" required="true" />
    <field name="CategoryPath" type="string" indexed="true" stored="true" required="true" />
    <field name="CountryCode" type="string" indexed="true" stored="true" required="true" />
    <field name="Id" type="string" indexed="true" stored="true" required="true" />
    <field name="ManufacturerName" type="string" indexed="true" stored="true" required="true" />
    <field name="MPN" type="textTight" indexed="true" stored="true" required="true" />
    <field name="ProductName" type="text" indexed="true" stored="true" required="true" />
    <field name="Overview" type="text" indexed="true" stored="true" required="false" />
    <field name="Price" type="float" indexed="true" stored="true" required="true" />
    <field name="ProductCode" type="textTight" indexed="true" stored="true" required="true" />
    <field name="ReviewRating" type="integer" indexed="true" stored="true" required="false" />
    <field name="StockCode" type="string" indexed="true" stored="true" required="false" />
    <field name="TaxCode" type="string" indexed="false" stored="true" required="true" />
    <field name="ThumbnailURI" type="string" indexed="false" stored="true" required="false" />
    <field name="WebClassification" type="textTight" indexed="true" stored="true" required="true" />

    <dynamicField name="*-facet" type="string" indexed="true" stored="true" multiValued="false" />

Now I can query for any of the fixed field types, such as ProductName or ReviewRating, and get the results I expect, but when I try to run a filter query on the dynamic fields in the result, I always end up with no results being returned.

So if I run the following query against my copy of Solr 1.3, I get the results I am expecting:

    http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=laser&rows=100

    <result name="response" numFound="1216" start="0">
      <doc>
        <str name="CampaignCode">$A</str>
        <str name="CategoryName">Mono Laser Printers</str>
        <str name="CategoryPath">Printers|Mono Laser Printers</str>
        <str name="Connectivity-Technology-facet">Wired</str>
        <str name="CountryCode">UK</str>
        <str name="Id">UK$AQ969719</str>
        <str name="MPN">3500V_DN</str>
        <str name="Manufacturer-facet">Xerox</str>
        <str name="ManufacturerName">Xerox</str>
        <str name="Output-Type-facet">Monochrome</str>
        <str name="Overview">The Xerox Phaser 3500 series printer provides an affordable solution to meet the increasing volume a</str>
        <float name="Price">464.10</float>
        <str name="ProductCode">Q969719</str>
        <str name="ProductName">XEROX 3500DN MONO LASER</str>
        <int name="ReviewRating" />
        <str name="StockCode">E000</str>
        <str name="TaxCode">2</str>
        <str name="Technology-facet">Laser</str>
        <str name="ThumbnailURI">26099.jpg</str>
        <str name="Type-facet">Workgroup printer</str>
        <str name="WebClassification">MLASERPRN</str>
        <date name="timestamp">2008-09-17T17:10:44.37Z</date>
      </doc>
      <doc>
        <str name="CampaignCode">$B</str>
        <str name="CategoryName">Mono Laser Printers</str>
        <str name="CategoryPath">Printers|Mono Laser Printers</str>
        <str name="Connectivity-Technology-facet">Wired</str>
        ...

and so on for the 100 results. Now if I try to filter those results to just those where Output-Type-facet equals Monochrome, using:

    http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=laser&rows=100&fq=Output-Type-facet:Monochrome

or

    http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=laser&rows=100&fq=Output-Type-facet:Monochrome;

I just get zero results back, even though I know that field contains that value. Please, before I pull my hair out, tell me what mistake I have made. Why can I query using a static field and not a dynamic field? Any help - even if it's to say I have been stupid, or to tell me to reread a section of the manual/wiki because I did not get the point - is much appreciated.

Thanks,
Barry H
mailto:[EMAIL PROTECTED]

Misco is a division of Systemax Europe Ltd. Registered in Scotland Number 114143. Registered Office: Caledonian Exchange, 19a Canning Street, Edinburgh EH3 8EG. Telephone +44 (0)1933 686000.
Re: Some new SOLR features
Hi Yonik,

One approach I have been working on, which I will integrate into SOLR, is the ability to use serialized objects for the analyzers, so that the schema can be defined on the client side if need be. The analyzer classes will be dynamically loaded. Or there is no need for a schema, and plain Java objects can be defined and used. I'd like to see the synonyms serialized as well.

When I mentioned serialization, it was in regard to setting the configuration over the Hadoop RMI (LUCENE-1336) protocol. Instead of creating methods for each new call one wants, the easiest approach in distributed computing is to have a dynamic class loaded that operates directly on SolrCore and so can do whatever is necessary to get the work completed. Creating new methods in distributed computing is always a bad idea IMO.

In realtime indexing one will not be able to simply reindex all the time, and so either a dynamic schema, or no schema at all, is best. Otherwise the documents would need to have a schemaVersion field; this gets messy - I looked at this.

Jason

On Wed, Sep 17, 2008 at 5:10 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
> On Wed, Sep 17, 2008 at 4:50 PM, Henrib [EMAIL PROTECTED] wrote:
>> Yonik Seeley wrote:
>>> ...multi-core allows you to instantiate a completely new core and swap it for the old one, but it's a bit of a heavyweight approach ...a schema object would not be mutable, but that one could easily swap in a new schema object for an index at any time...
>>
>> Not sure I understand what we gain; if you change the schema, you'll most likely have to reindex as well. That's management at a higher level in a way.
>
> There are enough ways that one could change the schema in a compatible way (say, just adding query-time synonyms, etc.) that it does seem like we should permit it.
>
>> Or are you saying we should have a shortcut for the whole operation of creating a new core, reindexing content, replacing an existing core?
>
> Eventually, it seems like we should be able to handle re-indexing when necessary. And we should consider the ability to change some config without necessarily reloading *everything*.
>
> -Yonik
Re: Some new SOLR features
This should be done. Great idea.

On Wed, Sep 17, 2008 at 3:41 PM, Lance Norskog [EMAIL PROTECTED] wrote:
> My vote is for dynamically scanning a directory of configuration files. When a new one appears, or an existing file is touched, load it. When a configuration disappears, unload it. This model works very well for servlet containers.
>
> Lance
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
> Sent: Wednesday, September 17, 2008 11:21 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Some new SOLR features
>
> On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen [EMAIL PROTECTED] wrote:
>> If the configuration code is going to be rewritten then I would like to see the ability to dynamically update the configuration and schema without needing to reboot the server.
>
> Exactly. Actually, multi-core allows you to instantiate a completely new core and swap it for the old one, but it's a bit of a heavyweight approach. The key is finding the right granularity of change. My current thought is that a schema object would not be mutable, but that one could easily swap in a new schema object for an index at any time. That would allow a single request to see a stable view of the schema, while preventing having to make every aspect of the schema thread-safe.
>
>> Also I would like the configuration classes to just contain data and not have so many methods that operate on the filesystem.
>
> That's the plan... completely separate the serialized and in-memory representations.
>
>> This way the configuration object can be serialized, and loaded by the server dynamically. It would be great for the schema to work the same way.
>
> Nothing will stop one from using Java serialization for config persistence; however, I am a fan of human-readable config files... so much easier to debug and support. Right now, people can cut-n-paste relevant parts of their config in email for support, or to a wiki to explain things, etc.
>
> Of course, if you are talking about being able to have custom filters or analyzers (new classes that don't even exist on the server yet), then it does start to get interesting. This intersects with deployment in general... and I'm not sure what the right answer is. What if Lucene or Solr needs an upgrade? It would be nice if that could also automatically be handled in a large cluster... what are the options for handling that? Is there a role here for OSGi to play? It sounds like at least some of that is outside of the Solr domain.
>
> An alternative to serializing everything would be to ship a new schema along with a new jar file containing the custom components.
>
> -Yonik
Re: Some new SOLR features
> That would allow a single request to see a stable view of the schema, while preventing having to make every aspect of the schema thread-safe.

Yes, that is the best approach.

> Nothing will stop one from using java serialization for config persistence,

Persistence should not be serialized. Serialization is for transport over the wire, for automated upgrades of the configuration. This could be done in XML as well, but it would be good to support both models.

> Is there a role here for OSGi to play?

Yes. Eclipse successfully uses OSGi, and for grid computing in Java, to take advantage of what Java can do with dynamic classloading, OSGi is the way to go. Every search project I have worked on needs this stuff to be way easier than it is now.

The current distributed computing model in SOLR may work, but it will not work reliably and will break a lot. When it does break, there is no way to know what happened. This will create excessive downtime for users. I have had excessive downtime in production even in the current simple master-slave architecture because there is no failover. Failover in the current system should be in there, because it's too easy to implement with the rsync-based batch replication.

On Wed, Sep 17, 2008 at 2:21 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
> On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen [EMAIL PROTECTED] wrote:
>> If the configuration code is going to be rewritten then I would like to see the ability to dynamically update the configuration and schema without needing to reboot the server.

[...]
Re: Some new SOLR features
Servlets are one thing. For SOLR the situation is different. There are always small changes people want to make: a new stop word, a small tweak to an analyzer. Rebooting the server for these should not be necessary. Ideally this is handled via a centralized console and deployed over the network (using RMI or XML) so that files do not need to be deployed.

On Thu, Sep 18, 2008 at 7:41 AM, Mark Miller [EMAIL PROTECTED] wrote:
> Isn't this done in servlet containers for debugging-type work? Maybe an option, but I disagree that this should drive anything in Solr. It should really be turned off in production in servlet containers imo as well. This can really be such a pain in the ass on a live site... someone touches web.xml and the app server reboots *shudder*. Seen it, don't dig it.
>
> Jason Rutherglen wrote:
>> This should be done. Great idea.

[...]
Re: Some new SOLR features
Dynamic changes are not what I'm against... I'm against dynamic changes that are triggered by the app noticing that the config has changed.

Jason Rutherglen wrote:
> Servlets are one thing. For SOLR the situation is different. There are always small changes people want to make: a new stop word, a small tweak to an analyzer. Rebooting the server for these should not be necessary. Ideally this is handled via a centralized console and deployed over the network (using RMI or XML) so that files do not need to be deployed.

[...]
delta-import looks stuck ???? how can I check if it's done or not ?
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </lst>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Time Elapsed">4:26:16.934</str>
    <str name="Total Requests made to DataSource">3451431</str>
    <str name="Total Rows Fetched">9165885</str>
    <str name="Total Documents Processed">493061</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Delta Dump started">2008-09-18 10:01:01</str>
    <str name="Identifying Delta">2008-09-18 10:01:01</str>
    <str name="Deltas Obtained">2008-09-18 10:01:43</str>
    <str name="Building documents">2008-09-18 10:01:43</str>
    <str name="Total Changed Documents">1587889</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>
Re: Some new SOLR features
Yes, so it's probably best to make the changes through a remote interface, so that the app will be able to make the appropriate internal changes. File-based system changes are less than ideal, agreed; however, I suppose with an open source project such as SOLR the kitchen-sink effect happens and it will find its way in there anyway.

The hard part is organizing the project such that it does not get too bloated with everyone's features, and allows features to be pluggable outside of the core releases. There are many things that may be best as contrib modules that could be OSGi-based add-ons rather than placed into the standard releases (of which I don't have any off hand). The standard for contribs for SOLR can be OSGi. This will greatly assist in SOLR becoming grid-computing friendly.

Ideally SOLR 2.0 would be cleaner, standardized, and most of the features pluggable. This will allow for consistent release cycles and make grid computing simpler to implement. SOLR seems like it could be going in the direction of bloat, which could increasingly confuse new users. Instead, people could implement their own modules and upload them in the contrib section, or implement their own that are proprietary.

I am curious: what is the recommended place to put the query expansion code (such as adding boosting, adding phrase queries and such)? Is it now best to use a SearchComponent? Is it possible in the future to make SearchComponents OSGi-enabled?

On Thu, Sep 18, 2008 at 7:56 AM, Mark Miller [EMAIL PROTECTED] wrote:
> Dynamic changes are not what I'm against... I'm against dynamic changes that are triggered by the app noticing that the config has changed.

[...]
Re: problem indexing accented character with release version of solr 1.3
From the XML 1.0 spec: "Legal characters are tab, carriage return, line feed, and the legal graphic characters of Unicode and ISO/IEC 10646." So \005 is not a legal XML character. It appears the old StAX implementation was more lenient than it should have been, and Woodstox is doing the correct thing.

-Sean

Ryan McKinley wrote:
> My guess is it has to do with switching the StAX implementation to the Geronimo API and the Woodstox implementation: https://issues.apache.org/jira/browse/SOLR-770
>
> I'm not sure what the solution is though...
>
> On Sep 17, 2008, at 10:02 PM, Joshua Reedy wrote:
>> I have been using a stable dev version of 1.3 for a few months. Today, I began testing the final release version, and I encountered a strange problem. The only thing that has changed in my setup is the Solr code (I didn't make any config change or change the schema). A document has a text field with a value that contains: Andr\005é 3000
>>
>> Indexing the document, by itself or as part of a batch, produces the following error:
>>
>> Sep 17, 2008 5:00:27 PM org.apache.solr.common.SolrException log
>> SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 5)) at [row,col {unknown-source}]: [5,205]
>>     at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
>>     at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
>>     at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
>>     at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
>>     at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
>>     at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>>     at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327)
>>     at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195)
>>     at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
>>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
>>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>>     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
>>     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>>     at java.lang.Thread.run(Thread.java:595)
>>
>> The latest version of Solr doesn't seem to like control characters (\005, in this case), but previous versions handled them (or at least ignored them). These characters shouldn't be in my documents, so there's a bug on my end to track down. However, I'm wondering if this was an expected change or an unintended consequence of recent work . . .
>>
>> --
>> "Be who you are and say what you feel, because those who mind don't matter and those who matter don't mind." -- Dr. Seuss
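If the offending documents pass through Java before being posted, one option is to scrub them client-side. A hedged sketch is below; the helper name is made up, and it only removes the sub-0x20 control characters (like \005) that the spec quote above rules out, keeping tab, CR, and LF:

    // Hypothetical helper: drop control characters that XML 1.0 forbids.
    public static String stripIllegalXmlChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == '\t' || c == '\n' || c == '\r' || c >= 0x20) {
                out.append(c);
            }
        }
        return out.toString();
    }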
Re: delta-import looks stuck ???? how can I check if it's done or not ?
It was too long, so I finally restarted Tomcat... then 5 minutes later my cron job started, but it looks like nothing happens via the cron job.

This is my OUTPUT file, tot.txt:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
      <lst name="initArgs"><lst name="defaults"><str name="config">data-config.xml</str></lst></lst>
      <str name="command">delta-import,</str>
      <str name="status">idle</str>
      <str name="importResponse"/>
      <lst name="statusMessages"/>
      <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
    </response>

This is my CRON JOB with wget:

    */5 * * * * /usr/bin/wget -q --output-document=/home/tot.txt http://solr-test.adm.books.com:8180/solr/books/dataimport?command=delta-import, echo $(date) >> /home/tot.txt

sunnyfr wrote:
[...]
Re: Solr vs Autonomy
It depends entirely on the needs of the project. For some things, Solr is superior to Autonomy; for other things, not. I used to work at Autonomy (and Verity and Inktomi and Infoseek), and I chose Solr for Netflix. It is working great for us.

wunder
==
Walter Underwood
Former Ultraseek Architect
Current Netflix Search Lead

On 9/17/08 10:46 PM, Geoff Hopson [EMAIL PROTECTED] wrote:
> Hi,
>
> I'm under pressure to justify the use of Solr on my project, and others are suggesting that Autonomy be used instead. Apart from price, does anyone have a list of pros/cons around Autonomy compared to Solr?
>
> Thanks,
> Geoff
Re: Solr vs Autonomy
My project is looking to index tens of millions of documents, providing search across a live-live environment (hence index distribution/replication is important). Most searches have to be returned (i.e. to the end user) in 5 seconds or less. The index has about 30 fields, and I reckon that the security access I alluded to can be solved with field-specific queries (as opposed to a single copyFielded text field).

The searches are very simple, but need to be quick. Confidence in the information is important, and so scoring is valuable. Faceted searches have a place too.

Autonomy seems to have a solid security/access-control model, but offers nothing above and beyond Solr, unless I am missing something.

Dunno if that helps?
Geoff

2008/9/18 Walter Underwood [EMAIL PROTECTED]:
> It depends entirely on the needs of the project. For some things, Solr is superior to Autonomy; for other things, not. I used to work at Autonomy (and Verity and Inktomi and Infoseek), and I chose Solr for Netflix. It is working great for us.
>
> wunder

[...]

--
Light travels faster than sound. This is why some people appear bright until you hear them speak………

Mario Kart Wii: 2320 6406 5974
Re: No server response code on insert: how do I avoid this at high speed?
Otis Gospodnetic wrote:
> Perhaps the container logs explain what happened? How about just throttling to the point where the failure rate is 0%? Too slow?
>
> Otis

Otis's questions regarding dropped inserts sent me back to the drawing board. The system had been tuned to a slower database to optimize speed and accept a few drops. When I migrated to a faster DB, I didn't retune. Here are results of testing indexing performance for Tomcat and Jetty. The DB speedup apparently moved the bottleneck from getting records from the database (around 400 rps) to cramming records into the servlet container.

System: 16 processors, 2.5 GHz, 64G memory
Index: 33 gig, freshly optimized, avg record size 1.4k
Insert load: 250,000 records

I calculate records/sec by dividing the number of successful inserts by the time. The adjusted time is the estimated time it would take to insert the full 250,000 records with no errors, which is raw time plus the additional time required to insert those dropped records, i.e., raw time * (1 + error-rate * 0.01).

Judging from processor/memory/IO utilization, it appears the write speed of a single Java thread is dominating the Solr indexing speed. Which makes sense.

Take-home lessons:
- The speed limit is about 450 records per second in our environment.
- Three or four threads posting inserts max out speed. More threads don't help.
- Jetty is significantly faster than Tomcat at sane thread counts in our environment.

I hope this is useful.

-Jim

PS: If you have formatting issues with this table, try viewing with a fixed-width font.

                                 Tomcat                    |                    Jetty
 # threads  Raw time  # Drops  % Error  Rec/sec  Adj. time | Raw time  # Drops  % Error  Rec/sec  Adj. time
 16          533      17131     6.85    436.90    569.51   |   594     24222     9.69    380.10    651.55
 15          520      16878     6.75    448.31    555.10   |   518     28581    11.43    427.45    577.22
 14          547      16378     6.55    427.10    582.83   |   496     30047    12.02    443.45    555.61
 13          540      16638     6.65    432.15    575.91   |   495     27076    10.83    450.35    548.61
 12          545      15920     6.36    429.50    579.66   |   494     28785    11.51    447.80    550.88
 11          523      16192     6.47    447.05    556.84   |   484     26495    10.60    461.79    535.29
 10          540      15643     6.26    433.99    573.80   |   497     27190    10.88    448.31    551.05
  9          553      15543     6.21    423.97    587.34   |   494     25862    10.34    453.72    545.10
  8          541      14095     5.64    436.05    571.51   |   501     23482     9.39    452.13    548.06
  7          549      10735     4.29    435.82    572.55   |   499     24657     9.86    451.59    548.22
  6          566       9468     3.79    424.97    587.45   |   502     23074     9.23    452.04    548.33
  5          588       7754     3.10    411.98    606.23   |   527     20779     8.31    434.95    570.80
  4          577       4201     1.68    425.99    586.69   |   513     16608     6.64    454.96    547.08
  3          613          0     0       407.83    613      |   537      9503     3.80    447.85    557.41
  2          801          0     0       312.11    801      |   633         0     0       394.94    633
  1         1365          0     0       183.15   1365      |  1122         0     0       222.82   1122
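To make the derived columns concrete, here is the arithmetic for the first Tomcat row as a small Java fragment, using the formulas Jim states above (the values are taken straight from the table):

    double inserts  = 250000 - 17131;            // successful inserts = load minus drops
    double rps      = inserts / 533.0;           // ~436.9 records/sec
    double adjusted = 533.0 * (1 + 6.85 / 100);  // ~569.5 s to land all 250,000 error-free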
Re: delta-import looks stuck ???? how can I check if it's done or not ?
Hit /dataimport again from a browser and refresh periodically to see the progress (number of documents indexed).

On Thu, Sep 18, 2008 at 7:55 PM, sunnyfr [EMAIL PROTECTED] wrote: It was too long so I finally restarted tomcat .. then 5mn later my cron job started : but it looks like nothing is happening via the cron job :

This is my OUTPUT file : tot.txt

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
<lst name="initArgs"><lst name="defaults"><str name="config">data-config.xml</str></lst></lst>
<str name="command">delta-import</str>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages"/>
<str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

This is my CRON JOB WGET

*/5 * * * * /usr/bin/wget -q --output-document=/home/tot.txt http://solr-test.adm.books.com:8180/solr/books/dataimport?command=delta-import ; echo $(date) >> /home/tot.txt

sunnyfr wrote: This XML file does not appear to have any style information associated with it. The document tree is shown below.

<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
<lst name="initArgs"><lst name="defaults"><str name="config">data-config.xml</str></lst></lst>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages">
<str name="Time Elapsed">4:26:16.934</str>
<str name="Total Requests made to DataSource">3451431</str>
<str name="Total Rows Fetched">9165885</str>
<str name="Total Documents Processed">493061</str>
<str name="Total Documents Skipped">0</str>
<str name="Delta Dump started">2008-09-18 10:01:01</str>
<str name="Identifying Delta">2008-09-18 10:01:01</str>
<str name="Deltas Obtained">2008-09-18 10:01:43</str>
<str name="Building documents">2008-09-18 10:01:43</str>
<str name="Total Changed Documents">1587889</str>
</lst>
<str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

-- View this message in context: http://www.nabble.com/delta-import-looks-stuck--how-can-I-check-if-it%27s-done-or-not---tp19551728p19554129.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar.
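If refreshing a browser gets tedious, the same check can be scripted. Below is a rough sketch (my own, not from the thread) that polls the status page and waits for the idle marker; the host and core are the ones from the post above:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class DataImportStatusPoller {
    public static void main(String[] args) throws Exception {
        URL status = new URL(
            "http://solr-test.adm.books.com:8180/solr/books/dataimport");
        while (true) {
            BufferedReader in = new BufferedReader(
                new InputStreamReader(status.openStream(), "UTF-8"));
            StringBuilder body = new StringBuilder();
            for (String line; (line = in.readLine()) != null; ) {
                body.append(line);
            }
            in.close();
            // the status element flips from "busy" back to "idle" when done
            if (body.indexOf("<str name=\"status\">idle</str>") >= 0) {
                System.out.println("import finished");
                break;
            }
            System.out.println("still busy ...");
            Thread.sleep(30000); // poll every 30 seconds
        }
    }
}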
Re: delta-import looks stuck ???? how can I check if it's done or not ?
It is exactly what I've done but it doesn't work like that ...
- what would that mean ... the cron job can't hit it properly ?
- I browsed to /dataimport but it was like nothing was running, so I finally went back to /dataimport?command=delta-import and then to /dataimport and refreshed it often ... indeed it works this way but it's not what would suit me ... and it takes ages ... now I'm at:

<str name="status">busy</str>
<str name="importResponse">A command is still running...</str>
<lst name="statusMessages">
<str name="Time Elapsed">0:18:44.54</str>
<str name="Total Requests made to DataSource">1855793</str>
<str name="Total Rows Fetched">5588946</str>
<str name="Total Documents Processed">265113</str>
<str name="Total Documents Skipped">0</str>
<str name="Delta Dump started">2008-09-18 16:29:38</str>
<str name="Identifying Delta">2008-09-18 16:29:38</str>
<str name="Deltas Obtained">2008-09-18 16:30:26</str>
<str name="Building documents">2008-09-18 16:30:26</str>
<str name="Total Changed Documents">1603970</str>
</lst>

Shalin Shekhar Mangar wrote: Hit /dataimport again from a browser and refresh periodically to see the progress (number of documents indexed).

On Thu, Sep 18, 2008 at 7:55 PM, sunnyfr [EMAIL PROTECTED] wrote: It was too long so I finally restarted tomcat .. then 5mn later my cron job started : but it looks like nothing is happening via the cron job ...

-- View this message in context: http://www.nabble.com/delta-import-looks-stuck--how-can-I-check-if-it%27s-done-or-not---tp19551728p19554770.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr vs Autonomy
On Sep 18, 2008, at 3:23 AM, Geoff Hopson wrote: As per other thread 1) security down to field level

How complex of a security model do you need? Is each user's field visibility totally distinct? Are there a few basic groups? If you are willing to write (or hire someone to write) a custom SearchComponent, you can remove fields from a response for a given user. ryan
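To make Ryan's suggestion concrete, here is a rough sketch against the Solr 1.3 SearchComponent API (my illustration, untested; the usergroup parameter and both field lists are invented). Rather than stripping fields from a built response, it narrows the requested field list in prepare(), which achieves the same visible effect for simple cases:

import java.io.IOException;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class FieldSecurityComponent extends SearchComponent {
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        SolrParams params = rb.req.getParams();
        // "usergroup" is a hypothetical request parameter; a real
        // deployment would derive it from an authenticated context
        String group = params.get("usergroup", "basic");
        String fl = "trusted".equals(group)
                ? "name,address,occupation"   // invented field lists
                : "name,address";
        ModifiableSolrParams fixed = new ModifiableSolrParams(params);
        fixed.set(CommonParams.FL, fl);
        rb.req.setParams(fixed);
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // nothing to do here; the field list was already narrowed
    }

    // SolrInfoMBean boilerplate
    @Override public String getDescription() { return "per-group field filtering"; }
    @Override public String getSource() { return "$URL$"; }
    @Override public String getSourceId() { return "$Id$"; }
    @Override public String getVersion() { return "1.0"; }
}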
Re: delta-import looks stuck ???? how can I check if it's done or not ?
Well, it shows the number of documents that have changed; you can't expect 1603970 documents to be indexed instantly.

On Thu, Sep 18, 2008 at 8:24 PM, sunnyfr [EMAIL PROTECTED] wrote: It is exactly what I've done but it doesn't work like that ... - what would that mean ... the cron job can't hit it properly ? - I browsed to /dataimport but it was like nothing was running, so I finally went back to /dataimport?command=delta-import and then to /dataimport and refreshed it often ... indeed it works this way but it's not what would suit me ... and it takes ages ... now I'm at:

<str name="status">busy</str>
<str name="importResponse">A command is still running...</str>
<lst name="statusMessages">
<str name="Time Elapsed">0:18:44.54</str>
<str name="Total Requests made to DataSource">1855793</str>
<str name="Total Rows Fetched">5588946</str>
<str name="Total Documents Processed">265113</str>
<str name="Total Documents Skipped">0</str>
<str name="Delta Dump started">2008-09-18 16:29:38</str>
<str name="Identifying Delta">2008-09-18 16:29:38</str>
<str name="Deltas Obtained">2008-09-18 16:30:26</str>
<str name="Building documents">2008-09-18 16:30:26</str>
<str name="Total Changed Documents">1603970</str>
</lst>

-- Regards, Shalin Shekhar Mangar.
Re: delta-import looks stuck ???? how can I check if it's done or not ?
I agree about that, but the last time, 4 hours later the number wasn't different : and if I check now, nothing changed : does it have to go across all the data like a full import? I thought it would bring back just the ids which need to be modified ...?

<lst name="statusMessages">
<str name="Time Elapsed">0:39:36.943</str>
<str name="Total Requests made to DataSource">3447914</str>
<str name="Total Rows Fetched">9054602</str>
<str name="Total Documents Processed">492558</str>
<str name="Total Documents Skipped">0</str>
<str name="Delta Dump started">2008-09-18 16:29:38</str>
<str name="Identifying Delta">2008-09-18 16:29:38</str>
<str name="Deltas Obtained">2008-09-18 16:30:26</str>
<str name="Building documents">2008-09-18 16:30:26</str>
<str name="Total Changed Documents">1603970</str>
</lst>

look, this is what it was this morning : I just stopped it because it was too long ... It doesn't look logical:

<lst name="statusMessages">
<str name="Time Elapsed">6:9:0.256</str>
<str name="Total Requests made to DataSource">3451431</str>
<str name="Total Rows Fetched">9165885</str>
<str name="Total Documents Processed">493061</str>
<str name="Total Documents Skipped">0</str>
<str name="Delta Dump started">2008-09-18 10:01:01</str>
<str name="Identifying Delta">2008-09-18 10:01:01</str>
<str name="Deltas Obtained">2008-09-18 10:01:43</str>
<str name="Building documents">2008-09-18 10:01:43</str>
<str name="Total Changed Documents">1587889</str>
</lst>

And do you think my cron job can't work ?

Shalin Shekhar Mangar wrote: Well, it shows the number of documents that have changed; you can't expect 1603970 documents to be indexed instantly. On Thu, Sep 18, 2008 at 8:24 PM, sunnyfr [EMAIL PROTECTED] wrote: It is exactly what I've done but it doesn't work like that ...

-- View this message in context: http://www.nabble.com/delta-import-looks-stuck--how-can-I-check-if-it%27s-done-or-not---tp19551728p19555223.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: delta-import looks stuck ???? how can I check if it's done or not ?
On Thu, Sep 18, 2008 at 8:45 PM, sunnyfr [EMAIL PROTECTED] wrote: I agree about that, but the last time, 4 hours later the number wasn't different :

Do you mean that the number doesn't change at all on refreshing the page? Can you check the Solr log file for exceptions? I suspect that you may be running out of memory.

and if I check now, nothing changed : does it have to go across all the data like a full import? I thought it would bring back just the ids which need to be modified ...?

No, it will bring back only those ids if the deltaQuery is correct. Are you sure you actually modified that many rows in the DB?

...

And do you think my cron job can't work ?

Your wget command looks fine to me.

-- View this message in context: http://www.nabble.com/delta-import-looks-stuck--how-can-I-check-if-it%27s-done-or-not---tp19551728p19555223.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar.
Re: Solr vs Autonomy
I would do the field visibility one layer up from the search engine. That layer already knows about the user and can request the appropriate fields. Or request them all (better HTTP caching) and only show the appropriate ones. As I understand your application, putting access control in Solr doesn't make search faster or more accurate. Add a filter query to requests to restrict them to the allowed documents, and you are good.

I wouldn't worry too much about putting all the text in one field for speed. I tried that and it does help, but it means that you must rebuild the index when you need to change the mapping. I'm keeping things in separate fields and searching them all at query time (with boosts). wunder

On 9/18/08 8:04 AM, Ryan McKinley [EMAIL PROTECTED] wrote: On Sep 18, 2008, at 3:23 AM, Geoff Hopson wrote: As per other thread 1) security down to field level -- How complex of a security model do you need? Is each user's field visibility totally distinct? Are there a few basic groups? If you are willing to write (or hire someone to write) a custom SearchComponent, you can remove fields from a response for a given user. ryan
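Walter's advice, expressed as a SolrJ sketch (mine, not from the thread; the acl field name and the group value are invented):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class RestrictedSearch {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("john smith ford green");
        // restrict to documents the user's group is allowed to see
        q.addFilterQuery("acl:salesgroup");
        // request only the fields this user may view
        q.setFields("name", "address");
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getResults().getNumFound() + " hits");
    }
}

A nice side effect of using a filter query here is that Solr caches it independently of the main query, so the per-group restriction stays cheap across requests.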
Re: delta-import looks stuck ???? how can I check if it's done or not ?
this is my log file :

[EMAIL PROTECTED]:/home# tail -f /var/log/tomcat5.5/catalina.$(date +%Y-%m-%d).log
Sep 18, 2008 5:25:02 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity books with URL: jdbc:mysql://master-spare.vip.books.com/books
Sep 18, 2008 5:25:02 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 50
Sep 18, 2008 5:25:53 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: books rows obtained : 1608415
Sep 18, 2008 5:25:53 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running DeletedRowKey() for Entity: books
Sep 18, 2008 5:25:53 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: books rows obtained : 0

I just refreshed /dataimport in the browser; it looks like another import has been started, so maybe the cron started it :

<str name="status">busy</str>
<str name="importResponse">A command is still running...</str>
<lst name="statusMessages">
<str name="Time Elapsed">0:0:44.993</str>
<str name="Total Requests made to DataSource">1</str>
<str name="Total Rows Fetched">1503980</str>
<str name="Total Documents Processed">0</str>
<str name="Total Documents Skipped">0</str>
<str name="Delta Dump started">2008-09-18 17:30:01</str>
<str name="Identifying Delta">2008-09-18 17:30:01</str>
</lst>

Shalin Shekhar Mangar wrote: On Thu, Sep 18, 2008 at 8:45 PM, sunnyfr [EMAIL PROTECTED] wrote: ... Do you mean that the number doesn't change at all on refreshing the page? Can you check the Solr log file for exceptions? I suspect that you may be running out of memory. ... No, it will bring back only those ids if the deltaQuery is correct. ... Your wget command looks fine to me. -- Regards, Shalin Shekhar Mangar.
-- View this message in context: http://www.nabble.com/delta-import-looks-stuck--how-can-I-check-if-it%27s-done-or-not---tp19551728p1948.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Hardware config for SOLR
I can't speak to a lot of this - but regarding the servers, I'd go with the more powerful ones, if only for the amount of RAM. Your index will likely be larger than 1 GB, and with only 2 GB you'll have a lot of your index not held in RAM, which will slow down your QPS.

Thanks for your time! Matthew Runo Software Engineer, Zappos.com [EMAIL PROTECTED] - 702-943-7833

On Sep 17, 2008, at 3:32 PM, Andrey Shulinskiy wrote:

Hello, We're planning to use SOLR for our project, got some questions. So I asked some Qs yesterday, got no answers whatsoever. Wondering if they didn't make sense, or if the e-mail was too long... :-) Anyway, I'll try to ask them again and hope for some answers this time. It's a very new experience for us so any help is really appreciated.

First, some numbers we're expecting.
- The average size of a doc: ~100K
- The number of indexes: 1
- The query response time we're looking for: 200 - 300ms
- The number of stored docs: 1st year: 500K - 1M; 2nd year: 2-3M
- The estimated number of concurrent users per second: 1st year: 15 - 25; 2nd year: 40 - 60
- The estimated number of queries: 1st year: 15 - 25; 2nd year: 40 - 60

Now the questions

1) Should we do sharding or not? If we start without sharding, how hard will it be to enable it? Is it just some config changes + the index rebuild, or is it more? My personal opinion is to go without sharding at first and enable it later if we do get a lot of documents.

2) How should we organize our clusters to ensure redundancy? Should we have 2 or more identical Masters (meaning that all the updates/optimisations/etc. are done for every one of them)? An alternative, afaik, is to reconfigure one slave to become the new Master; how hard is that?

3) Basically, we can get servers of two kinds:
* Single Processor, Dual Core Opteron 2214HE; 2 GB DDR2 SDRAM; 1 x 250 GB (7200 RPM) SATA Drive(s)
* Dual Processor, Quad Core 5335; 16 GB Memory (Fully Buffered); 2 x 73 GB (10k RPM) 2.5 SAS Drive(s), RAID 1
The second - more powerful - one is more expensive, of course. How can we take advantage of the multiprocessor/multicore servers? Is there some special setup required to make, say, 2 instances of SOLR run on the same server using different processors/cores?

4) Does it make much difference to get a more powerful Master? Or, on the contrary, as slaves will be queried more often, should they be the better ones? Maybe just the HDDs for the slaves should be as fast as possible?

5) How many slaves does it make sense to have per one Master? What's (roughly) the performance gain from 1 to 2, 2 to 3, etc? When does it stop making sense to add more slaves? As far as I understand, it depends mainly on the size of the index. However, I'd guess the time required to do a push to too many slaves can be a problem too, correct?

Thanks, Andrey.
RE: Solr vs Autonomy
Hi Geoff, I cannot vouch for Autonomy; however, earlier this year we did evaluate Endeca and Solr, and we went with Solr. Some of the reasons were:

1. Freedom of open source with Solr
2. Very good, active Solr open source community
3. Features pretty much overlap between Solr and Endeca
4. Endeca, however, provides a very rich Business Tool that some people might like
5. Our developers are comfortable working with open source
6. Not good support from Endeca on internationalization

Hope this helps in some ways -Raghu

-Original Message- From: Geoff Hopson [mailto:[EMAIL PROTECTED] Sent: Thursday, September 18, 2008 12:47 AM To: solr-user@lucene.apache.org Subject: Solr vs Autonomy

Hi, I'm under pressure to justify the use of Solr on my project, and others are suggesting that Autonomy be used instead. Apart from price, does anyone have a list of pros/cons around Autonomy compared to Solr? Thanks geoff
Re: Unable to filter fq param on a dynamic field
Barry, does this return the correct hits:

http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=Output-Type-facet:Monochrome

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: Barry Harding [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, September 18, 2008 7:21:49 AM Subject: Unable to filter fq param on a dynamic field

Hi, I have a fairly simple Solr setup with several predefined fields that are indexed and stored, and also, depending on the type of product, I add various dynamic fields of type string to a record. I should mention that I am using the solr.DisMaxRequestHandler request handler, called /IvolutionSearch in my example requests.

My Schema is as follows:

[the schema's field definitions were stripped in transit; only attribute fragments such as required=true survive]

Now I can query for any of the fixed field types, such as ProductName or ReviewRating, and get the results I expect, but when I try to run a filter query on the dynamic fields in the result, I always end up with no results being returned. So if I run the following query against my copy of Solr 1.3 I get the results I am expecting:

http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=laser&rows=100

[sample result document stripped in transit; it includes values such as Mono Laser Printers, Printers|Mono Laser Printers, Wired, UK, Xerox, Monochrome, 464.10, Q969719, XEROX 3500DN MONO LASER, Laser, Workgroup printer, MLASERPRN]

and so on for the 100 results. Now if I try to filter those results to just those where Output-Type-facet equals Monochrome, using:

http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=laser&rows=100&fq=Output-Type-facet:Monochrome

or the same URL with the fq value quoted in various ways (the exact quoting was lost in transit), I just get zero results back even though I know that field contains that value. Please, before I pull my hair out, tell me what mistake I have made. Why can I query using a static field and not a dynamic field? Any help, even if it's to say I have been stupid or to tell me to reread a section of the manual/Wiki because I did not get the point, is much appreciated. Thanks Barry H

Misco is a division of Systemax Europe Ltd. Registered in Scotland Number 114143. Registered Office: Caledonian Exchange, 19a Canning Street, Edinburgh EH3 8EG. Telephone +44 (0)1933 686000.
Re: AW: Date field mystery
Hi Christian, While I can't tell you whether the problem with - will be solved when you try it on 1.3, I can tell you that you should probably trim your dates so they are not as fine-grained as you currently have them, unless you need such precision. We need to add this to the FAQ. :)

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: Kolodziej Christian [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, September 18, 2008 3:56:55 AM Subject: AW: Date field mystery

Hi Chris, it was a long night for our Solr server today because we rebuilt the complete index using well-formed date strings. And the date field is stored now, so we can see if something goes wrong :-) But our problems are solved completely. Now I can give you a very exact description of what the problem is (and what was the reason that we used malformed date values). Let's imagine we have 3 records with the following date values:

1. 2006-03-04T12:23:19Z
2. 2007-08-12T19:07:03Z
3. 2008-09-16T12:56:19Z

And now I will give you some queries and the results we get back:

- date:[2005-01-01T00:00:00Z TO NOW] or date:[2005-01-01T00:00:00Z TO 2008-09-18T09:45:00Z]: 1 and 2 (incorrect)
- date:[2005-01-01T00:00:00Z TO 20080918T09:45:00Z]: 1, 2, 3 (correct)
- date:[2005-01-01T00:00:00Z TO 2007-12-31T23:59:59Z]: only 1 (incorrect)
- date:[2005-01-01T00:00:00Z TO 20071231T23:59:59Z]: 1 and 2 (correct)

So as you can see, using - in the second parameter of the range query for the date field causes an error and doesn't find the records that should be found, while using a malformed date value without - returns the correct records. When using - in the second parameter, all records from the year contained in that parameter aren't found any more. This behavior is reproducible on different systems, either CentOS or Debian. It must be a problem of Solr or the Lucene (query parser) itself. Our next steps are to test our scenario with Solr 1.3, and if the problem isn't fixed we will use timestamps instead of the date format. But maybe this is a general problem of Solr and should be fixed, because in other cases and for other users it's not possible to make a workaround and they get wrong (incomplete) results for their queries.

Best regards, Christian
Re: Field level security
Hi, If all you have to do is hide certain fields from search results for some users, then your application - the application that sends search requests to Solr - can just use different fl=XXX parameters based on the user's permissions. I think that's all you need, and the custom fieldType should not be needed. As for entering just the keywords and searching several fields automatically - this is what the DisMax handler is good at, so give that a try.

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: Geoff Hopson [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, September 18, 2008 3:21:01 AM Subject: Re: Field level security

Hi Otis, Thanks for the response. I'll try and inline some clarity... 2008/9/18 Otis Gospodnetic :

I am trying to put together a security model around fields in my index. My requirement is that a user may not have permission to view certain fields in the index when he does a search. For example, he may have permission to see the name and address, but not the occupation. Whereas a different user with different permissions will be able to search all 3 fields.

What exactly is restricted? Viewing of specific fields in results, or searching in specific fields?

I am restricting the results - the user can search everything, but I was planning (as you mention) to apply a fieldList qualifier to the query. In my head (i.e. not tried it yet) I was hoping I could write a 'SecurityRequestHandler' that would take an incoming security 'token' and construct an &fl qualifier. Some other thoughts in my head are around developing my own fieldType, where I could tokenise the value against the field (e.g. store name=occupation, value=candlestick maker, marked Restricted) or something similar. Thoughts on that?

If it's the former, you could tell Solr which fields to return using &fl=field1,field2... If it's the latter, you could always write a custom SearchComponent that takes your custom userType or allowedFields parameter and constructs a query based on that.

What is the best way to model this? My current stab at this has a document-level security level set (I have a field called security_default), and all fields have this default. If there are exceptions, I have a multiValued field called 'security_exceptions' where I comma-delimit the field name and the different access permission for that field. E.g. I might have 'occupation=Restricted' in that field. This falls over when I copyField fields into a text field for easier searching.

Searching across multiple fields is pretty easy, too. I'd stick to that, as that also lets you assign different weight to different fields.

My requirement is to offer a google-type search, so the user can type in john smith ford green and get results where ford may be a last name or a car manufacturer, or green is the colour of the car, a last name or part of a town name. If I tokenised the field values as above and copyField-ed them into a single text field, would my tokeniser pick those out? Dunno - I guess I need to roll my sleeves up and do some coding, try some of this out. Thanks again for any insights Geoff
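To make the fl-per-user idea concrete, a small application-side sketch (the role names and field lists are invented for illustration):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;

public class FieldVisibility {
    private static final Map<String, String> FL_BY_ROLE =
        new HashMap<String, String>();
    static {
        FL_BY_ROLE.put("basic", "name,address");
        FL_BY_ROLE.put("trusted", "name,address,occupation");
    }

    /** Build a query whose field list matches the user's permissions. */
    public static SolrQuery buildQuery(String role, String terms) {
        SolrQuery q = new SolrQuery(terms);
        String fl = FL_BY_ROLE.get(role);
        // default to the most restrictive list for unknown roles
        q.setFields(fl != null ? fl : "name");
        return q;
    }
}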
Re: Solr vs Autonomy
Geoff, In short: all items that you listed are not a problem for Solr. Indices can be sharded, distributed search is possible, custom ranking is possible, 30 fields is possible, etc. etc. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Geoff Hopson [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, September 18, 2008 10:43:58 AM Subject: Re: Solr vs Autonomy My project is looking to index 10s of millions of documents, providing search across a live-live environment (hence index distribution/replication is important). Most searches have to be done (ie to end user) in 5 seconds or less. The index has about 30 fields, and I reckon that the security access I alluded to can be solved with field-specific queries (as opposed to a single copyFielded text field). The searches are very simple, but need to be quick. The confidence in the information is important, and so scoring is value. Faceted searches have a place too. Autonomy seems to have a solid security/access control model but offers nothing above and beyond Solr, unless I am missing something. Dunno if that helps? Geoff 2008/9/18 Walter Underwood : It depends entirely on the needs of the project. For some things, Solr is superior to Autonomy, for other things, not. I used to work at Autonomy (and Verity and Inktomi and Infoseek), and I chose Solr for Netflix. It is working great for us. wunder == Walter Underwood Former Ultraseek Architect Current Netflix Search Lead On 9/17/08 10:46 PM, Geoff Hopson wrote: Hi, I'm under pressure to justify the use of Solr on my project, and others are suggesting that Autonomy be used instead. Apart from price, does anyone have a list of pros/cons around Autonomy compared to Solr? Thanks geoff -- Light travels faster than sound. This is why some people appear bright until you hear them speak……… Mario Kart Wii: 2320 6406 5974
snapshot.yyyymmdd ... can't found them?
Hi, sorry, I think I've started rsyncd properly:

[EMAIL PROTECTED]:/# ./data/solr/books/bin/rsyncd-enable
[EMAIL PROTECTED]:/# ./data/books/video/bin/rsyncd-start

but then I can't find the snapshot.current file ?? How can I check I did it properly ?

my rsyncd.log :

2008/09/18 18:06:04 enabled by root
2008/09/18 18:06:04 command: /data/solr/books/bin/rsyncd-enable
2008/09/18 18:06:04 rsyncd already currently enabled
2008/09/18 18:06:04 exited (elapsed time: 0 sec)
2008/09/18 18:06:46 enabled by root
2008/09/18 18:06:46 command: ./data/solr/books/bin/rsyncd-enable
2008/09/18 18:06:46 rsyncd already currently enabled
2008/09/18 18:06:46 exited (elapsed time: 0 sec)
2008/09/18 18:07:17 started by root
2008/09/18 18:07:17 command: ./data/solr/books/bin/rsyncd-start
2008/09/18 18:07:17 [28782] connect from localhost (127.0.0.1)
2008/09/18 18:07:17 [28782] module-list request from localhost (127.0.0.1)
2008/09/18 18:07:17 rsyncd already running at port 18180

Thanks a lot, -- View this message in context: http://www.nabble.com/snapshot.mmdd-...-can%27t-found-them--tp19556507p19556507.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Setting request method to post on SolrQuery causes ClassCastException
I tried setting the 'wt' parameter to both 'xml' and 'javabin'. Neither worked. However, setting the parser on the server to XMLResponseParser did fix the problem. Thanks for the help. Susan

Noble Paul നോബിള്‍ नोब्ळ् wrote: I guess the post is not sending the correct 'wt' parameter. Try setting wt=javabin explicitly. wt=xml may not work because the parser is still binary. Check this: http://wiki.apache.org/solr/Solrj#xmlparser

On Thu, Sep 18, 2008 at 11:49 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote: A quick work-around is, I think, to tell Solr to use the non-binary response, e.g. wt=xml (I think that's the syntax). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: syoung [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Wednesday, September 17, 2008 7:27:30 PM Subject: Setting request method to post on SolrQuery causes ClassCastException

Hi, I need to have queries over a certain length done as a POST instead of a GET. However, when I set the method to POST, I get a ClassCastException. Here is the code:

public QueryResponse query(SolrQuery solrQuery) {
    QueryResponse response = null;
    try {
        if (solrQuery.toString().length() > MAX_URL_LENGTH)
            response = server.query(solrQuery, SolrRequest.METHOD.POST);
        else
            response = server.query(solrQuery, SolrRequest.METHOD.GET);
    } catch (SolrServerException e) {
        throw new DataAccessResourceFailureException(e.getMessage(), e);
    }
    return response;
}

And the stack trace:

java.lang.ClassCastException: java.lang.String
org.apache.solr.common.util.NamedListCodec.unmarshal(NamedListCodec.java:89)
org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:385)
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:113)
com.localmatters.guidespot.util.SolrTemplate.query(SolrTemplate.java:33)

Thanks, Susan

-- View this message in context: http://www.nabble.com/Setting-request-method-to-post-on-SolrQuery-causes-ClassCastException-tp19543232p19543232.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul

-- View this message in context: http://www.nabble.com/Setting-request-method-to-post-on-SolrQuery-causes-ClassCastException-tp19543232p19557138.html Sent from the Solr - User mailing list archive at Nabble.com.
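For anyone else hitting this, the fix Susan describes looks roughly like this in SolrJ 1.3 (the URL is a placeholder):

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

public class XmlParserSetup {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        // parse responses as XML instead of the default binary format,
        // which sidesteps the ClassCastException on POSTed queries
        server.setParser(new XMLResponseParser());
    }
}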
RE: Unable to filter fq param on a dynamic field
Hi Otis, no, that does not seem to bring back the correct results either; in fact it's still zero results. It's also not bringing back results if I use the standard handler:

http://127.0.0.1:8080/apache-solr-1.3.0/select?q=Output-Type-facet:Monochrome

but the field is visible in the documents returned if I search for the following:

http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=laser

so I know that the field is in the results generated (shown below):

<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">666</int>
<lst name="params"><str name="q">laser</str></lst>
</lst>
<result name="response" numFound="8056" start="0">
<doc>
<str name="CampaignCode">$A</str>
<str name="CategoryName">Mono Laser Printers</str>
<str name="CategoryPath">Printers|Mono Laser Printers</str>
<str name="Connectivity-Technology-facet">Wired</str>
<str name="CountryCode">UK</str>
<str name="Id">UK$AQ63360</str>
<str name="MPN">Q7697A#ABU</str>
<str name="Manufacturer-facet">Hewlett Packard</str>
<str name="ManufacturerName">HP</str>
<str name="Output-Type-facet">Monochrome</str>
<str name="Overview">The LaserJet 9000 series printer is HP's fastest, most versatile LaserJet designed for today's distr</str>
<float name="Price">1388.99</float>
<str name="ProductCode">Q63360</str>
<str name="ProductName">HP LASERJET 9040 MONO LASER</str>
<int name="ReviewRating"/>
<str name="StockCode">E000</str>
<str name="TaxCode">2</str>
<str name="Technology-facet">Laser</str>
<str name="ThumbnailURI">98404.jpg</str>
<str name="Type-facet">Workgroup printer</str>
<str name="WebClassification">MLASERPRN</str>
<date name="timestamp">2008-09-18T16:44:01.029Z</date>
</doc>
<doc>
<str name="CampaignCode">$B</str>
<str name="CategoryName">Mono Laser Printers</str>
<str name="CategoryPath">Printers|Mono Laser Printers</str>
<str name="Connectivity-Technology-facet">Wired</str>

Misco is a division of Systemax Europe Ltd. Registered in Scotland Number 114143. Registered Office: Caledonian Exchange, 19a Canning Street, Edinburgh EH3 8EG. Telephone +44 (0)1933 686000.
Re: Dismax + Dynamic fields
Daniel Papasian wrote: Norberto Meijome wrote: Thanks Yonik. OK, that matches what I've seen - if I know the actual name of the field I'm after, I can use it in a query, but I can't use the dynamic_field_name_* (with wildcard) in the config. Is adding support for this something that is desirable / needed (doable??), and is it being worked on?

You can use a wildcard with copyField to copy the dynamic fields that match the pattern to another field that you can then query on. It seems like that would cover your needs, no?

This is biting me right now and I don't understand how to specify the copyField to do what I want. I have a dynamic field declaration like:

<dynamicField name="*_t" type="text" indexed="true" stored="true"/>

In the documents that I'm adding I am specifying location_t and group_t, for example, although I may decide to add more later - obviously that seems like the ideal use case for the dynamicField. However, I cannot search these fields unless I specify them explicitly (q=location_t:something), and it doesn't work with dismax. I want all fields searchable; otherwise why would I bother with indexed="true" in the dynamicField? How do I use copyField to search location_t, group_t, and any other _t field I might decide to add later? -jsd-
Re: Unable to filter fq param on a dynamic field
Barry, You are seeing the value of the field as it was saved (the stored original), but perhaps something is funky with how it was analyzed/tokenized at index time versus how it is being analyzed now at query time. Double-check your fieldType/analysis settings for this field and make sure you are using the same/compatible analyzers at both index and query time.

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: Barry Harding [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, September 18, 2008 12:53:08 PM Subject: RE: Unable to filter fq param on a dynamic field

Hi Otis, no, that does not seem to bring back the correct results either; in fact it's still zero results. It's also not bringing back results if I use the standard handler ... but the field is visible in the documents returned if I search for q=laser, so I know that the field is in the results generated ...

Misco is a division of Systemax Europe Ltd. Registered in Scotland Number 114143. Registered Office: Caledonian Exchange, 19a Canning Street, Edinburgh EH3 8EG. Telephone +44 (0)1933 686000.
RE: Searching for future or null dates
Here is what I was able to get working with your help:

(productId:(102685804)) AND liveDate:[* TO NOW] AND ((endDate:[NOW TO *]) OR ((*:* -endDate:[* TO *])))

the *:* is what I was missing. Thanks for your help.

hossman wrote:

: If the query starts with a negative clause Lucene returns nothing.

that's not true. If a Query in Lucene is a BooleanQuery that only contains negative clauses, then Lucene returns nothing (because nothing is positively selected) ... but if there is a mix of negative clauses and positive clauses it doesn't matter what order the clauses are in.

in *solr* there is code that attempts to detect a query containing purely negative clauses and it adds a MatchAllDocs query in that case -- but it only works at the top level of a query. nested queries like this...

+fieldA:foo +(-fieldB:bar -fieldC:baz)

...won't work as you expect, because that nested query is only negative clauses. you can add your own MatchAllDocs query explicitly using the *:* syntax

+fieldA:foo +(*:* -fieldB:bar -fieldC:baz)

: endDate:[NOW TO *] OR -endDate:[* TO *]

side note: you really, REALLY don't want to mix the +/- syntax with AND/OR/NOT .. it almost never works out the way you expect...

: can I search for a date, which is either in the future OR missing
: completely (meaning open ended)
:
: I've tried -endDate:[* TO *] OR endDate:[NOW TO *] but that doesn't work.

unless you've set the default op to AND this should work...

fq = endDate:[NOW TO *] (*:* -endDate:[* TO *])

-Hoss

-- View this message in context: http://www.nabble.com/Searching-for-future-or-%22null%22-dates-tp19502167p19563117.html Sent from the Solr - User mailing list archive at Nabble.com.
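The same working filter assembled with SolrJ rather than by hand - a small sketch (the field names are the ones from the thread):

import org.apache.solr.client.solrj.SolrQuery;

public class FutureOrNullDates {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("productId:(102685804)");
        // already live ...
        q.addFilterQuery("liveDate:[* TO NOW]");
        // ... and either ends in the future or has no end date at all;
        // the *:* clause supplies the positive match the pure negation needs
        q.addFilterQuery("endDate:[NOW TO *] OR (*:* -endDate:[* TO *])");
        System.out.println(q); // prints the encoded query parameters
    }
}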
Re: Filtering results
Otis, Would it be reasonable to run a query like this

http://localhost:8280/solr/select/?q=terms_x&version=2.2&start=0&rows=0&indent=on

10 times, one for each result from an initial category query on a different index? So, it's still 1+10, but I'm not returning values. This would give me the number of pages that would match, and I can display that number. Not ideal, but better than nothing, and hopefully not a problem with scaling. cheers gene

On Wed, Sep 17, 2008 at 1:21 PM, Gene Campbell [EMAIL PROTECTED] wrote: OK thanks Otis. Any gut feeling on the best approach to get this collapsed data? I hate to ask you to do my homework, but I'm coming to the end of my Solr/Lucene knowledge. I don't code Java too well - used to, but switched to Python a while back. gene

On Wed, Sep 17, 2008 at 12:47 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote: Gene, The latest patch from Bojan for SOLR-236 works with whatever revision of Solr he used when he made the patch. I didn't follow this thread to know your original requirements, but running 1+10 queries doesn't sound good to me from a scalability/performance point of view. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: ristretto.rb [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Tuesday, September 16, 2008 6:45:02 PM Subject: Re: Filtering results

thanks. very interesting. The plot thickens. And, yes, I think field collapsing is exactly what I'm after. I am considering now trying this patch. I have a Solr 1.2 instance on Jetty. It looks like I need to install the patch. Does anyone use that patch? Recommend it? The wiki page (http://wiki.apache.org/solr/FieldCollapsing) says "This patch is not complete, but it will be useful to keep this page updated while the interface evolves." And the page was last updated over a year ago, so I'm not sure if that is a good sign. I'm trying to read through all the comments now. I'm also considering creating a second index of just the categories which contains all the content from the main index collapsed down into the corresponding categories - basically a complete collapsed index. Initial searches will be done against this collapsed category index, and then the first 10 results will be used to do 10 field queries against the main index to get the top records to return with each category. Haven't decided which path to take yet. cheers gene

On Wed, Sep 17, 2008 at 9:42 AM, Chris Hostetter wrote:

: 1. Identify all records that would match search terms. (Suppose I
: search for 'dog', and get 450,000 matches)
: 2. Of those records, find the distinct list of groups over all the
: matches. (Suppose there are 300.)
: 3. Now get the top ranked record from each group, as if you search
: just for docs in the group.

this sounds similar to Field Collapsing although i don't really understand it or your specific use case enough to be certain that it's the same thing. You may find the patch, and/or the discussions about the patch useful starting points...

https://issues.apache.org/jira/browse/SOLR-236
http://wiki.apache.org/solr/FieldCollapsing

-Hoss
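A sketch of the count-only loop described above, using rows=0 so Solr computes numFound without fetching any documents (the URL, the query string, and the category field and values are all placeholders of mine):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class CategoryCounts {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8280/solr");
        // one count per result of the initial category query
        String[] categories = {"printers", "scanners"};
        for (String cat : categories) {
            SolrQuery q = new SolrQuery("terms_x");
            q.addFilterQuery("category:" + cat); // hypothetical field
            q.setRows(0); // we only need numFound, not the documents
            long n = server.query(q).getResults().getNumFound();
            System.out.println(cat + ": " + n + " matching pages");
        }
    }
}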
RE: Hardware config for SOLR
Matthew, Thanks, a very good point. Andrey.

-Original Message- From: Matthew Runo [mailto:[EMAIL PROTECTED] Sent: Thursday, September 18, 2008 11:38 AM To: solr-user@lucene.apache.org Subject: Re: Hardware config for SOLR

I can't speak to a lot of this - but regarding the servers, I'd go with the more powerful ones, if only for the amount of RAM. Your index will likely be larger than 1 GB, and with only 2 GB you'll have a lot of your index not held in RAM, which will slow down your QPS. Thanks for your time! Matthew Runo Software Engineer, Zappos.com [EMAIL PROTECTED] - 702-943-7833
firstSearcher and newSearcher events
Hello. I am using the spellcheck component (https://issues.apache.org/jira/browse/SOLR-572). Since the spell checker index is kept in RAM, it gets erased every time the Solr server gets restarted. I was thinking of using either the firstSearcher or the newSearcher event to reload the index every time Solr starts. The events are defined as so:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">external</str>
      <str name="spellcheck.build">true</str>
      <str name="q">piza</str>
    </lst>
  </arr>
</listener>

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">fast_warm</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
    <lst>
      <str name="q">static firstSearcher warming query from solrconfig.xml</str>
    </lst>
    <lst>
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">external</str>
      <str name="spellcheck.build">true</str>
      <str name="q">piza</str>
    </lst>
  </arr>
</listener>

However the index does not load. When I check the logs I noticed the following: when the event runs the log looks like this:

INFO: [] webapp=null path=null params={spellcheck=true&q=piza&spellcheck.dictionary=external&spellcheck.build=true} hits=0 status=0 QTime=1

a regular request looks like this:

INFO: [] webapp=/solr path=/select/ params={spellcheck=true&q=piza&spellcheck.dictionary=external&spellcheck.build=true} hits=0 status=0 QTime=19459

I am guessing that the reason it doesn't work with the autowarm is that the webapp is null. Does anyone have any ideas what I can do to load that index in advance?

-- View this message in context: http://www.nabble.com/firstSearcher-and-newSearcher-events-tp19564163p19564163.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: firstSearcher and newSearcher events
On Fri, Sep 19, 2008 at 5:55 AM, oleg_gnatovskiy [EMAIL PROTECTED] wrote: Hello. I am using the spellcheck component (https://issues.apache.org/jira/browse/SOLR-572). Since the spell checker index is kept in RAM, it gets erased every time the Solr server gets restarted. I was thinking of using either the firstSearcher or the newSearcher to reload the index every time Solr starts. This capability is already in SpellCheckComponent: http://wiki.apache.org/solr/SpellCheckComponent#onCommit -- View this message in context: http://www.nabble.com/firstSearcher-and-newSearcher-events-tp19564163p19564163.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar.
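Until on-commit rebuilding is configured, the dictionary can also be rebuilt once, explicitly, right after startup. A sketch using the same parameters as the listeners above (the URL is a placeholder, and it assumes the spellcheck component is wired into the default request handler):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class RebuildSpellIndex {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("piza");
        q.set("spellcheck", true);
        q.set("spellcheck.dictionary", "external");
        q.set("spellcheck.build", true); // rebuild the in-RAM index once
        server.query(q);
    }
}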
Re: Filtering results
Gene, I haven't looked at Field Collapsing for a while, but if you have a single index and collapse hits on your category field, then won't the first 10 hits be the items you are looking for - the top item for each category x 10 - using a single query?

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: ristretto.rb [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, September 18, 2008 7:35:43 PM Subject: Re: Filtering results

Otis, Would it be reasonable to run a query like this http://localhost:8280/solr/select/?q=terms_x&version=2.2&start=0&rows=0&indent=on 10 times, one for each result from an initial category query on a different index? So, it's still 1+10, but I'm not returning values. This would give me the number of pages that would match, and I can display that number. Not ideal, but better than nothing, and hopefully not a problem with scaling. cheers gene

...
Error when posting XML data to Solr
Hi all, when I post an XML file to Solr, the following error occurs:
==
com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog at [row,col {unknown-source}]: [1,0]
at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:148)
at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)
==
I am confused: I thought the wstx jar was used for web services. Why does Solr use it? Can anyone help me? Thanks.
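(A note on the question at the end: com.ctc.wstx is Woodstox, the StAX XML parser that Solr's XmlUpdateRequestHandler uses to read posted update XML - it is not specific to web services. "Unexpected EOF in prolog" at [1,0] usually means the request body arrived empty or truncated, not that the file itself is malformed. A sketch of a post that sends the body intact; the URL and file name are illustrative:

curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary @docs.xml

Posting with a missing or empty body is a common way to trigger exactly this EOF-in-prolog error.)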
Re: Filtering results
Thanks for the reply, Otis! Always appreciated! That is indeed what we are looking at implementing. But I'm running out of time to prototype or experiment for this release. I'm going to run with the two-index approach for now, unless I find something saying it's really easy and sensible to run one index and collapse on a field. thanks gene On Fri, Sep 19, 2008 at 3:24 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote: Gene, I haven't looked at Field Collapsing for a while, but if you have a single index and collapse hits on your category field, then won't the first 10 hits be the items you are looking for - the top item for each category x 10, using a single query? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Can I add custom fields to the input XML file?
Hi guys. Is the XML format for inputting data a fixed standard, or can I change it? That is, instead of:

<add>
  <doc>
    <field name="id">3007WFP</field>
    <field name="name">Dell Widescreen UltraSharp 3007WFP</field>
    <field name="manu">Dell, Inc.</field>
  </doc>
</add>

can I enter something like:

<custList>
  <clients>
    <field name="id">100100</field>
    <field name="property">BPO</field>
    <field name="emp_count">1500</field>
  </clients>
  <clients>
    <field name="id">100200</field>
    <field name="property">ITES</field>
    <field name="emp_count">2500</field>
  </clients>
</custList>

Thanks -- View this message in context: http://www.nabble.com/Can-I-add-custom-fields-to-the-input-XML-file--tp19566431p19566431.html Sent from the Solr - User mailing list archive at Nabble.com.
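(For reference: Solr's XmlUpdateRequestHandler accepts only the fixed add/doc/field structure, so custom element names like the ones above would need to be transformed on the client side before posting. As a sketch, the same data rewritten into the standard format, reusing the field names from the question:

<add>
  <doc>
    <field name="id">100100</field>
    <field name="property">BPO</field>
    <field name="emp_count">1500</field>
  </doc>
  <doc>
    <field name="id">100200</field>
    <field name="property">ITES</field>
    <field name="emp_count">2500</field>
  </doc>
</add>
)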
Re: Setting request method to post on SolrQuery causes ClassCastException
It is surprising that this happens; the javabin format offers significant performance improvements over the XML one. You could probably also try this:

<requestHandler name="/search" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="wt">javabin</str>
  </lst>
</requestHandler>

On Thu, Sep 18, 2008 at 10:17 PM, syoung [EMAIL PROTECTED] wrote: I tried setting the 'wt' parameter to both 'xml' and 'javabin'. Neither worked. However, setting the parser on the server to XMLResponseParser did fix the problem. Thanks for the help. Susan Noble Paul നോബിള് नोब्ळ् wrote: I guess the post is not sending the correct 'wt' parameter. Try setting wt=javabin explicitly. wt=xml may not work because the parser is still binary. Check this: http://wiki.apache.org/solr/Solrj#xmlparser -- --Noble Paul -- View this message in context: http://www.nabble.com/Setting-request-method-to-post-on-SolrQuery-causes-ClassCastException-tp19543232p19557138.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
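(For reference, Susan's fix maps to a single SolrJ call. A minimal sketch, assuming SolrJ 1.3's CommonsHttpSolrServer client; the URL is illustrative:

import java.net.MalformedURLException;

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

public class XmlParserSetup {
    public static CommonsHttpSolrServer createServer() throws MalformedURLException {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Parse responses as XML instead of the default binary (javabin) format,
        // avoiding the ClassCastException seen when the response body is not binary.
        server.setParser(new XMLResponseParser());
        return server;
    }
}

SolrJ derives the wt parameter from the configured parser, so the request and response formats should stay in sync without setting wt separately.)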