Re: DataImportHandler: Deleting from index and db; lastIndexed id feature
On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese [EMAIL PROTECTED] wrote:

> Hey there,
> I have my DataImportHandler almost completely configured. I am missing three goals. I don't think I can reach them just via XML conf or a Transformer and the SqlEntityProcessor plugin, but I need to be sure of that. If there's no other way I will hack some Solr source classes, and would like to know the best way to do that. Once I have it solved, I can upload or post the source in the forum in case someone thinks it can be helpful.
>
> 1. Every time I execute the DataImportHandler (to index data from a db), at the start time or end time I need to delete some expired documents. I have to delete them from the database and from the index. I know which documents must be deleted because a field in the db says so. I would not like to delete first all from the DB and then all from the index, but rather one from the index and one from the DB each time.

You can override the init() and destroy() of the SqlEntityProcessor and use it as the processor for the root entity. At that point you can run the necessary db queries and Solr delete queries. Look at Context#getSolrCore() and Context#getDataSource(String).

> The delete mark is set as an update in the db row, so I think I could use deltaImport. I don't know if deletedPkQuery is the way to do that; I can't find much information about how to make it work. As deltaQuery modifies docs (delete old and insert new), I suppose there must be an easy way to do this by just doing the delete and not the new insert.

deletedPkQuery does everything first. It runs the query and uses that to identify the deleted rows.

> 2. This is probably my most difficult goal. Delta-import reads a timestamp from dataimport.properties and modifies/adds all documents from the db which were inserted after that date. What I want is to be able to save the id of the last indexed doc, so that the next time I execute the indexer it starts indexing from that last indexed doc id.

You can use a Transformer to write something to the DB. Context#getDataSource(String) for each row.

> The point of doing this is that if I do a full import from a db with lots of rows, the app could encounter a problem in the middle of the execution and abort the process. As delta-query works now, I would have to restart the execution from the beginning. Having this new functionality I could optimize the index and start from the last indexed doc. I think I should begin by modifying SolrWriter.java and DocBuilder.java, creating functions like getStartTime, persistStartTime... for ID control.
>
> 3. I commented before about this last point. I want to give boost to doc fields at indexing time.

Adding field boost is a planned item. It must work as follows: add a special value $fieldBoost.fieldname to the row map, and DocBuilder should respect that. You can raise a bug and we can commit it soon.

> How can I raise a bug?

https://issues.apache.org/jira/secure/CreateIssue!default.jspa

--Noble Paul
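For illustration, a minimal sketch of the override described above, against the 1.3-era DIH API. The table and column names (docs, doc_id, expired) are made up, and whether the shared JdbcDataSource will execute a DML delete through getData() is an assumption:

import java.util.Iterator;
import java.util.Map;

import org.apache.solr.core.SolrCore;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.DataSource;
import org.apache.solr.handler.dataimport.SqlEntityProcessor;
import org.apache.solr.update.DeleteUpdateCommand;

public class PurgingSqlEntityProcessor extends SqlEntityProcessor {
    public void init(Context context) {
        super.init(context);
        if (!context.isRootEntity()) return; // run the cleanup only for the root entity
        SolrCore core = context.getSolrCore();
        DataSource ds = context.getDataSource();
        try {
            // Hypothetical query: ids flagged as expired by the db
            Iterator<Map<String, Object>> expired =
                    (Iterator<Map<String, Object>>) ds.getData(
                            "select doc_id from docs where expired = 1");
            while (expired.hasNext()) {
                String id = expired.next().get("doc_id").toString();
                DeleteUpdateCommand del = new DeleteUpdateCommand();
                del.id = id;
                del.fromCommitted = true;
                del.fromPending = true;
                core.getUpdateHandler().delete(del); // remove this doc from the index
                // Whether the data source will run DML through getData() is an
                // assumption; a dedicated JDBC connection may be safer here.
                ds.getData("delete from docs where doc_id = '" + id + "'");
            }
        } catch (Exception e) {
            throw new RuntimeException("expired-doc cleanup failed", e);
        }
    }
}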
Re: DataImportHandler: Deleting from index and db; lastIndexed id feature
Thanks, I really appreciate your help. I didn't explain myself so well here:

>> 2. This is probably my most difficult goal. Delta-import reads a timestamp from dataimport.properties and modifies/adds all documents from the db which were inserted after that date. What I want is to be able to save the id of the last indexed doc, so that the next time I execute the indexer it starts indexing from that last indexed doc id.
>
> You can use a Transformer to write something to the DB. Context#getDataSource(String) for each row.

When I said "be able to save in the field the id of the last indexed doc" I made a mistake; I meant: be able to save in the file (dataimport.properties) the id of the last indexed doc. The point would be to do my own delta-query, indexing from the last indexed doc id instead of the timestamp. So I think this would not work in that case (it's my mistake because of the bad explanation):

> You can use a Transformer to write something to the DB. Context#getDataSource(String) for each row.

That is why I was saying:

> I think I should begin by modifying SolrWriter.java and DocBuilder.java, creating functions like getStartTime, persistStartTime... for ID control.

Am I headed in the correct direction? Sorry for my English, and thanks in advance.

Noble Paul നോബിള് नोब्ळ् wrote:
> On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese [EMAIL PROTECTED] wrote:
>> Hey there, I have my DataImportHandler almost completely configured. I am missing three goals.
Re: PHPResponseWriter problem with (de)serialization (spellchecker)
On Tue, Dec 2, 2008 at 6:39 AM, Steffen B. [EMAIL PROTECTED] wrote:
> Little update: this behaviour can be easily reproduced with the example configuration that comes with Solr. After uncommenting line 733 in apache-solr-nightly/example/solr/conf/solrconfig.xml (which activates the PHPS queryResponseWriter), loading this URL on the example index shows the same problem:
>
> http://localhost:8983/solr/spellCheckCompRH?cmd=&q=ipod&spellcheck=true&spellcheck.extendedResults=true&spellcheck.onlyMorePopular=true&wt=phps
>
> [...]s:10:"spellcheck";a:1:{s:11:"suggestions";a:1:{s:16:"correctlySpelled";true}}}
>
> As I'm no Java crack and have neither the time nor the knowledge to debug the class myself, I can only offer to (re)open a task in Jira. Any opinions on this?

Hi Steffen, was there previously a JIRA issue already open specifically for this issue? If not, please do open a new issue.

-Yonik
Re: PHPResponseWriter problem with (de)serialization (spellchecker)
No, this issue is new, but there was a general PHPResponseWriter task... I opened a new one: https://issues.apache.org/jira/browse/SOLR-892 Feel free to move / edit / merge it. ;) I hope I made the problem clear.

~ Steffen

Yonik Seeley wrote:
> Hi Steffen, was there previously a JIRA issue already open specifically for this issue? If not, please do open a new issue.
>
> -Yonik
Re: PHPResponseWriter problem with (de)serialization (spellchecker)
Little update: this behaviour can be easily reproduced with the example configuration that comes with Solr. After uncommenting line 733 in apache-solr-nightly/example/solr/conf/solrconfig.xml (which activates the PHPS queryResponseWriter), loading this URL on the example index shows the same problem:

http://localhost:8983/solr/spellCheckCompRH?cmd=&q=ipod&spellcheck=true&spellcheck.extendedResults=true&spellcheck.onlyMorePopular=true&wt=phps

[...]s:10:"spellcheck";a:1:{s:11:"suggestions";a:1:{s:16:"correctlySpelled";true}}}

As I'm no Java crack and have neither the time nor the knowledge to debug the class myself, I can only offer to (re)open a task in Jira. Any opinions on this?

~ Steffen

Steffen B. wrote:
> Hi everyone,
> maybe it's just me, but whenever I try to deserialize a Solr response that contains the spellchecker with spellcheck.extendedResults, it fails. I'm using PHP5 and everything is pretty much up-to-date.
>
>   <lst name="spellcheck">
>     <lst name="suggestions">
>       <bool name="correctlySpelled">true</bool>
>     </lst>
>   </lst>
>
> will be converted to
>
>   [...]s:10:"spellcheck";a:1:{s:11:"suggestions";a:1:{s:16:"correctlySpelled";true}}}
>
> which is not deserializable with unserialize():
>
>   Notice: unserialize() [function.unserialize]: Error at offset 305 of 312 bytes in /Solr/Client.php on line 131
>
> PHP, on the other hand, serializes an array this way:
>
>   echo serialize(array("spellcheck" => array("suggestions" => array("correctlySpelled" => true))));
>
> to
>
>   a:1:{s:10:"spellcheck";a:1:{s:11:"suggestions";a:1:{s:16:"correctlySpelled";b:1;}}}
>
> So there are obviously differences in the way boolean vars are converted, though I'm not sure if it's a problem with the PHPResponseWriter or with my setup. Can anyone confirm this behaviour?
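For reference, PHP's serialize() format writes booleans as b:1; or b:0; and strings with an explicit byte length, which is what a writer has to emit for unserialize() to succeed. A tiny illustrative Java helper (not Solr's actual writer code) showing the two scalar encodings:

public class PhpSerialize {
    // PHP booleans: b:1; or b:0; (the bare word "true" above is what breaks unserialize())
    public static String bool(boolean b) {
        return b ? "b:1;" : "b:0;";
    }

    // PHP strings: s:<byte length>:"<raw bytes>"; (the length counts bytes, not characters)
    public static String str(String s) {
        byte[] utf8 = s.getBytes(java.nio.charset.StandardCharsets.UTF_8);
        return "s:" + utf8.length + ":\"" + s + "\";";
    }

    public static void main(String[] args) {
        System.out.println(str("correctlySpelled") + bool(true)); // s:16:"correctlySpelled";b:1;
    }
}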
Re: Dealing with field values as key/value pairs
Yeah, sorry, I was not clear in my question. Storage would end up being done the same way, of course. I guess I'm more looking for feedback about what people have used as a strategy to handle this type of situation.

This goes for faceting as well. Assuming I facet by author and there are 2 authors with the same name, it does not work right. So, discovering hot water here: the facet value is best expressed with identifiers which uniquely identify your author. Then you lose the 'name' and you need an effective way to get it back. But if you also want to offer the name of the author in your Solr response in a 'standalone' way (i.e. without relying on another source of data, like the db where that mapping is stored), then you need to store this data in a convenient form in the index to be able to access it later.

So I'm basically looking for a design pattern / best practice for that scenario based on people's experience.

I was also thinking about storing the values in dynamic fields such as 'metadata_field_identifier' and then, assuming I have a facet 'facet_field' which stores identifiers, using a search component to provide the mapping as an 'extra' in another section of the response (similar to debug, facets, etc.), i.e. something like:

mapping: {
  'field1': { 'identifier1': 'value1', 'identifier2': 'value2' },
  'field2': { 'identifierx': 'valuex', 'identifiery': 'valuey' }
}

Does that make sense?

-- stephane

Noble Paul നോബിള് नोब्ळ् wrote:
> In the end Lucene stores stuff as strings. Even if you do store your data as a map FieldType, Solr may not be able to treat it like a map. So it is fine to put in the map as one single string.
>
> On Mon, Dec 1, 2008 at 10:07 PM, Stephane Bailliez [EMAIL PROTECTED] wrote:
>> Hi all,
>> I'm looking for ideas about how to best deal with a situation where I need to store key/value pairs in the index for consumption in the client. A typical example would be a document with multiple genres where, for simplicity reasons, I'd like to send both the 'id' and the 'human readable label' (might not be the best example, since one would immediately say 'what about localization', but in that case assume it's an entity such as a company name or a person name).
>>
>> So say I have field1 = { 'key1':'this is value1', 'key2':'this is value2' }
>>
>> I was thinking the easiest (not the prettiest) solution would be to store it as effectively a string 'key:this is the value' and then have the client deal with this 'format' and parse it based on the 'key:value' pattern.
>>
>> Another alternative I was thinking of would be to use a custom field that effectively makes the field value a key/value map for the writer, but I'm not so sure it can really be done; I haven't investigated that one deeply.
>>
>> Any feedback would be welcome; the solution might even be simpler and cleaner than what I'm mentioning above, but my brain has been mushy for the last couple of weeks.
>>
>> -- stephane
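For what it's worth, the "key:value in one string" strategy above only needs a trivial client-side split. A minimal Java sketch, where the ':' delimiter is the assumption and labels containing it would need an escaping rule:

import java.util.AbstractMap;
import java.util.Map;

public class FacetValueParser {
    // Split a stored "identifier:label" facet value at the first ':' only,
    // so the label itself may contain further colons.
    public static Map.Entry<String, String> parse(String stored) {
        int i = stored.indexOf(':');
        if (i < 0) return new AbstractMap.SimpleEntry<String, String>(stored, stored);
        return new AbstractMap.SimpleEntry<String, String>(
                stored.substring(0, i), stored.substring(i + 1));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> e = parse("key1:this is value1");
        System.out.println(e.getKey() + " -> " + e.getValue()); // key1 -> this is value1
    }
}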
Re: Problem indexing on Oracle DB
Cool. The only problem is that java.sql.Clob#getCharacterStream() is package private and you have to use oracle.sql.CLOB.

On Tue, Dec 2, 2008 at 1:38 PM, Joel Karlsson [EMAIL PROTECTED] wrote:
> Thanks for your reply! I wrote such a transformer and now it seems to work perfectly. Here's the code for the transformer if anyone encounters the same problem, or if anyone wants to improve it:
>
> import org.apache.solr.handler.dataimport.*;
> import oracle.sql.CLOB;
> import java.util.*;
> import java.io.*;
>
> public class ClobTransformer extends Transformer {
>     public Map<String, Object> transformRow(Map<String, Object> row, Context context) {
>         List<Map<String, String>> fields = context.getAllEntityFields();
>         for (Map<String, String> field : fields) {
>             String toString = field.get("toString");
>             if ("true".equals(toString)) {
>                 String columnName = field.get("column");
>                 CLOB clob = (CLOB) row.get(columnName);
>                 if (clob != null) {
>                     StringBuffer strOut = new StringBuffer();
>                     String app;
>                     try {
>                         BufferedReader br = new BufferedReader(clob.getCharacterStream());
>                         while ((app = br.readLine()) != null) strOut.append(app);
>                     } catch (Exception e) {
>                         e.printStackTrace();
>                     }
>                     row.put(columnName, strOut.toString());
>                 }
>             }
>         }
>         return row;
>     }
> }
>
> // Joel
>
> 2008/12/2 Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED]
>> Hi Joel, DIH does not translate a Clob automatically to text. We can open that as an issue. Meanwhile you can write a transformer of your own to read the Clob and convert it to text.
>> http://wiki.apache.org/solr/DataImportHandler#head-4756038c418ab3fa389efc822277a7a789d27688
>>
>> On Tue, Dec 2, 2008 at 2:57 AM, Joel Karlsson [EMAIL PROTECTED] wrote:
>>> Thanks for your reply! I'm already using the DataImportHandler for indexing. Do I still have to convert the Clob myself or are there any built-in functions that I've missed? // Joel
>>>
>>> 2008/12/1 Yonik Seeley [EMAIL PROTECTED]
>>>> If you are querying Oracle yourself and using something like SolrJ, then you must convert the Clob yourself into a String representation. Also, did you look at Solr's DataImportHandler? -Yonik
>>>>
>>>> On Mon, Dec 1, 2008 at 3:11 PM, Joel Karlsson [EMAIL PROTECTED] wrote:
>>>>> Hello everyone, I'm trying to index an Oracle DB, but can't seem to find any built-in support for objects of type oracle.sql.Clob. The field I try to put the data into is of type text, but after indexing it only contains the Clob object's string representation, i.e. something like [EMAIL PROTECTED]. Does anyone know how to get Solr to index the content of these objects rather than their string representation?? Thanks in advance! // Joel

--Noble Paul
Re: Problem indexing on Oracle DB
No probs. We can fix that using reflection; I shall give a patch with that. Probably it is better to fix it in a Transformer.

On Tue, Dec 2, 2008 at 1:56 PM, Joel Karlsson [EMAIL PROTECTED] wrote:
> True, but perhaps it works with java.sql.Clob as well; I haven't tried it though.
>
> 2008/12/2 Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED]
>> Cool. The only problem is that java.sql.Clob#getCharacterStream() is package private and you have to use oracle.sql.CLOB.

--Noble Paul
Re: Problem indexing on Oracle DB
True, but perhaps it works with java.sql.Clob as well; I haven't tried it though.

2008/12/2 Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED]
> Cool. The only problem is that java.sql.Clob#getCharacterStream() is package private and you have to use oracle.sql.CLOB.
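A sketch of that portable variant, for reference: it mirrors Joel's transformer but is written against the standard java.sql.Clob interface instead of oracle.sql.CLOB. Untested against the Oracle driver, as Joel says; the toString/column convention is kept as in his version:

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

import java.io.BufferedReader;
import java.sql.Clob;
import java.util.List;
import java.util.Map;

public class PortableClobTransformer extends Transformer {
    public Map<String, Object> transformRow(Map<String, Object> row, Context context) {
        List<Map<String, String>> fields = context.getAllEntityFields();
        for (Map<String, String> field : fields) {
            // Same convention as Joel's version: toString="true" marks Clob columns
            if ("true".equals(field.get("toString"))) {
                String columnName = field.get("column");
                Object value = row.get(columnName);
                if (value instanceof Clob) {
                    StringBuilder out = new StringBuilder();
                    try {
                        BufferedReader br =
                                new BufferedReader(((Clob) value).getCharacterStream());
                        String line;
                        while ((line = br.readLine()) != null) out.append(line);
                        br.close();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                    row.put(columnName, out.toString()); // replace the Clob with its text
                }
            }
        }
        return row;
    }
}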
Multi Language Search
Hi,

Before I start with the Solr-specific question, there is one thing I need information on. If I am a Russian user on a Russian website and I want to search for documents containing two Russian words, what is the query term going to look like:

1. <Russian word 1> AND <Russian word 2>, or rather,
2. <Russian word 1> <"AND" in Russian> <Russian word 2>?

Now over to the Solr-specific question: whichever of 1 or 2 is the answer, how does one do it using Solr? I tried using the language analyzers but I'm not too sure how exactly they work.

Regards,
Tushar.
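For context: the query syntax operators (AND, OR, NOT) stay in English regardless of the content language; the analyzers only affect how the word terms themselves are tokenized and stemmed. A sketch of how a Russian analysis chain is typically wired up in schema.xml (the type name text_ru is made up; the filters shown ship with Solr 1.3):

  <fieldType name="text_ru" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
    </analyzer>
  </fieldType>

A field of this type can then be queried as <Russian word 1> AND <Russian word 2>.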
Re: DataImportHandler: Deleting from index and db; lastIndexed id feature
You can write the details to a file using a Transformer itself. It is wise to stick to the public API as far as possible; we will maintain back-compat and your code will be usable with newer versions.

On Tue, Dec 2, 2008 at 5:12 PM, Marc Sturlese [EMAIL PROTECTED] wrote:
> Thanks, I really appreciate your help. I didn't explain myself so well here:
>
>>> 2. This is probably my most difficult goal. Delta-import reads a timestamp from dataimport.properties and modifies/adds all documents from the db which were inserted after that date. What I want is to be able to save the id of the last indexed doc, so that the next time I execute the indexer it starts indexing from that last indexed doc id.
>>
>> You can use a Transformer to write something to the DB. Context#getDataSource(String) for each row.
>
> When I said "be able to save in the field the id of the last indexed doc" I made a mistake; I meant: be able to save in the file (dataimport.properties) the id of the last indexed doc. The point would be to do my own delta-query, indexing from the last indexed doc id instead of the timestamp. That is why I was saying:
>
>> I think I should begin by modifying SolrWriter.java and DocBuilder.java, creating functions like getStartTime, persistStartTime... for ID control.
>
> Am I headed in the correct direction? Sorry for my English, and thanks in advance.

--Noble Paul
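Along the lines Noble suggests, a minimal sketch of a Transformer that persists the last seen id to a file that a custom delta query can read back. The file path, property name, and id column are assumptions, and rewriting the file on every row is naive; a real version would batch the writes:

import java.io.FileOutputStream;
import java.util.Map;
import java.util.Properties;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class LastIdTransformer extends Transformer {
    // Placeholder assumptions: the id column is "id", the property name is
    // "lastIndexedId", and the file lives next to dataimport.properties.
    private static final String FILE = "conf/lastindexed.properties";

    public Object transformRow(Map<String, Object> row, Context context) {
        Object id = row.get("id");
        if (id != null) {
            try {
                Properties p = new Properties();
                p.setProperty("lastIndexedId", id.toString());
                FileOutputStream out = new FileOutputStream(FILE);
                p.store(out, "id of the last indexed doc");
                out.close();
            } catch (Exception e) {
                e.printStackTrace(); // don't fail the import over bookkeeping
            }
        }
        return row;
    }
}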
Solr 1.3 - response time very long
Hi,

I tested my old search engine, which is Sphinx, against my new one, which is Solr, and I've got a huge difference in response times. How can I make it faster?

Thanks a lot,
Re: Solr 1.3 - response time very long
On Tue, Dec 2, 2008 at 12:04 PM, sunnyfr [EMAIL PROTECTED] wrote:
> How can I make it faster?

There's no -go-faster-please flag ;-) Give us the exact URL and we might be able to help figure out what part is slow.

-Yonik
OOM on commit after a few days
I have been facing this issue in production for a long time and wanted to know if anybody who has come across it can share their thoughts. I appreciate your help.

Environment:
- 2 GB index file
- 3.5 million documents
- 100 to 400 document updates committed every 15 mins. (commit happens once per 15-min. interval)
- 3.5 GB of RAM available for the JVM
- Solr version 1.3 (nightly build of Oct 18, 2008)
- MDB = Message Driven Bean

I am not using Solr's replication mechanism, and I don't use XML post updates either, since the amount of data is too much. I have bundled an MDB that receives messages for data updates and uses Solr's update handler to update and commit the index. Optimize happens once a day.

Everything runs fine for 2-3 days; after that I keep getting the following exceptions:

org.apache.solr.common.SolrException log
java.lang.OutOfMemoryError:
        at java.io.RandomAccessFile.readBytes(Native Method)
        at java.io.RandomAccessFile.read(RandomAccessFile.java:350)
        at org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:596)
        at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136)
        at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:92)
        at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:907)
        at org.apache.lucene.index.MultiSegmentReader.norms(MultiSegmentReader.java:338)
        at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:69)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:131)
        at org.apache.lucene.search.Searcher.search(Searcher.java:126)
        at org.apache.lucene.search.Searcher.search(Searcher.java:105)
        at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1170)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:856)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:283)
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:170)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1302)
        at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:51)
        at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1128)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:284)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:665)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:690)
        at java.lang.Thread.run(Thread.java:810)
Encoded search string & qt=Dismax
Hi,

I am facing problems while searching for some encoded text as part of the search query string. The results don't come up when I use some URL encoding with qt=dismaxrequest. I am searching for a Russian word by posting a URL-encoded UTF8 transformation of the word. The query works fine for a normal request; however, no docs are fetched when &qt=dismaxrequest is appended to the query string.

The word being searched is:

Russian word - Предварительное
UTF8 Java encoding - \u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435
Posted query string (URL encoded) - %5Cu041f%5Cu0440%5Cu0435%5Cu0434%5Cu0432%5Cu0430%5Cu0440%5Cu0438%5Cu0442%5Cu0435%5Cu043b%5Cu044c%5Cu043d%5Cu043e%5Cu0435

Following are the two queries and the difference in results.

Query 1 - this one works fine:

?q=%5Cu041f%5Cu0440%5Cu0435%5Cu0434%5Cu0432%5Cu0430%5Cu0440%5Cu0438%5Cu0442%5Cu0435%5Cu043b%5Cu044c%5Cu043d%5Cu043e%5Cu0435

Result -

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="q">\u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="Index_Type_s">productIndex</str>
      <str name="Index_Type_str_s">productIndex</str>
      <str name="URL_s">4100018</str>
      <str name="URL_str_s">4100018</str>
      <arr name="all">
        <str>productIndex</str>
        <str>product</str>
        <str>Предварительное K математики учебная книга</str>
        <str>4100018</str>
        <str>4100018</str>
        <str>21125</str>
        <str>91048</str>
        <str>91047</str>
      </arr>
      <str name="editionTypeId_s">21125</str>
      <str name="editionTypeId_str_s">21125</str>
      <arr name="listOf_taxonomyPath">
        <str>91048</str>
        <str>91047</str>
      </arr>
      <str name="prdMainTitle_s">Предварительное K математики учебная книга</str>
      <str name="prdMainTitle_str_s">Предварительное K математики учебная книга</str>
      <str name="productType_s">product</str>
      <str name="productType_str_s">product</str>
      <arr name="strlistOf_taxonomyPath">
        <str>91048</str>
        <str>91047</str>
      </arr>
      <date name="timestamp">2008-12-02T08:14:05.63Z</date>
    </doc>
  </result>
</response>

Query 2 - with qt=dismaxrequest - this doesn't work:

?q=%5Cu041f%5Cu0440%5Cu0435%5Cu0434%5Cu0432%5Cu0430%5Cu0440%5Cu0438%5Cu0442%5Cu0435%5Cu043b%5Cu044c%5Cu043d%5Cu043e%5Cu0435&qt=dismaxrequest

Result -

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">109</int>
    <lst name="params">
      <str name="q">\u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435</str>
      <str name="qt">dismaxrequest</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0" maxScore="0.0"/>
</response>

I don't know why there is a difference on appending &qt=dismaxrequest. Any help would be appreciated.

Regards,
Tushar.
Re: new faceting algorithm
Is there a configurable way to switch to the previous implementation? I'd like to see exactly how it affects performance in my case.

Yonik Seeley wrote:
> And if you want to verify that the new faceting code has indeed kicked in, some statistics are logged, like:
>
> Nov 24, 2008 11:14:32 PM org.apache.solr.request.UnInvertedField uninvert
> INFO: UnInverted multi-valued field features, memSize=14584, time=47, phase1=47, nTerms=285, bigTerms=99, termInstances=186
>
> -Yonik
Re: OOM on commit after a few days
Using embedded is always more error prone... you're probably forgetting to close some resource. Make sure to close all SolrQueryRequest objects.

Start with a memory profiler or heap dump to try and figure out what's taking up all the memory.

-Yonik

On Tue, Dec 2, 2008 at 1:05 PM, Sunil [EMAIL PROTECTED] wrote:
> I have been facing this issue in production for a long time and wanted to know if anybody who has come across it can share their thoughts.
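The two cleanup obligations Yonik means, as a hedged sketch of the 1.3-era embedded pattern (the handler invocation itself is elided):

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.core.SolrCore;
import org.apache.solr.request.LocalSolrQueryRequest;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;

public class EmbeddedUsage {
    // Every SolrQueryRequest must be close()d, and every RefCounted searcher
    // handle must be decref()ed, or readers pile up until the heap fills.
    void query(SolrCore core) throws Exception {
        Map<String, String[]> params = new HashMap<String, String[]>();
        params.put("q", new String[] { "*:*" }); // placeholder query
        SolrQueryRequest req = new LocalSolrQueryRequest(core, params);
        try {
            // ... hand req to a request handler and read the response ...
        } finally {
            req.close(); // releases the searcher reference the request holds
        }

        RefCounted<SolrIndexSearcher> ref = core.getSearcher();
        try {
            SolrIndexSearcher searcher = ref.get();
            // ... search against the core directly ...
        } finally {
            ref.decref(); // must balance every getSearcher()
        }
    }
}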
Re: new faceting algorithm
On Tue, Dec 2, 2008 at 1:10 PM, wojtekpia [EMAIL PROTECTED] wrote:
> Is there a configurable way to switch to the previous implementation? I'd like to see exactly how it affects performance in my case.

Thanks for the reminder, I need to document this in the wiki.

facet.method=enum (enumerate terms and do intersections, the old default)
facet.method=fc (fieldcache method, the new default)

-Yonik
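For example, against the Solr example index the parameter just rides along with an ordinary faceting request (cat is a field from the example schema):

  http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=cat&facet.method=enum
  http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=cat&facet.method=fc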
Re: new faceting algorithm
Wojtek, can you report back the numbers if possible? It would be nice to know how the new impl performs in the real world.

On Tue, Dec 2, 2008 at 11:45 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
> Thanks for the reminder, I need to document this in the wiki.
>
> facet.method=enum (enumerate terms and do intersections, the old default)
> facet.method=fc (fieldcache method, the new default)
>
> -Yonik

--Noble Paul
RE: DataImportHandler: Deleting from index and db; lastIndexed id feature
Does the DIH delta feature rewrite the delta-import file for each set of rows? If it does not, that sounds like a bug/enhancement.

Lance

-----Original Message-----
From: Noble Paul നോബിള് नोब्ळ् [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, December 02, 2008 8:51 AM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler: Deleting from index and db; lastIndexed id feature

You can write the details to a file using a Transformer itself. It is wise to stick to the public API as far as possible; we will maintain back-compat and your code will be usable with newer versions.

On Tue, Dec 2, 2008 at 5:12 PM, Marc Sturlese [EMAIL PROTECTED] wrote:
> Thanks, I really appreciate your help. I didn't explain myself so well here: I meant being able to save in the file (dataimport.properties) the id of the last indexed doc, and to do my own delta-query indexing from the last indexed doc id instead of the timestamp.
RE: Encoded search string & qt=Dismax
Do you have a "dismaxrequest" request handler defined in your solr config xml? Or is it "dismax"?

-Todd Feak

-----Original Message-----
From: tushar kapoor [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, December 02, 2008 10:07 AM
To: solr-user@lucene.apache.org
Subject: Encoded search string & qt=Dismax

I am facing problems while searching for some encoded text as part of the search query string. The results don't come up when I use some URL encoding with qt=dismaxrequest.
Re: new faceting algorithm
Definitely, but it'll take me a few days. I'll also report findings on SOLR-465. (I've been on holiday for a few weeks.)

Noble Paul നോബിള് नोब्ळ् wrote:
> Wojtek, can you report back the numbers if possible? It would be nice to know how the new impl performs in the real world.
Re: DataImportHandler: Deleting from index and db; lastIndexed id feature
delta-import file?

On Wed, Dec 3, 2008 at 12:08 AM, Lance Norskog [EMAIL PROTECTED] wrote:
> Does the DIH delta feature rewrite the delta-import file for each set of rows? If it does not, that sounds like a bug/enhancement.
>
> Lance

--Noble Paul
DataImportHandler - newbie question
Hey,

I am trying to connect to an Oracle database and index the values into Solr, but I'm getting "Document [null] missing required field: id". Here is the debug output:

<str name="Total Requests made to DataSource">1</str>
<str name="Total Rows Fetched">2</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2008-12-02 13:49:35</str>
<str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.</str>

schema.xml:

  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="subject" type="text" indexed="true" stored="true" omitNorms="true"/>
</fields>
<uniqueKey>id</uniqueKey>

data-config.xml:

<dataConfig>
  <dataSource driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@x.x.x.x:" user="..." password="..."/>
  <document name="companyQAIndex">
    <entity name="companyqa" pk="id" query="select * from solr_test">
      <field column="id" name="id"/>
      <field column="text" name="subject"/>
    </entity>
  </document>
</dataConfig>

Database schema: id is the pk. There are only 2 rows in the table solr_test.

Can anyone tell me what I am doing wrong?

Jae
Re: DataImport Hadnler - new bee question
I actually found the problem: Oracle returns the field names in capitals.

On Tue, Dec 2, 2008 at 1:57 PM, Jae Joo [EMAIL PROTECTED] wrote:
> Hey,
> I am trying to connect to an Oracle database and index the values into Solr, but I'm getting "Document [null] missing required field: id".
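A sketch of the likely fix, assuming DIH matches column names case-sensitively against what the driver reports: reference the columns in upper case while keeping the lower-case Solr field names:

  <entity name="companyqa" pk="ID" query="select * from solr_test">
    <field column="ID" name="id"/>
    <field column="TEXT" name="subject"/>
  </entity>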
Re: Encoded search string & qt=Dismax
First of all... the standard request handler uses the default search field specified in your schema.xml -- dismax does not. Dismax looks at the qf param to decide which fields to search for the q param. If you started with the example schema, the dismax handler may have a default value for qf which is trying to query different fields than you actually use in your documents. debugQuery=true will show you exactly what query structure (and which fields) each request is using.

Second... I don't know Russian, and character encoding issues tend to make my head spin, but the fact that the responseHeader is echoing back a q param containing Java string literal sequences suggests that you are doing something wrong: you should be sending the URL encoding of the actual Russian characters, not the URL encoding of the Java string literal encoding of the Russian word. I suspect the fact that you are getting any results at all from your first query is a fluke. The <str name="q"> in the responseHeader should show you the real word you want to search for -- once it does, then you'll know that you have the URL+UTF-8 encoding issues straightened out. *THEN* I would worry about the dismax/standard behavior.

: <lst name="params">
:   <str name="q">\u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435</str>
: </lst>

-Hoss
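To make the first point concrete, a sketch of a solrconfig.xml registration the qt=dismaxrequest parameter would select. The handler name must match what the query sends, and the qf field list is an assumption; it has to name fields the documents actually use (e.g. the prdMainTitle_s and all fields from the response above):

  <requestHandler name="dismaxrequest" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="qf">prdMainTitle_s all</str>
    </lst>
  </requestHandler>

Appending &debugQuery=true to the request then shows which of these fields the generated query actually hits.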
Re: OOM on commit after few days
Thanks Yonik. The main search still happens through SolrDispatchFilter, so SolrQueryRequest is getting closed implicitly. But I do use the direct API in the following cases, so please suggest any other possible resource issues:

1. Update and commit: core.getUpdateHandler(). Here I close the updateHandler once updates/commits are done.

2. Searching other cores from the current core's response writer: I have a requirement to aggregate the data from multiple indexes and send a single xml response. I call otherCore.getSearcher() and then the search method to get a reference to Hits. I do call decref() on the RefCounted once done with processing the result.

3. I also reload the core after commit. This brings down the RAM usage but does not solve the main issue: with the reload I don't see any leaks, but the OOM error still occurs after 2-3 days.

Do you think any other resource is not getting closed?

Sunil

--- On Tue, 12/2/08, Yonik Seeley [EMAIL PROTECTED] wrote:

From: Yonik Seeley [EMAIL PROTECTED]
Subject: Re: OOM on commit after few days
To: solr-user@lucene.apache.org
Date: Tuesday, December 2, 2008, 1:13 PM

Using embedded is always more error prone... you're probably forgetting to close some resource. Make sure to close all SolrQueryRequest objects. Start with a memory profiler or heap dump to try and figure out what's taking up all the memory.

-Yonik

On Tue, Dec 2, 2008 at 1:05 PM, Sunil [EMAIL PROTECTED] wrote:
I have been facing this issue for a long time in the production environment and wanted to know if anybody who came across it can share their thoughts. Appreciate your help.

Environment:
2 GB index file
3.5 million documents
Commit happens once every 15 mins., with 100 to 400 document updates per commit
3.5 GB of RAM available to the JVM
Solr version 1.3 (nightly build of Oct 18, 2008)

I am not using Solr's replication mechanism. I also don't use xml post updates, since the amount of data is too large. I have bundled an MDB (Message Driven Bean) that receives messages for data updates and uses Solr's update handler to update and commit the index. Optimize happens once a day. Everything runs fine for 2-3 days; after that I keep getting the following exceptions.
Exception org.apache.solr.common.SolrException log
java.lang.OutOfMemoryError:
    at java.io.RandomAccessFile.readBytes(Native Method)
    at java.io.RandomAccessFile.read(RandomAccessFile.java:350)
    at org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:596)
    at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136)
    at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:92)
    at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:907)
    at org.apache.lucene.index.MultiSegmentReader.norms(MultiSegmentReader.java:338)
    at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:69)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:131)
    at org.apache.lucene.search.Searcher.search(Searcher.java:126)
    at org.apache.lucene.search.Searcher.search(Searcher.java:105)
    at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1170)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:856)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:283)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:170)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1302)
    at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:51)
    at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1128)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:284)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:665)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:690)
    at java.lang.Thread.run(Thread.java:810)
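A minimal Java sketch of the close/decref discipline being discussed for embedded use (the core reference and the query processing are placeholders; the point is the finally block):

    import org.apache.solr.search.SolrIndexSearcher;
    import org.apache.solr.util.RefCounted;

    // searching another core: always release the searcher reference
    RefCounted<SolrIndexSearcher> ref = otherCore.getSearcher();
    try {
        SolrIndexSearcher searcher = ref.get();
        // ... run the search and build the aggregated response ...
    } finally {
        ref.decref();  // release even if the search throws
    }

The same try/finally shape applies to SolrQueryRequest (req.close()) when constructing requests directly.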
RE: NIO not working yet
Thanks Yonik,

-The next nightly build (Dec-01-2008) should have the changes.

The latest nightly build seems to be 30-Nov-2008 08:20 (http://people.apache.org/builds/lucene/solr/nightly/). Has the version with the NIO fix been built? Are we looking in the wrong place?

Tom

Tom Burton-West
Information Retrieval Programmer
Digital Library Production Services
University of Michigan Library

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Yonik Seeley
Sent: Sunday, November 30, 2008 8:43 PM
To: solr-user@lucene.apache.org
Subject: Re: NIO not working yet

OK, the development version of Solr should now be fixed (i.e. NIO should be the default for non-Windows platforms). The next nightly build (Dec-01-2008) should have the changes.

-Yonik

On Wed, Nov 12, 2008 at 2:59 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
NIO support in the latest Solr development versions does not work yet (I previously advised that some people with possible lock contention problems try it out). We'll let you know when it's fixed, but in the meantime you can always set the system property org.apache.lucene.FSDirectory.class to org.apache.lucene.store.NIOFSDirectory to try it out. For example:

java -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.NIOFSDirectory -jar start.jar

-Yonik
Re: NIO not working yet
On Tue, Dec 2, 2008 at 3:41 PM, Burton-West, Tom [EMAIL PROTECTED] wrote:
Thanks Yonik. The latest nightly build seems to be 30-Nov-2008 08:20 (http://people.apache.org/builds/lucene/solr/nightly/). Has the version with the NIO fix been built? Are we looking in the wrong place?

If the tests fail (which they seem to have for the last 2 days) then a new snapshot is not uploaded. Hudson also does solr builds though:
http://hudson.zones.apache.org/hudson/job/Solr-trunk/lastSuccessfulBuild/artifact/trunk/dist/

-Yonik
SolrSharp
Anyone know the status of SolrSharp? Is it actively maintained? Thanks, Grant
Load balancing for distributed Solr
Hello all,

As I understand distributed Solr, a request for a distributed search goes to a particular Solr instance with a list of arguments specifying the addresses of the shards to search. The Solr instance to which the request is first directed is responsible for distributing the query to the other shards and pulling together the results. My questions are:

1. Does it make sense to (A) always have the same Solr instance responsible for distributing the query to the other shards, or (B) rotate which shard does the distributing/result aggregating?

2. For scenario A, are there different requirements (memory, CPU, processors, etc.) for the machine doing the distribution versus the machines hosting the shards responding to the distributed requests?

3. For scenario B, are people using some kind of load balancing to decide which Solr instance acts as the query distributor/response aggregator?

Tom

Tom Burton-West
Information Retrieval Programmer
Digital Library Production Services
University of Michigan
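For concreteness, a sketch of the kind of distributed request described above (host names and ports are hypothetical):

    http://solr1:8983/solr/select?q=dog&shards=solr1:8983/solr,solr2:8983/solr,solr3:8983/solr

The instance that receives this request fans the query out to every shard listed in the shards param and merges the per-shard results into one response.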
Re: DataImportHandler: Deleteing from index and db; lastIndexed id feature
Do you mean the file used by DataImportHandler called dataimport.properties? If you mean this one, it's written at the end of the indexing process. The written date will be used in the next run by delta-query to identify the new or modified rows from the database. What I am trying to do is, instead of saving a timestamp, save the last indexed id. Doing that, in the next execution I will start indexing from the last doc that was indexed in the previous run. But I am still a bit confused about how to do that...

Noble Paul നോബിള്‍ नोब्ळ् wrote:
delta-import file?

On Wed, Dec 3, 2008 at 12:08 AM, Lance Norskog [EMAIL PROTECTED] wrote:
Does the DIH delta feature rewrite the delta-import file for each set of rows? If it does not, that sounds like a bug/enhancement.
Lance

-----Original Message-----
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, December 02, 2008 8:51 AM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler: Deleteing from index and db; lastIndexed id feature

You can write the details to a file using a Transformer itself. It is wise to stick to the public API as far as possible. We will maintain back compat and your code will be usable w/ newer versions.
Re: Solr 1.3 - response time very long
Hi Matthew, hi Yonik,

...sorry for the flag... didn't want to...

Solr 1.3 / Apache 5.5
Data directory size: 7.9G

I'm using JMeter for the http requests; I'm sending exactly the same request to solr and to sphinx (mysql), both over http:

solr:
http://test-search.com/test/selector?cache=0&backend=solr&request=/relevance/search/dog
sphinx:
http://test-search.com/test/selector?cache=0&backend=mysql&request=/relevance/search/dog

With more than 4 threads it's getting slower. For a big test lasting 40 min, increasing to 100 threads/sec for solr as for sphinx, at the end the average for solr is 3 sec and for sphinx 1 sec.

solrconfig.xml: http://www.nabble.com/file/p20802690/solrconf.xml

schema.xml:

<fields>
  <field name="id" type="sint" indexed="true" stored="true" omitNorms="true"/>
  <field name="duration" type="sint" indexed="true" stored="false" omitNorms="true"/>
  <field name="created" type="date" indexed="true" stored="true" omitNorms="true"/>
  <field name="modified" type="date" indexed="true" stored="false" omitNorms="true"/>
  <field name="rating_binrate" type="sint" indexed="true" stored="true" omitNorms="true"/>
  <field name="user_id" type="sint" indexed="true" stored="false" omitNorms="true"/>
  <field name="country" type="string" indexed="true" stored="false" omitNorms="true"/>
  <field name="language" type="string" indexed="true" stored="true" omitNorms="true"/>
  ...
  <field name="stat_views" type="sint" indexed="true" stored="true" omitNorms="true"/>
  <field name="stat_views_today" type="sint" indexed="true" stored="false" omitNorms="true"/>
  <field name="stat_views_last_week" type="sint" indexed="true" stored="false" omitNorms="true"/>
  <field name="stat_views_last_month" type="sint" indexed="true" stored="false" omitNorms="true"/>
  <field name="stat_comments" type="sint" indexed="true" stored="false" omitNorms="true"/>
  <field name="stat_comments_today" type="sint" indexed="true" stored="false" omitNorms="true"/>
  <field name="stat_comments_last_week" type="sint" indexed="true" stored="false" omitNorms="true"/>
  <field name="stat_comments_last_month" type="sint" indexed="true" stored="false" omitNorms="true"/>
  ...
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="title_fr" type="text_fr" indexed="true" stored="false"/>
  <field name="title_en" type="text_en" indexed="true" stored="false"/>
  <field name="title_de" type="text_de" indexed="true" stored="false"/>
  <field name="title_es" type="text_es" indexed="true" stored="false"/>
  <field name="title_ru" type="text_ru" indexed="true" stored="false"/>
  <field name="title_pt" type="text_pt" indexed="true" stored="false"/>
  <field name="title_nl" type="text_nl" indexed="true" stored="false"/>
  <field name="title_el" type="text_el" indexed="true" stored="false"/>
  <field name="title_ja" type="text_ja" indexed="true" stored="false"/>
  <field name="title_it" type="text_it" indexed="true" stored="false"/>
  <field name="description" type="text" indexed="true" stored="true"/>
  <field name="description_fr" type="text_fr" indexed="true" stored="false"/>
  <field name="description_en" type="text_en" indexed="true" stored="false"/>
  <field name="description_de" type="text_de" indexed="true" stored="false"/>
  <field name="description_es" type="text_es" indexed="true" stored="false"/>
  <field name="description_ru" type="text_ru" indexed="true" stored="false"/>
  <field name="description_pt" type="text_pt" indexed="true" stored="false"/>
  <field name="description_nl" type="text_nl" indexed="true" stored="false"/>
  <field name="description_el" type="text_el" indexed="true" stored="false"/>
  <field name="description_ja" type="text_ja" indexed="true" stored="false"/>
  <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
  <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
  <field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>
  <dynamicField name="random*" type="random"/>
</fields>

What would you reckon?
Thanks a lot,

Matthew Runo wrote:
Could you provide more information? How big is the index? How are you searching it? Some examples might help pin down the issue. How long are the queries taking? How long did they take on Sphinx?

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Dec 2, 2008, at 9:04 AM, sunnyfr wrote:
Hi, I tested my old search engine, which is Sphinx, against my new one, which is Solr, and I've got a huge difference in results. How can I make it faster? Thanks a lot,
Re: SolrSharp
I don't think it is. There is another C# client up on Google Code, but I'm not sure how well that one is maintained... Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Grant Ingersoll [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Tuesday, December 2, 2008 4:02:33 PM Subject: SolrSharp Anyone know the status of SolrSharp? Is it actively maintained? Thanks, Grant
Query ID range? Possible?
We are using Solr and would like to know: is there a query syntax to retrieve the newest x records, in descending order? Our id field is simply that (a unique record identifier), so ideally we would want to get the last, say, 100 records added. Possible? Also, is there a special way it needs to be defined in the schema?

<uniqueKey>id</uniqueKey>
<field name="id" type="text" indexed="true" stored="true" required="true" omitNorms="false"/>

In addition, what if we want the last 100 records added (order by id desc) restricted by another field, say media type A for example:

<field name="media" type="string" indexed="true" stored="true" omitNorms="true" required="false"/>

Thanks.
-Craig
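Assuming the id field is switched to a sortable, non-tokenized type (e.g. string or sint; sorting on a tokenized "text" field is unreliable), a request for the last 100 records of media type A could look like this (the host is hypothetical; field names are the ones from the message):

    http://localhost:8983/solr/select?q=media:A&sort=id+desc&rows=100

sort=id+desc returns the highest ids first, and rows=100 caps the result at 100 records.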
Trying to exclude integer field with certain numbers
Hello, I am trying to exclude certain records from my search results by specifying in my query which ones I don't want back, but it's not working as expected. Here is my query:

+message:test AND (-thread_id:123 OR -thread_id:456 OR -thread_id:789)

So basically I just want back anything that has the word test anywhere in the message text field and does not have thread id 123, 456, or 789. When I execute that query I get no results back. When I just execute +message:test I do get results back, some of them with the thread ids listed above, but when I try to exclude them like that it doesn't work. Anyone have any idea how I can fix this?

Thanks,
- Jake C.
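A likely explanation (the thread itself carries no answer, so this is offered as an assumption): Lucene's query parser cannot satisfy a clause made up entirely of negative terms, so the parenthesized group above matches nothing, and ANDing it in kills all results. Attaching the exclusions as top-level prohibited clauses usually behaves as intended:

    +message:test -thread_id:123 -thread_id:456 -thread_id:789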
Re: DataImportHandler: Deleteing from index and db; lastIndexed id feature
OK, I guess I see it. I am thinking of exposing writes to the properties file via an API, say Context#persist(key, value). This could write the data to dataimport.properties, and you would be able to retrieve the value via ${dataimport.persist.key} or through an API, Context#getPersistValue(key). You can raise an issue and attach a patch and we can get it committed. I guess this is what you wish to achieve.
--Noble

On Wed, Dec 3, 2008 at 3:28 AM, Marc Sturlese [EMAIL PROTECTED] wrote:
Do you mean the file used by DataImportHandler called dataimport.properties? If you mean this one, it's written at the end of the indexing process. The written date will be used in the next run by delta-query to identify the new or modified rows from the database. What I am trying to do is, instead of saving a timestamp, save the last indexed id. Doing that, in the next execution I will start indexing from the last doc that was indexed in the previous run. But I am still a bit confused about how to do that...
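Until such an API exists, a rough Java sketch of the Transformer route suggested earlier in the thread: track the highest id seen while rows stream through, then persist it. The class name, the output file, and the persist() call site are all hypothetical -- the DIH contract only defines transformRow():

    import java.io.FileOutputStream;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    public class LastIdTransformer extends Transformer {
        private long lastId = 0;

        public Object transformRow(Map<String, Object> row, Context context) {
            Object id = row.get("id");
            if (id != null) {
                // remember the highest id that has flowed through
                lastId = Math.max(lastId, Long.parseLong(id.toString()));
            }
            return row;  // pass the row through unchanged
        }

        // to be called when the import finishes; wiring this call up is
        // left open -- DIH does not invoke it for you
        public void persist() throws Exception {
            Properties p = new Properties();
            p.setProperty("last.indexed.id", String.valueOf(lastId));
            FileOutputStream out = new FileOutputStream("conf/lastid.properties");
            try {
                p.store(out, "last indexed id");
            } finally {
                out.close();
            }
        }
    }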
Re: Load balancing for distributed Solr
On Tue, Dec 02, 2008 at 04:34:08PM -0500, Burton-West, Tom wrote:
1. Does it make sense to (A) always have the same Solr instance responsible for distributing the query to the other shards, or (B) rotate which shard does the distributing/result aggregating?
[...]
3. For scenario B, are people using some kind of load balancing to decide which Solr instance acts as the query distributor/response aggregator?

We use scenario B: we have 4 Solr instances (4 machines), each with N data SolrCores and 1 'default' core which does the dispatch and aggregation of requests across the 4*N total data cores. We then use HAProxy to load-balance requests between the dispatch cores.

enjoy,

-jeremy

--
Jeremy Hinegardner
[EMAIL PROTECTED]
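For flavor, a minimal haproxy.cfg sketch of that dispatch-core load balancing (every address, port, and name below is hypothetical):

    defaults
        mode http
        timeout connect 5s
        timeout client  30s
        timeout server  30s

    frontend solr_in
        bind *:8983
        default_backend dispatch_cores

    backend dispatch_cores
        balance roundrobin
        server solr1 10.0.0.1:8983 check
        server solr2 10.0.0.2:8983 check
        server solr3 10.0.0.3:8983 check
        server solr4 10.0.0.4:8983 check

The trailing 'check' enables health checks, so a dead dispatch core is pulled out of rotation automatically.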