Re: how to get all the docIds in the search result?
query.setRows(Integer.MAX_VALUE); Cheers Avlesh On Thu, Jul 23, 2009 at 8:15 AM, shb suh...@gmail.com wrote: When I use SolrQuery query = new SolrQuery(); query.set("q", "issn:0002-9505"); query.setRows(10); QueryResponse response = server.query(query); I can only get the 10 ids in the response. How can I get all the docIds in the search result? Thanks.
Re: how to get all the docIds in the search result?
if I use query.setRows(Integer.MAX_VALUE); the query will become very slow, because the searcher will go and fetch the field values from the index for all of the returned documents. So if I set query.setRows(10), is there any other way to get all the ids? thanks 2009/7/23 Avlesh Singh avl...@gmail.com query.setRows(Integer.MAX_VALUE); Cheers Avlesh On Thu, Jul 23, 2009 at 8:15 AM, shb suh...@gmail.com wrote: When I use SolrQuery query = new SolrQuery(); query.set("q", "issn:0002-9505"); query.setRows(10); QueryResponse response = server.query(query); I can only get the 10 ids in the response. How can I get all the docIds in the search result? Thanks.
Re: how to get all the docIds in the search result?
Have you tried limiting the fields that you're requesting to just the ID? Something along the lines of: query.setRows(Integer.MAX_VALUE); query.setFields("id"); Might speed the query up a little. On 23 Jul 2009, at 09:11, shb wrote: Here id is indeed the uniqueKey of a document. I want to get all the ids for some other usage. 2009/7/23 Shalin Shekhar Mangar shalinman...@gmail.com On Thu, Jul 23, 2009 at 1:09 PM, shb suh...@gmail.com wrote: if I use query.setRows(Integer.MAX_VALUE); the query will become very slow, because the searcher will go and fetch the field values from the index for all of the returned documents. So if I set query.setRows(10), is there any other way to get all the ids? thanks You should fetch as many rows as you need and not more. Why do you need all the ids? I'm assuming that by id you mean the uniqueKey of a document. -- Regards, Shalin Shekhar Mangar. -- Toby Cole Software Engineer, Semantico Limited toby.c...@semantico.com tel:+44 1273 358 238 Registered in England and Wales no. 03841410, VAT no. GB-744614334. Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK. Check out all our latest news and thinking on the Discovery blog http://blogs.semantico.com/discovery-blog/
Re: how to get all the docIds in the search result?
I have tried the following code: query.setRows(Integer.MAX_VALUE); query.setFields("id"); When it returns 1,000,000 records, it takes about 22s. This is very slow. Is there any other way? 2009/7/23 Toby Cole toby.c...@semantico.com Have you tried limiting the fields that you're requesting to just the ID? Something along the lines of: query.setRows(Integer.MAX_VALUE); query.setFields("id"); Might speed the query up a little. On 23 Jul 2009, at 09:11, shb wrote: Here id is indeed the uniqueKey of a document. I want to get all the ids for some other usage. 2009/7/23 Shalin Shekhar Mangar shalinman...@gmail.com On Thu, Jul 23, 2009 at 1:09 PM, shb suh...@gmail.com wrote: if I use query.setRows(Integer.MAX_VALUE); the query will become very slow, because the searcher will go and fetch the field values from the index for all of the returned documents. So if I set query.setRows(10), is there any other way to get all the ids? thanks You should fetch as many rows as you need and not more. Why do you need all the ids? I'm assuming that by id you mean the uniqueKey of a document. -- Regards, Shalin Shekhar Mangar. -- Toby Cole Software Engineer, Semantico Limited toby.c...@semantico.com tel:+44 1273 358 238 Registered in England and Wales no. 03841410, VAT no. GB-744614334. Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK. Check out all our latest news and thinking on the Discovery blog http://blogs.semantico.com/discovery-blog/
Index per user - thousands of indices in one Solr instance
Hi, I am new to Solr and I want to get a quick hint on whether it is suitable for what we want to use it for. We are building an e-mail platform and we want to provide our users with full-text search functionality. We are not willing to use a single index for all users, as we want to be able to migrate a user's index from one machine to another if the need for scaling arises. As we want to have a separate index per user, a single Solr instance would have to handle a few thousand (or hundreds of thousands of) indices (each quite small in size). We also need to add and remove indices online, as users register accounts or are moved to a different computer in the cluster. Was Solr designed with such a setup in mind? I searched the net but did not find such a usage pattern. We could use Lucene directly and implement the network layer and index replication ourselves, but it would be nice to avoid that. Best regards, Łukasz Osipiuk -- Łukasz Osipiuk mailto:luk...@osipiuk.net
Re: Index per user - thousands of indices in one Solr instance
On Thu, Jul 23, 2009 at 3:06 PM, Łukasz Osipiuk luk...@osipiuk.net wrote: I am new to Solr and I want to get a quick hint if it is suitable for what we want to use it for. We are building e-mail platform and we want to provide our users with full-text search functionality. We are not willing to use single index file for all users as we want to be able to migrate user index from one machine to another if need for scaling arises. As we want to have separate index file per user, single Solr instance would have to handle few thousands (or hundreds of thousands) index files (yet each quite small in size). We also need to add and remove indices online, as users register accounts or are moved to different computer in cluster. Was Solr designed with such setup in mind? I search the net but did not find such usage pattern. We can directly use Lucene and implement network layer and index replication by ourselves but it would be nice to avoid it. Solr was not designed with such a setup in mind. However, we are working on a similar use-case and building the additional features Solr would need. See https://issues.apache.org/jira/browse/SOLR-1293 We're planning to put up a patch soon. Perhaps we can collaborate? -- Regards, Shalin Shekhar Mangar.
Re: Highlight arbitrary text
On Tue, 21 Jul 2009 14:25:52 +0200, Anders Melchiorsen wrote: On Fri, 17 Jul 2009 16:04:24 +0200, Anders Melchiorsen wrote: However, in the normal highlighter, I am using usePhraseHighlighter and highlightMultiTerm and it seems that there is no way to turn these on in FieldAnalysisRequestHandler ? In case these options are not available with the FieldAnalysisRequestHandler, would it be simple to implement them with a plugin? The highlightMultiTerm is absolutely needed, as we use a lot of prefix searches. I tried following the FieldAnalysisRequestHandler code, but I could not find a place to plug in wildcard searching. Is it supposed to be simple (like enabling a single option somewhere), or will it need a bunch of new code? In related news, the highlighter is not exactly working correctly, because I use the PatternTokenizer for the indexed fields, and HTMLStripWhiteSpaceTokenizer obviously gives slightly different results on the presentation field. So, I tried creating my own plugin: public class HTMLStripPatternTokenizerFactory extends PatternTokenizerFactory { public TokenStream create(Reader input) { return super.create(new org.apache.solr.analysis.HTMLStripReader(input)); } } It seems to work, but is that the proper way to mix the HTML stripper and the Pattern tokenizer? Obviously, I would prefer not having to maintain a plugin, even if it is a tiny one. - Anders
Re: DataImportHandler / Import from DB : one data set comes in multiple rows
Chantal, You might consider LuSql[1]. It has much better performance than Solr DIH. It runs 4-10 times faster on a multicore machine, and can run in 1/20th the heap size Solr needs. It produces a Lucene index. See slides 22-25 in this presentation comparing Solr DIH with LuSql: http://code4lib.org/files/glen_newton_LuSql.pdf [1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql Disclosure: I am the author of LuSql. Glen Newton http://zzzoot.blogspot.com/ 2009/7/22 Chantal Ackermann chantal.ackerm...@btelligent.de: Hi all, this is my first post, as I am new to SOLR (some Lucene exp). I am trying to load data from an existing datamart into SOLR using the DataImportHandler but in my opinion it is too slow due to the special structure of the datamart I have to use. Root Cause: This datamart uses a row based approach (pivot) to present its data. It was so done to allow adding more attributes to the data set without having to change the table structure. Impact: To use the DataImportHandler, i have to pivot the data to create again one row per data set. Unfortunately, this results in more and less performant queries. Moreover, there are sometimes multiple rows for a single attribute, that require separate queries - or more tricky subselects that probably don't speed things up. Here is an example of the relation between DB requests, row fetches and actual number of documents created: lst name=statusMessages str name=Total Requests made to DataSource3737/str str name=Total Rows Fetched5380/str str name=Total Documents Skipped0/str str name=Full Dump Started2009-07-22 18:19:06/str − str name= Indexing completed. Added/Updated: 934 documents. Deleted 0 documents. /str str name=Committed2009-07-22 18:22:29/str str name=Optimized2009-07-22 18:22:29/str str name=Time taken 0:3:22.484/str /lst (Full index creation.) There are about half a million data sets, in total. That would require about 30h for indexing? My feeling is that there are far too many row fetches per data set. I am testing it on a smaller machine (2GB, Windows :-( ), Tomcat6 using around 680MB RAM, Java6. I haven't changed the Lucene configuration (merge factor 10, ram buffer size 32). Possible solutions? A) Write my own DataImportHandler? B) Write my own MultiRowTransformer that accepts several rows as input argument (not sure this is a valid option)? C) Approach the DB developers to add a flat table with one data set per row? D) ...? If someone would like to share their experiences, that would be great! Thanks a lot! Chantal -- Chantal Ackermann -- -
Question re SOLR-920 Cache and reuse schema
https://issues.apache.org/jira/browse/SOLR-920 Where would the shared schema.xml be located (same as solr.xml?), and how would dynamic schema play into this? Would each core's dynamic schema still be independent?
Re: Question re SOLR-920 Cache and reuse schema
shareSchema tries to see if the schema.xml from a given file and timestamp is already loaded. If yes, the old object is re-used. All the cores which load the same file will share a single object. On Thu, Jul 23, 2009 at 3:32 PM, Brian Klippel br...@theport.com wrote: https://issues.apache.org/jira/browse/SOLR-920 Where would the shared schema.xml be located (same as solr.xml?), and how would dynamic schema play into this? Would each core's dynamic schema still be independent? -- - Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Question re SOLR-920 Cache and reuse schema
On Thu, Jul 23, 2009 at 3:32 PM, Brian Klippel br...@theport.com wrote: https://issues.apache.org/jira/browse/SOLR-920 and how would dynamic schema play into this? Would each core's dynamic schema still be independent? I guess you mean dynamic fields. If so, then yes, you will still be able to add values to dynamic fields for each core independently. -- Regards, Shalin Shekhar Mangar.
Re: Index per user - thousands of indices in one Solr instance
On Thu, Jul 23, 2009 at 4:30 PM, Łukasz Osipiuk luk...@osipiuk.net wrote: See https://issues.apache.org/jira/browse/SOLR-1293 We're planning to put up a patch soon. Perhaps we can collaborate? What is your estimate for having these patches ready? We have quite tight deadlines and cannot afford months of development. If you are finishing and have some well-separated tasks we can certainly help (preferably ones which do not require a deep understanding of Solr internals). Otherwise we will probably go for a quick hack using Lucene directly. It is mostly done, with some caveats (some features like alias/unalias are not supported). We've been doing extensive performance testing with this patch and we've already seen up to a 5x improvement in throughput. We'll post the patch by tomorrow so you can take a look and get started. I'll also start a wiki page and document the various features, configuration options and performance benchmark results. -- Regards, Shalin Shekhar Mangar.
Re: DataImportHandler / Import from DB : one data set comes in multiple rows
Hi Paul, hi Glen, hi all, thank you for your answers. I have followed Paul's solution (as I received it earlier). (I'll keep your suggestion in mind, though, Glen.) It looks good, except that it's not creating any documents... ;-) It is most probably some misunderstanding on my side, and maybe you can help me correct that? So, I have subclassed the SqlEntityProcessor by overwriting basically nextRow() as Paul suggested: public MapString, Object nextRow() { if (rowcache != null) return getFromRowCache(); if (rowIterator == null) { String q = getQuery(); initQuery(resolver.replaceTokens(q)); } MapString, Object pivottedRow = new HashMapString, Object(); MapString, Object fieldRow = getNext(); while (fieldRow != null) { // populate pivottedRow fieldRow = getNext(); } pivottedRow = applyTransformer(pivottedRow); log.info(Returning: + pivottedRow); return pivottedRow; } This seems to work as intended. From the log output, I can see that I get only the rows that I expect for one iteration in the correct key-value structure. I can also see, that the returned pivottedRow is what I want it to be: a map containing columns where each column contains what previously was input as a row. Example (shortened): INFO: Next fieldRow: {value=2, name=audio, id=1} INFO: Next fieldRow: {value=773, name=cat, id=23} INFO: Next fieldRow: {value=642058, name=sid, id=17} INFO: Returning: {sid=642058, cat=[773], audio=2} The entity declaration in the dih config (db_data_config.xml) looks like this (shortened): entity name=my_value processor=PivotSqlEntityProcessor columnValue=value columnName=name query=select id, name, value from datamart where parent_id=${id_definition.ID} and id in (1,23,17) field column=sid name=sid / field column=audio name=audio / field column=cat name=cat / /entity id_definition is the root entity. Per parent_id there are several rows in the datamart table which describe one data set (=lucene document). The object type of value is either String, String[] or List. I am not handling that explicitly, yet. If that'd be the problem it would throw an exception, wouldn't it? But it is not creating any documents at all, although the data seems to be returned correctly from the processor, so it's pobably something far more fundamental. str name=Total Requests made to DataSource1069/str str name=Total Rows Fetched1069/str str name=Total Documents Skipped0/str str name=Full Dump Started2009-07-23 12:57:07/str − str name= Indexing completed. Added/Updated: 0 documents. Deleted 0 documents. /str Any help / hint on what the root cause is or how to debug it would be greatly appreciated. Thank you! Chantal Noble Paul നോബിള് नोब्ळ् schrieb: alternately, you can write your own EntityProcessor and just override the nextRow() . I guess you can still use the JdbcDataSource On Wed, Jul 22, 2009 at 10:05 PM, Chantal Ackermannchantal.ackerm...@btelligent.de wrote: Hi all, this is my first post, as I am new to SOLR (some Lucene exp). I am trying to load data from an existing datamart into SOLR using the DataImportHandler but in my opinion it is too slow due to the special structure of the datamart I have to use. Root Cause: This datamart uses a row based approach (pivot) to present its data. It was so done to allow adding more attributes to the data set without having to change the table structure. Impact: To use the DataImportHandler, i have to pivot the data to create again one row per data set. Unfortunately, this results in more and less performant queries. 
Moreover, there are sometimes multiple rows for a single attribute, that require separate queries - or more tricky subselects that probably don't speed things up. Here is an example of the relation between DB requests, row fetches and actual number of documents created: lst name=statusMessages str name=Total Requests made to DataSource3737/str str name=Total Rows Fetched5380/str str name=Total Documents Skipped0/str str name=Full Dump Started2009-07-22 18:19:06/str - str name= Indexing completed. Added/Updated: 934 documents. Deleted 0 documents. /str str name=Committed2009-07-22 18:22:29/str str name=Optimized2009-07-22 18:22:29/str str name=Time taken 0:3:22.484/str /lst (Full index creation.) There are about half a million data sets, in total. That would require about 30h for indexing? My feeling is that there are far too many row fetches per data set. I am testing it on a smaller machine (2GB, Windows :-( ), Tomcat6 using around 680MB RAM, Java6. I haven't changed the Lucene configuration (merge factor 10, ram buffer size 32). Possible solutions? A) Write my own DataImportHandler? B) Write my own MultiRowTransformer that accepts several rows as input argument (not sure this is a valid option)? C) Approach the DB developers to add
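For readers following the code above: the "populate pivottedRow" step that Chantal elides could look roughly like the hypothetical helper below. columnName and columnValue are assumed to hold the entity's columnName/columnValue attribute values ("name" and "value" in her config), and the promotion to a List is a guess based on the logged output (e.g. cat=[773]); it is only an illustration, not her actual code.

    // Hypothetical helper inside the same processor class: folds one fetched row,
    // e.g. {value=773, name=cat, id=23}, into the pivoted row, e.g. {cat=773, ...}.
    private void addToPivot(Map<String, Object> pivottedRow, Map<String, Object> fieldRow) {
        String name = (String) fieldRow.get(columnName);   // e.g. "cat"
        Object value = fieldRow.get(columnValue);           // e.g. 773
        Object existing = pivottedRow.get(name);
        if (existing == null) {
            pivottedRow.put(name, value);                   // first value for this field
        } else if (existing instanceof List) {
            ((List<Object>) existing).add(value);           // already multi-valued
        } else {
            List<Object> values = new ArrayList<Object>();  // promote to multi-valued
            values.add(existing);
            values.add(value);
            pivottedRow.put(name, values);
        }
    }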
Facet
Hi, I am new to Solr and need help with the following use case: I want to provide faceted browsing. For a given product, there are multiple descriptions (feeds, the description being 100-1500 words) that my application gets. I want to check for the presence of a fixed number of terms or attributes (5-10 attributes for a product, e.g. weight, memory etc) in the description. The attribute set will be different for each product category. And then for a given product, I wish to display the numbers of descriptions found for each attribute (the attribute text is present somewhere in the description). A description can contain more than 1 attribute. How can this be achieved? Please help. Thanks, Nishant
Re: Facet
Try this with SolrJ: SolrQuery query = new SolrQuery(); query.setQuery(q); // query.setQueryType("dismax"); query.setFacet(true); query.addFacetField("id"); query.addFacetField("text"); query.setFacetMinCount(2); On Thu, Jul 23, 2009 at 5:12 PM, Nishant Chandra nishant.chan...@gmail.com wrote: Hi, I am new to Solr and need help with the following use case: I want to provide faceted browsing. For a given product, there are multiple descriptions (feeds, each description being 100-1500 words) that my application gets. I want to check for the presence of a fixed number of terms or attributes (5-10 attributes per product, e.g. weight, memory etc.) in the description. The attribute set will be different for each product category. And then for a given product, I wish to display the number of descriptions found for each attribute (the attribute text is present somewhere in the description). A description can contain more than 1 attribute. How can this be achieved? Please help. Thanks, Nishant
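To read the counts back on the client, something like the following sketch should work (assuming the query built above and an already-initialized SolrServer named server, e.g. a CommonsHttpSolrServer; getValues() can be null for a facet field with no values):

    QueryResponse response = server.query(query);
    for (FacetField facetField : response.getFacetFields()) {
        System.out.println(facetField.getName());
        if (facetField.getValues() == null) {
            continue; // no counts returned for this field
        }
        for (FacetField.Count count : facetField.getValues()) {
            System.out.println("  " + count.getName() + ": " + count.getCount());
        }
    }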
Re: DataImportHandler / Import from DB : one data set comes in multiple rows
Is there a uniqueKey in your schema ? are you returning a value corresponding to that key name? probably you can paste the whole data-config.xml On Thu, Jul 23, 2009 at 4:59 PM, Chantal Ackermannchantal.ackerm...@btelligent.de wrote: Hi Paul, hi Glen, hi all, thank you for your answers. I have followed Paul's solution (as I received it earlier). (I'll keep your suggestion in mind, though, Glen.) It looks good, except that it's not creating any documents... ;-) It is most probably some misunderstanding on my side, and maybe you can help me correct that? So, I have subclassed the SqlEntityProcessor by overwriting basically nextRow() as Paul suggested: public MapString, Object nextRow() { if (rowcache != null) return getFromRowCache(); if (rowIterator == null) { String q = getQuery(); initQuery(resolver.replaceTokens(q)); } MapString, Object pivottedRow = new HashMapString, Object(); MapString, Object fieldRow = getNext(); while (fieldRow != null) { // populate pivottedRow fieldRow = getNext(); } pivottedRow = applyTransformer(pivottedRow); log.info(Returning: + pivottedRow); return pivottedRow; } This seems to work as intended. From the log output, I can see that I get only the rows that I expect for one iteration in the correct key-value structure. I can also see, that the returned pivottedRow is what I want it to be: a map containing columns where each column contains what previously was input as a row. Example (shortened): INFO: Next fieldRow: {value=2, name=audio, id=1} INFO: Next fieldRow: {value=773, name=cat, id=23} INFO: Next fieldRow: {value=642058, name=sid, id=17} INFO: Returning: {sid=642058, cat=[773], audio=2} The entity declaration in the dih config (db_data_config.xml) looks like this (shortened): entity name=my_value processor=PivotSqlEntityProcessor columnValue=value columnName=name query=select id, name, value from datamart where parent_id=${id_definition.ID} and id in (1,23,17) field column=sid name=sid / field column=audio name=audio / field column=cat name=cat / /entity id_definition is the root entity. Per parent_id there are several rows in the datamart table which describe one data set (=lucene document). The object type of value is either String, String[] or List. I am not handling that explicitly, yet. If that'd be the problem it would throw an exception, wouldn't it? But it is not creating any documents at all, although the data seems to be returned correctly from the processor, so it's pobably something far more fundamental. str name=Total Requests made to DataSource1069/str str name=Total Rows Fetched1069/str str name=Total Documents Skipped0/str str name=Full Dump Started2009-07-23 12:57:07/str − str name= Indexing completed. Added/Updated: 0 documents. Deleted 0 documents. /str Any help / hint on what the root cause is or how to debug it would be greatly appreciated. Thank you! Chantal Noble Paul നോബിള് नोब्ळ् schrieb: alternately, you can write your own EntityProcessor and just override the nextRow() . I guess you can still use the JdbcDataSource On Wed, Jul 22, 2009 at 10:05 PM, Chantal Ackermannchantal.ackerm...@btelligent.de wrote: Hi all, this is my first post, as I am new to SOLR (some Lucene exp). I am trying to load data from an existing datamart into SOLR using the DataImportHandler but in my opinion it is too slow due to the special structure of the datamart I have to use. Root Cause: This datamart uses a row based approach (pivot) to present its data. 
It was so done to allow adding more attributes to the data set without having to change the table structure. Impact: To use the DataImportHandler, i have to pivot the data to create again one row per data set. Unfortunately, this results in more and less performant queries. Moreover, there are sometimes multiple rows for a single attribute, that require separate queries - or more tricky subselects that probably don't speed things up. Here is an example of the relation between DB requests, row fetches and actual number of documents created: lst name=statusMessages str name=Total Requests made to DataSource3737/str str name=Total Rows Fetched5380/str str name=Total Documents Skipped0/str str name=Full Dump Started2009-07-22 18:19:06/str - str name= Indexing completed. Added/Updated: 934 documents. Deleted 0 documents. /str str name=Committed2009-07-22 18:22:29/str str name=Optimized2009-07-22 18:22:29/str str name=Time taken 0:3:22.484/str /lst (Full index creation.) There are about half a million data sets, in total. That would require about 30h for indexing? My feeling is that there are far too many row fetches per data set. I am testing it on a smaller machine (2GB, Windows :-( ), Tomcat6 using
Re: DataImportHandler / Import from DB : one data set comes in multiple rows
Hi Paul, no, I didn't return the unique key, though there is one defined. I added that to the nextRow() implementation, and I am now returning it as part of the map. But it is still not creating any documents, and now that I can see the ID I have realized that it is always processing the same - the first - data set. It's like it tries to create the first document but does not, then reiterates over that same data, fails again, and so on. I mean, it doesn't even create one document. So it cannot be a simple iteration that updates the same document over and over again (as there is none). I haven't changed the log level. I see no error message in the output (catalina.log in my case). The complete entity definition: dataConfig dataSource type=JdbcDataSource driver=oracle.jdbc.driver.OracleDriver ... / document name=doc entity name=epg_definition pk=ID query=select ID from DEFINITION !-- originally I would set the field id (unique key) on this level, doesn't work neither -- entity name=value pk=DEF_ID processor=PivotSqlEntityProcessor query=select DEF_ID, id, name, value from datamart where parent_id=${id_definition.ID} and id in (1,23,17) field column=DEF_ID name=id / field column=sid name=sid / field column=audio name=audio / field column=cat name=cat / /entity /entity /document /dataConfig schema: field name=id type=long indexed=true stored=true required=true / field name=sid type=long indexed=true stored=true required=true / field name=audio type=text_ws indexed=true stored=false omitNorms=true multiValued=true/ field name=cat type=text_ws indexed=true stored=true omitNorms=true multiValued=true/ I am using more fields, but I removed them to make it easier to read. I am thinking about removing them from my test to be sure they don't interfere. Thanks for your help! Chantal Noble Paul നോബിള് नोब्ळ् schrieb: Is there a uniqueKey in your schema ? are you returning a value corresponding to that key name? probably you can paste the whole data-config.xml On Thu, Jul 23, 2009 at 4:59 PM, Chantal Ackermannchantal.ackerm...@btelligent.de wrote: Hi Paul, hi Glen, hi all, thank you for your answers. I have followed Paul's solution (as I received it earlier). (I'll keep your suggestion in mind, though, Glen.) It looks good, except that it's not creating any documents... ;-) It is most probably some misunderstanding on my side, and maybe you can help me correct that? So, I have subclassed the SqlEntityProcessor by overwriting basically nextRow() as Paul suggested: public MapString, Object nextRow() { if (rowcache != null) return getFromRowCache(); if (rowIterator == null) { String q = getQuery(); initQuery(resolver.replaceTokens(q)); } MapString, Object pivottedRow = new HashMapString, Object(); MapString, Object fieldRow = getNext(); while (fieldRow != null) { // populate pivottedRow fieldRow = getNext(); } pivottedRow = applyTransformer(pivottedRow); log.info(Returning: + pivottedRow); return pivottedRow; } This seems to work as intended. From the log output, I can see that I get only the rows that I expect for one iteration in the correct key-value structure. I can also see, that the returned pivottedRow is what I want it to be: a map containing columns where each column contains what previously was input as a row. 
Example (shortened): INFO: Next fieldRow: {value=2, name=audio, id=1} INFO: Next fieldRow: {value=773, name=cat, id=23} INFO: Next fieldRow: {value=642058, name=sid, id=17} INFO: Returning: {sid=642058, cat=[773], audio=2} The entity declaration in the dih config (db_data_config.xml) looks like this (shortened): entity name=my_value processor=PivotSqlEntityProcessor columnValue=value columnName=name query=select id, name, value from datamart where parent_id=${id_definition.ID} and id in (1,23,17) field column=sid name=sid / field column=audio name=audio / field column=cat name=cat / /entity id_definition is the root entity. Per parent_id there are several rows in the datamart table which describe one data set (=lucene document). The object type of value is either String, String[] or List. I am not handling that explicitly, yet. If that'd be the problem it would throw an exception, wouldn't it? But it is not creating any documents at all, although the data seems to be returned correctly from the processor, so it's pobably something far more fundamental. str name=Total Requests made to DataSource1069/str str name=Total Rows Fetched1069/str str name=Total Documents Skipped0/str str name=Full Dump Started2009-07-23 12:57:07/str − str name= Indexing completed. Added/Updated: 0 documents. Deleted 0 documents. /str Any help / hint on what the root
Re: DataImportHandler / Import from DB : one data set comes in multiple rows
Note that the statement about LuSql (or really any other tool, LuSql is just an example because it was mentioned) is true only if Solr is underutilized because DIH uses a single thread to talk to Solr (is this correct?) vs. LuSql using multiple (I'm guessing that's the case becase of the multicore comment). But, if the DB itself if your bottleneck, and I've seen plenty of such cases, then speed of DIH vs. LuSql vs. something else matters less. Glen, please correct me if I'm wrong about this - I know you have done plenty of benchmarking. :) Otis -- Sematext is hiring: http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Glen Newton glen.new...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, July 23, 2009 5:52:43 AM Subject: Re: DataImportHandler / Import from DB : one data set comes in multiple rows Chantal, You might consider LuSql[1]. It has much better performance than Solr DIH. It runs 4-10 times faster on a multicore machine, and can run in 1/20th the heap size Solr needs. It produces a Lucene index. See slides 22-25 in this presentation comparing Solr DIH with LuSql: http://code4lib.org/files/glen_newton_LuSql.pdf [1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql Disclosure: I am the author of LuSql. Glen Newton http://zzzoot.blogspot.com/ 2009/7/22 Chantal Ackermann : Hi all, this is my first post, as I am new to SOLR (some Lucene exp). I am trying to load data from an existing datamart into SOLR using the DataImportHandler but in my opinion it is too slow due to the special structure of the datamart I have to use. Root Cause: This datamart uses a row based approach (pivot) to present its data. It was so done to allow adding more attributes to the data set without having to change the table structure. Impact: To use the DataImportHandler, i have to pivot the data to create again one row per data set. Unfortunately, this results in more and less performant queries. Moreover, there are sometimes multiple rows for a single attribute, that require separate queries - or more tricky subselects that probably don't speed things up. Here is an example of the relation between DB requests, row fetches and actual number of documents created: 3737 5380 0 2009-07-22 18:19:06 − Indexing completed. Added/Updated: 934 documents. Deleted 0 documents. 2009-07-22 18:22:29 2009-07-22 18:22:29 0:3:22.484 (Full index creation.) There are about half a million data sets, in total. That would require about 30h for indexing? My feeling is that there are far too many row fetches per data set. I am testing it on a smaller machine (2GB, Windows :-( ), Tomcat6 using around 680MB RAM, Java6. I haven't changed the Lucene configuration (merge factor 10, ram buffer size 32). Possible solutions? A) Write my own DataImportHandler? B) Write my own MultiRowTransformer that accepts several rows as input argument (not sure this is a valid option)? C) Approach the DB developers to add a flat table with one data set per row? D) ...? If someone would like to share their experiences, that would be great! Thanks a lot! Chantal -- Chantal Ackermann -- -
Re: how to get all the docIds in the search result?
You could pull the ID directly from the Lucene index, that may be a little faster. You can also use Lucene's TermEnum to get to this. And you should make sure that id field is the first field in your documents (when you index them). But no matter what you do, this will not be subsecond for non-trivial indices - it's the equivalent of a full table scan in RDBMS world. Otis -- Sematext is hiring: http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: shb suh...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, July 23, 2009 5:35:29 AM Subject: Re: how to get all the docIds in the search result? I have tried the following code: query.setRows(Integer.MAX_VALUE); query.setFields(id); when it return 1000,000 records, it will take about 22s. This is very slow. Is there any other way? 2009/7/23 Toby Cole Have you tried limiting the fields that you're requesting to just the ID? Something along the line of: query.setRows(Integer.MAX_VALUE); query.setFields(id); Might speed the query up a little. On 23 Jul 2009, at 09:11, shb wrote: Here id is indeed the uniqueKey of a document. I want to get all the ids for some other useage. 2009/7/23 Shalin Shekhar Mangar On Thu, Jul 23, 2009 at 1:09 PM, shb wrote: if I use query.setRows(Integer.MAX_VALUE); the query will become very slow, because searcher will go to fetch the filed value in the index for all the returned document. So if I set query.setRows(10), is there any other ways to get all the ids? thanks You should fetch as many rows as you need and not more. Why do you need all the ids? I'm assuming that by id you mean the uniqueKey of a document. -- Regards, Shalin Shekhar Mangar. -- Toby Cole Software Engineer, Semantico Limited Registered in England and Wales no. 03841410, VAT no. GB-744614334. Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK. Check out all our latest news and thinking on the Discovery blog http://blogs.semantico.com/discovery-blog/
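For reference, a rough sketch of the TermEnum approach Otis mentions, using the Lucene 2.x API current at the time. The index path is a placeholder, the uniqueKey field is assumed to be named "id", and note that the enumeration also returns ids from deleted documents until they are merged away:

    IndexReader reader = IndexReader.open("/path/to/solr/data/index"); // placeholder path
    try {
        TermEnum terms = reader.terms(new Term("id", "")); // positioned at the first "id" term
        try {
            do {
                Term t = terms.term();
                if (t == null || !"id".equals(t.field())) {
                    break; // ran past the id field
                }
                String id = t.text();
                // ... collect or process the id ...
            } while (terms.next());
        } finally {
            terms.close();
        }
    } finally {
        reader.close();
    }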
Sort field
Hallo... I have a problem... I want to sort on a field. At the moment the field type is text, but I have tested it with string and date too. The content of the field looks like 22.07.09, it is a date. When I sort, I get: failed to open stream: HTTP request failed! HTTP/1.1 500 there_are_more_terms_than_documents_in_field_ERP_ERP_FILE_CONTENT_DATUM_but_its_impossible_to_sort_on_tokenized_f in /var/www/search.php on line 23 What happened?
Re: Sort field
On Jul 23, 2009, at 11:03 AM, Jörg Agatz wrote: Hallo... I have a problem... i want to sort a field at the Moment the field type is text, but i have test it with string or date the content of the field looks like 22.07.09 it is a Date. when i sort, i get : failed to open stream: HTTP request failed! HTTP/1.1 500 there_are_more_terms_than_documents_in_field_ERP_ERP_FILE_CONTENT_DATUM_but_its_impossible_to_sort_on_tokenized_f in */var/www/search.php* on line *23 What happen? You have to sort on a field that only has a single indexed term per document. A string with indexed=true is one option. Use copyField to copy your text field to a string version if it is as simple as that for your sorting needs. Erik
Re: how to get all the docIds in the search result?
Rather than trying to get all document id's in one call to Solr, consider paging through the results. Set rows=1000 or probably larger, then check the numFound and continue making requests to Solr incrementing start parameter accordingly until done. Erik On Jul 23, 2009, at 5:35 AM, shb wrote: I have tried the following code: query.setRows(Integer.MAX_VALUE); query.setFields(id); when it return 1000,000 records, it will take about 22s. This is very slow. Is there any other way? 2009/7/23 Toby Cole toby.c...@semantico.com Have you tried limiting the fields that you're requesting to just the ID? Something along the line of: query.setRows(Integer.MAX_VALUE); query.setFields(id); Might speed the query up a little. On 23 Jul 2009, at 09:11, shb wrote: Here id is indeed the uniqueKey of a document. I want to get all the ids for some other useage. 2009/7/23 Shalin Shekhar Mangar shalinman...@gmail.com On Thu, Jul 23, 2009 at 1:09 PM, shb suh...@gmail.com wrote: if I use query.setRows(Integer.MAX_VALUE); the query will become very slow, because searcher will go to fetch the filed value in the index for all the returned document. So if I set query.setRows(10), is there any other ways to get all the ids? thanks You should fetch as many rows as you need and not more. Why do you need all the ids? I'm assuming that by id you mean the uniqueKey of a document. -- Regards, Shalin Shekhar Mangar. -- Toby Cole Software Engineer, Semantico Limited toby.c...@semantico.com tel:+44 1273 358 238 Registered in England and Wales no. 03841410, VAT no. GB-744614334. Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK. Check out all our latest news and thinking on the Discovery blog http://blogs.semantico.com/discovery-blog/
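A minimal SolrJ sketch of the paging approach Erik describes, reusing the example query from earlier in this thread (assumes an already-initialized SolrServer named server and that only the uniqueKey field "id" is needed):

    SolrQuery query = new SolrQuery();
    query.setQuery("issn:0002-9505"); // the example query from earlier in this thread
    query.setFields("id");            // only fetch the uniqueKey
    query.setRows(1000);              // page size; tune as needed

    List<String> ids = new ArrayList<String>();
    int start = 0;
    while (true) {
        query.setStart(start);
        QueryResponse response = server.query(query);
        SolrDocumentList page = response.getResults();
        for (SolrDocument doc : page) {
            ids.add(String.valueOf(doc.getFieldValue("id")));
        }
        start += page.size();
        if (page.size() == 0 || start >= page.getNumFound()) {
            break; // fetched everything (or no more results)
        }
    }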
Re: DataImportHandler / Import from DB : one data set comes in multiple rows
Hi Otis, Yes, you are right: LuSql is heavily optimized for multi-thread/multi-core. It also performs better on a single core with multiple threads, due to the heavy i/o bounded nature of Lucene indexing. So if the DB is the bottleneck, well, yes, then LuSql and any other tool are not going help. Resolve the DB bottleneck, and then decide what tool best serves your indexing requirements. Only slightly off topic: I have noticed one problem with DBs (with LuSql and custom JDBC clients processing records) when the fetch size is too large and the amount of processsing of each record gets too large: sometimes the connection times out because the time between getting the next batch takes too long (due to the accumulated delay from processing all the records). Solved with reducing the fetch size. I am not sure if Solr/DIH users have experienced this. LuSql allows setting the fetch size (like DIH I believe) and (unreleased version) re-issues the SQL and offsets to the last+1 record when this happens. -glen 2009/7/23 Otis Gospodnetic otis_gospodne...@yahoo.com: Note that the statement about LuSql (or really any other tool, LuSql is just an example because it was mentioned) is true only if Solr is underutilized because DIH uses a single thread to talk to Solr (is this correct?) vs. LuSql using multiple (I'm guessing that's the case becase of the multicore comment). But, if the DB itself if your bottleneck, and I've seen plenty of such cases, then speed of DIH vs. LuSql vs. something else matters less. Glen, please correct me if I'm wrong about this - I know you have done plenty of benchmarking. :) Otis -- Sematext is hiring: http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Glen Newton glen.new...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, July 23, 2009 5:52:43 AM Subject: Re: DataImportHandler / Import from DB : one data set comes in multiple rows Chantal, You might consider LuSql[1]. It has much better performance than Solr DIH. It runs 4-10 times faster on a multicore machine, and can run in 1/20th the heap size Solr needs. It produces a Lucene index. See slides 22-25 in this presentation comparing Solr DIH with LuSql: http://code4lib.org/files/glen_newton_LuSql.pdf [1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql Disclosure: I am the author of LuSql. Glen Newton http://zzzoot.blogspot.com/ 2009/7/22 Chantal Ackermann : Hi all, this is my first post, as I am new to SOLR (some Lucene exp). I am trying to load data from an existing datamart into SOLR using the DataImportHandler but in my opinion it is too slow due to the special structure of the datamart I have to use. Root Cause: This datamart uses a row based approach (pivot) to present its data. It was so done to allow adding more attributes to the data set without having to change the table structure. Impact: To use the DataImportHandler, i have to pivot the data to create again one row per data set. Unfortunately, this results in more and less performant queries. Moreover, there are sometimes multiple rows for a single attribute, that require separate queries - or more tricky subselects that probably don't speed things up. Here is an example of the relation between DB requests, row fetches and actual number of documents created: 3737 5380 0 2009-07-22 18:19:06 − Indexing completed. Added/Updated: 934 documents. Deleted 0 documents. 2009-07-22 18:22:29 2009-07-22 18:22:29 0:3:22.484 (Full index creation.) 
There are about half a million data sets, in total. That would require about 30h for indexing? My feeling is that there are far too many row fetches per data set. I am testing it on a smaller machine (2GB, Windows :-( ), Tomcat6 using around 680MB RAM, Java6. I haven't changed the Lucene configuration (merge factor 10, ram buffer size 32). Possible solutions? A) Write my own DataImportHandler? B) Write my own MultiRowTransformer that accepts several rows as input argument (not sure this is a valid option)? C) Approach the DB developers to add a flat table with one data set per row? D) ...? If someone would like to share their experiences, that would be great! Thanks a lot! Chantal -- Chantal Ackermann -- - -- -
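As a plain-JDBC illustration of the fetch-size point Glen makes above (a generic sketch, not LuSql or DIH code; the connection details, the value 500, and the query are placeholders taken loosely from this thread):

    Connection conn = DriverManager.getConnection(jdbcUrl, user, password); // placeholder credentials
    Statement stmt = conn.createStatement();
    stmt.setFetchSize(500); // a smaller fetch size means more frequent round-trips, avoiding long idle gaps
    ResultSet rs = stmt.executeQuery("select id, name, value from datamart"); // example query from this thread
    while (rs.next()) {
        // per-record processing goes here; if this work is slow and the fetch size
        // is very large, the gap between batch fetches can grow until the
        // connection times out, as described above
    }
    rs.close();
    stmt.close();
    conn.close();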
Re: excluding certain terms from facet counts when faceting based on indexed terms of a field
I want to exclude a very small number of terms which will be different for each query. So I think my best bet is to use localParams. Bill On Wed, Jul 22, 2009 at 4:16 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I am faceting based on the indexed terms of a field by using facet.field. : Is there any way to exclude certain terms from the facet counts? if you're talking about a lot of terms, and they're going to be the same for *all* queries, the best approach is to strip them out when indexing (StopWordFilter is your friend) -Hoss
Re: excluding certain terms from facet counts when faceting based on indexed terms of a field
Given it is a small number of terms, it seems like just excluding them from use/visibility on the client would be reasonable. Erik On Jul 23, 2009, at 11:43 AM, Bill Au wrote: I want to exclude a very small number of terms which will be different for each query. So I think my best bet is to use localParams. Bill On Wed, Jul 22, 2009 at 4:16 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I am faceting based on the indexed terms of a field by using facet.field. : Is there any way to exclude certain terms from the facet counts? if you're talking about a lot of terms, and they're going to be the same for *all* queries, the best approach is to strip them out when indexing (StopWordFilter is your friend) -Hoss
Re: excluding certain terms from facet counts when faceting based on indexed terms of a field
That's actually what we have been doing. I was just wondering if there is any way to move this work from the client back into Solr. Bill On Thu, Jul 23, 2009 at 11:47 AM, Erik Hatcher e...@ehatchersolutions.com wrote: Given it is a small number of terms, it seems like just excluding them from use/visibility on the client would be reasonable. Erik On Jul 23, 2009, at 11:43 AM, Bill Au wrote: I want to exclude a very small number of terms which will be different for each query. So I think my best bet is to use localParams. Bill On Wed, Jul 22, 2009 at 4:16 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I am faceting based on the indexed terms of a field by using facet.field. : Is there any way to exclude certain terms from the facet counts? if you're talking about a lot of terms, and they're going to be the same for *all* queries, the best approach is to strip them out when indexing (StopWordFilter is your friend) -Hoss
Re: how to get all the docIds in the search result?
And if I may add another thing - if you are using Solr in this fashion, have a look at your caches, esp. document cache. If your queries of this type are repeated, you may benefit from large cache. Or, if they are not, you may completely disable some caches. Otis -- Sematext is hiring: http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Erik Hatcher e...@ehatchersolutions.com To: solr-user@lucene.apache.org Sent: Thursday, July 23, 2009 11:15:45 AM Subject: Re: how to get all the docIds in the search result? Rather than trying to get all document id's in one call to Solr, consider paging through the results. Set rows=1000 or probably larger, then check the numFound and continue making requests to Solr incrementing start parameter accordingly until done. Erik On Jul 23, 2009, at 5:35 AM, shb wrote: I have tried the following code: query.setRows(Integer.MAX_VALUE); query.setFields(id); when it return 1000,000 records, it will take about 22s. This is very slow. Is there any other way? 2009/7/23 Toby Cole Have you tried limiting the fields that you're requesting to just the ID? Something along the line of: query.setRows(Integer.MAX_VALUE); query.setFields(id); Might speed the query up a little. On 23 Jul 2009, at 09:11, shb wrote: Here id is indeed the uniqueKey of a document. I want to get all the ids for some other useage. 2009/7/23 Shalin Shekhar Mangar On Thu, Jul 23, 2009 at 1:09 PM, shb wrote: if I use query.setRows(Integer.MAX_VALUE); the query will become very slow, because searcher will go to fetch the filed value in the index for all the returned document. So if I set query.setRows(10), is there any other ways to get all the ids? thanks You should fetch as many rows as you need and not more. Why do you need all the ids? I'm assuming that by id you mean the uniqueKey of a document. -- Regards, Shalin Shekhar Mangar. -- Toby Cole Software Engineer, Semantico Limited Registered in England and Wales no. 03841410, VAT no. GB-744614334. Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK. Check out all our latest news and thinking on the Discovery blog http://blogs.semantico.com/discovery-blog/
Re: Storing string field in solr.ExternalFieldFile type
Thanks for the response, Eric. We have seen that size of the index has a direct impact on the search speed, especially when the index size is in GBs, so trying all possible ways to keep the index size as low as we can. We thought solr.ExternalFileField type would help to keep the index size low by storing a text field out side of the index. Here's what we were planning: initially, all the fields except the solr.ExternalFileField type field will be queried and will be displayed to the end user. . There will be subsequent calls from the UI to pull the solr.ExternalFileField field that will be loaded in a lazy manner. However, realized that solr.ExternalFileField only supports float type, however, the data that we're planning to keep as an external field is a string type. Thanks, -Jibo On Jul 22, 2009, at 1:46 PM, Erick Erickson wrote: Hoping the experts chime in if I'm wrong, but As far as I know, while storing a field increases the size of an index, it doesn't have much impact on the search speed. Which you could pretty easily test by creating the index both ways and firing off some timing queries and comparing. Although it would be time consuming... I believe there's some info on the Lucene Wiki about this, but my memory isn't what it used to be. Erick On Tue, Jul 21, 2009 at 2:42 PM, Jibo John jiboj...@mac.com wrote: We're in the process of building a log searcher application. In order to reduce the index size to improve the query performance, we're exploring the possibility of having: 1. One field for each log line with 'indexed=true stored=false' that will be used for searching 2. Another field for each log line of type solr.ExternalFileField that will be used only for display purpose. We realized that currently solr.ExternalFileField supports only float type. Is there a way we can override this to support string type? Any issues with this approach? Any ideas are welcome. Thanks, -Jibo
Re: Storing string field in solr.ExternalFieldFile type
I'm not sure if there is a lot of benefit from storing the literal values in that external file vs. directly in the index. There are a number of things one should look at first, as far as performance is concerned - JVM settings, cache sizes, analysis, etc. For example, I have one index here that is 9 times the size of the original data because of how its fields are analyzed. I can change one analysis-level setting and make that ratio go down to 2. So I'd look at other, more straight forward things first. There is a Wiki page either on Solr or Lucene Wiki dedicated to various search performance tricks. Otis -- Sematext is hiring: http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Jibo John jiboj...@mac.com To: solr-user@lucene.apache.org Sent: Thursday, July 23, 2009 12:08:26 PM Subject: Re: Storing string field in solr.ExternalFieldFile type Thanks for the response, Eric. We have seen that size of the index has a direct impact on the search speed, especially when the index size is in GBs, so trying all possible ways to keep the index size as low as we can. We thought solr.ExternalFileField type would help to keep the index size low by storing a text field out side of the index. Here's what we were planning: initially, all the fields except the solr.ExternalFileField type field will be queried and will be displayed to the end user. . There will be subsequent calls from the UI to pull the solr.ExternalFileField field that will be loaded in a lazy manner. However, realized that solr.ExternalFileField only supports float type, however, the data that we're planning to keep as an external field is a string type. Thanks, -Jibo On Jul 22, 2009, at 1:46 PM, Erick Erickson wrote: Hoping the experts chime in if I'm wrong, but As far as I know, while storing a field increases the size of an index, it doesn't have much impact on the search speed. Which you could pretty easily test by creating the index both ways and firing off some timing queries and comparing. Although it would be time consuming... I believe there's some info on the Lucene Wiki about this, but my memory isn't what it used to be. Erick On Tue, Jul 21, 2009 at 2:42 PM, Jibo John wrote: We're in the process of building a log searcher application. In order to reduce the index size to improve the query performance, we're exploring the possibility of having: 1. One field for each log line with 'indexed=true stored=false' that will be used for searching 2. Another field for each log line of type solr.ExternalFileField that will be used only for display purpose. We realized that currently solr.ExternalFileField supports only float type. Is there a way we can override this to support string type? Any issues with this approach? Any ideas are welcome. Thanks, -Jibo
index backup works only if there are committed index
Hi, I noticed that the backup request http://master_host:port/solr/replication?command=backup works only if there is committed index data, i.e. core.getDeletionPolicy().getLatestCommit() is not null. Otherwise, no backup is created. It sounds logical, because if nothing has been committed since your last backup, it doesn't help much to do a new backup. However, consider this scenario: 1. a backup process is scheduled at 1:00AM every Monday 2. just before 1:00AM, the system is shut down (for whatever reason), and then restarts 3. no index is committed before 1:00AM 4. at 1:00AM, the backup process starts, no committed index is found, and therefore there is no backup (until next week) The probability of this scenario is probably small, but it could still happen, and it seems to me that if I want to back up the index, a backup should be created whether there is a new commit or not. Your thoughts? Thanks, -- J
Re: Solr and UIMA
On Jul 21, 2009, at 11:57 AM, JCodina wrote: Hello, Grant, there are two ways to implement this: one is payloads, and the other one is multiple tokens at the same positions. Each of them can be useful; let me explain the way I think they can be used. Payloads: every token has extra information that can be used in the processing. For example, if I can add part-of-speech then I can develop tokenizers that take the POS into account (or, for example, I can generate bigrams of Noun Adjective, or Noun prep Noun, or I can have a better stopwords algorithm). Multiple tokens in one position: if I can have different tokens at the same place, I can have different information like: was #verb _be so I can do a search for you _be #adjective to find all the sentences that talk about you, for example you were clever you are tall .. This was one of the use cases for payloads as well, but it likely needs more Query support at the moment, as the BoostingTermQuery would only allow you to boost values where it's a verb, not include/exclude. I have not understood the way that the DelimitedPayloadTokenFilterFactory may work in Solr; what is the input format? The DPTFF (nice acronym, eh?) allows you to send in your normal Solr XML, but with payloads encoded in the text. For instance: <field name="foo">the quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ dogs|NN</field> The DPTFF will take the value before the delimiter as the Token and the value after the delimiter as the payload. This then allows Solr to add Payloads without modifying a single thing in Solr, at least on the indexing side (see the small SolrJ sketch after this message). So I was thinking of generating an XML where for each token a single string is generated, like was#verb#be, and then there is a token filter that splits each whitespace-separated string by #, in this case into three words, and adds the trailing character that allows searching for the right semantic info, but gives them the same position increment. Of course the full processing chain must be aware of this. But I must think about multi-word tokens. We could likely make a generic TokenFilter that can capture both multiple tokens and payloads all at the same time, simply by allowing it to have two attributes: 1. token delimiter (#) 2. payload delimiter (|) Then, you could do something like: was#be|verb or was#be|0.3 where was and be are both tokens at the same position and verb or 0.3 are payloads on those tokens. This is a nearly trivial variation of the DelimitedPayloadTokenFilter. Grant Ingersoll-6 wrote: On Jul 20, 2009, at 6:43 AM, JCodina wrote: D: Break things down. The CAS would only produce XML that Solr can process. Then different Tokenizers can be used to deal with the data in the CAS. The main point is that the XML has the doc and field labels of Solr. I just committed the DelimitedPayloadTokenFilterFactory; I suspect this is along the lines of what you are thinking, but I haven't done all that much with UIMA. I also suspect the Tee/Sink capabilities of Lucene could be helpful, but they aren't available in Solr yet. E: The set of capabilities to process the XML is defined in XML, similar to Lucas, to define the output, and in the Solr schema to define how this is processed. I want to use it in order to index something that is common but I can't get any tool to do that with Solr: indexing a word and coding at the same position the syntactic and semantic information. I know that in Lucene this is evolving and it will be possible to include metadata but for the moment What does Lucas do with Lucene?
Is it putting multiple tokens at the same position or using Payloads? -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- View this message in context: http://www.nabble.com/Solr-and-UIMA-tp24567504p24590509.html Sent from the Solr - User mailing list archive at Nabble.com. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
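The SolrJ sketch referenced above: the delimited-payload text is sent like any other field value and the analysis chain does the rest. This assumes a field named foo whose index analyzer includes the DelimitedPayloadTokenFilterFactory with | as the delimiter, plus an already-initialized SolrServer named server; the id value is just a placeholder.

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc1"); // placeholder uniqueKey value
    doc.addField("foo", "the quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ dogs|NN");
    server.add(doc);
    server.commit();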
facet.prefix question
I'm trying to do some filtering in the count list retrieved by Solr when doing a faceting query. I'm wondering how I can use facet.prefix to get something like this: Query: facet.field=foo&facet.prefix=A OR B Response: <lst name="facet_fields"> <lst name="foo"> <int name="A">12560</int> <int name="A*">5440</int> <int name="B**">2357</int> . . . </lst> </lst> How can I achieve this behaviour? Best Regards -- Lici
Re: how to get all the docIds in the search result?
: Here id is indeed the uniqueKey of a document. : I want to get all the ids for some other useage. http://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
Re: index backup works only if there are committed index
Another option is making backups more directly, not using the Solr backup mechanism. Check the green link on http://www.manning.com/hatcher3/ Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: solr jay solr...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, July 23, 2009 12:56:23 PM Subject: index backup works only if there are committed index Hi, I noticed that the backup request http://master_host:port/solr/replication?command=backup works only if there is committed index data, i.e. core.getDeletionPolicy().getLatestCommit() is not null. Otherwise, no backup is created. It sounds logical, because if nothing has been committed since your last backup, it doesn't help much to do a new backup. However, consider this scenario: 1. a backup process is scheduled at 1:00AM every Monday 2. just before 1:00AM, the system is shut down (for whatever reason), and then restarts 3. no index is committed before 1:00AM 4. at 1:00AM, the backup process starts, no committed index is found, and therefore there is no backup (until next week) The probability of this scenario is probably small, but it still could happen, and it seems to me that if I want to back up the index, a backup should be created whether there is newly committed index data or not. Your thoughts? Thanks, -- J
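A sketch of how a scheduled backup job might invoke the replication handler's backup command described above, using plain java.net classes. The host, port and core path are placeholders, and this assumes the replication handler is enabled on the master.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

URL backupUrl = new URL("http://master_host:8983/solr/replication?command=backup"); // placeholder host/port
HttpURLConnection conn = (HttpURLConnection) backupUrl.openConnection();
conn.setRequestMethod("GET");
int status = conn.getResponseCode();        // 200 means the command was accepted
InputStream body = conn.getInputStream();   // the response body reports whether a snapshot was started
body.close();
conn.disconnect();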
Re: Storing string field in solr.ExternalFieldFile type
Thanks for the quick response, Otis. We have been able to achieve a ratio of 2 with different settings; however, considering the huge volume of data that we need to deal with - 600 GB of data per day, which we need to keep in the index for 3 days - we're looking at all possible ways to reduce the index size further. Will definitely keep exploring the straightforward things and see if we can find a better setting. Thanks, -Jibo On Jul 23, 2009, at 9:49 AM, Otis Gospodnetic wrote: I'm not sure if there is a lot of benefit from storing the literal values in that external file vs. directly in the index. There are a number of things one should look at first, as far as performance is concerned - JVM settings, cache sizes, analysis, etc. For example, I have one index here that is 9 times the size of the original data because of how its fields are analyzed. I can change one analysis-level setting and make that ratio go down to 2. So I'd look at other, more straightforward things first. There is a Wiki page either on the Solr or Lucene Wiki dedicated to various search performance tricks. Otis -- Sematext is hiring: http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Jibo John jiboj...@mac.com To: solr-user@lucene.apache.org Sent: Thursday, July 23, 2009 12:08:26 PM Subject: Re: Storing string field in solr.ExternalFieldFile type Thanks for the response, Erick. We have seen that the size of the index has a direct impact on search speed, especially when the index size is in GBs, so we are trying all possible ways to keep the index size as low as we can. We thought the solr.ExternalFileField type would help keep the index size low by storing a text field outside of the index. Here's what we were planning: initially, all the fields except the solr.ExternalFileField type field will be queried and displayed to the end user. There will be subsequent calls from the UI to pull the solr.ExternalFileField field, which will be loaded in a lazy manner. However, we realized that solr.ExternalFileField only supports the float type, whereas the data that we're planning to keep as an external field is a string. Thanks, -Jibo On Jul 22, 2009, at 1:46 PM, Erick Erickson wrote: Hoping the experts chime in if I'm wrong, but as far as I know, while storing a field increases the size of an index, it doesn't have much impact on search speed. Which you could pretty easily test by creating the index both ways and firing off some timing queries and comparing, although it would be time consuming... I believe there's some info on the Lucene Wiki about this, but my memory isn't what it used to be. Erick On Tue, Jul 21, 2009 at 2:42 PM, Jibo John wrote: We're in the process of building a log searcher application. In order to reduce the index size and improve query performance, we're exploring the possibility of having: 1. One field for each log line with 'indexed=true stored=false' that will be used for searching 2. Another field for each log line of type solr.ExternalFileField that will be used only for display purposes. We realized that currently solr.ExternalFileField supports only the float type. Is there a way we can override this to support a string type? Any issues with this approach? Any ideas are welcome. Thanks, -Jibo
Re: Storing string field in solr.ExternalFieldFile type
Jibo, Well, there is always field compression, which lets you trade the index size/disk space for extra CPU time and thus some increase in indexing and search latency. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Jibo John jiboj...@mac.com To: solr-user@lucene.apache.org Sent: Thursday, July 23, 2009 1:43:45 PM Subject: Re: Storing string field in solr.ExternalFieldFile type Thanks for the quick response, Otis. We have been able to achieve the ratio of 2 with different settings, however, considering the huge volume of the data that we need to deal with - 600 GB of data per day, and, we need to keep it in the index for 3 days - we're looking at all possible ways to reduce the index size further. Will definitely keep exploring the straightforward things and see if we can find a better setting. Thanks, -Jibo On Jul 23, 2009, at 9:49 AM, Otis Gospodnetic wrote: I'm not sure if there is a lot of benefit from storing the literal values in that external file vs. directly in the index. There are a number of things one should look at first, as far as performance is concerned - JVM settings, cache sizes, analysis, etc. For example, I have one index here that is 9 times the size of the original data because of how its fields are analyzed. I can change one analysis-level setting and make that ratio go down to 2. So I'd look at other, more straight forward things first. There is a Wiki page either on Solr or Lucene Wiki dedicated to various search performance tricks. Otis -- Sematext is hiring: http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Jibo John To: solr-user@lucene.apache.org Sent: Thursday, July 23, 2009 12:08:26 PM Subject: Re: Storing string field in solr.ExternalFieldFile type Thanks for the response, Eric. We have seen that size of the index has a direct impact on the search speed, especially when the index size is in GBs, so trying all possible ways to keep the index size as low as we can. We thought solr.ExternalFileField type would help to keep the index size low by storing a text field out side of the index. Here's what we were planning: initially, all the fields except the solr.ExternalFileField type field will be queried and will be displayed to the end user. . There will be subsequent calls from the UI to pull the solr.ExternalFileField field that will be loaded in a lazy manner. However, realized that solr.ExternalFileField only supports float type, however, the data that we're planning to keep as an external field is a string type. Thanks, -Jibo On Jul 22, 2009, at 1:46 PM, Erick Erickson wrote: Hoping the experts chime in if I'm wrong, but As far as I know, while storing a field increases the size of an index, it doesn't have much impact on the search speed. Which you could pretty easily test by creating the index both ways and firing off some timing queries and comparing. Although it would be time consuming... I believe there's some info on the Lucene Wiki about this, but my memory isn't what it used to be. Erick On Tue, Jul 21, 2009 at 2:42 PM, Jibo John wrote: We're in the process of building a log searcher application. In order to reduce the index size to improve the query performance, we're exploring the possibility of having: 1. One field for each log line with 'indexed=true stored=false' that will be used for searching 2. 
Another field for each log line of type solr.ExternalFileField that will be used only for display purpose. We realized that currently solr.ExternalFileField supports only float type. Is there a way we can override this to support string type? Any issues with this approach? Any ideas are welcome. Thanks, -Jibo
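A sketch of the two-step retrieval Jibo describes - an initial query without the large display field, then a lazy per-document fetch of it from the UI. The field names ("host", "timestamp", "rawline") and the server variable are hypothetical; this only illustrates the access pattern, not ExternalFileField itself, which as noted holds only floats.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

// Step 1: the initial search returns only the small fields.
SolrQuery initial = new SolrQuery("message:timeout");     // hypothetical search
initial.setFields("id", "host", "timestamp");             // exclude the big display field
QueryResponse first = server.query(initial);

// Step 2: when the user expands a hit, lazily fetch its display field by unique key.
for (SolrDocument hit : first.getResults()) {
    SolrQuery detail = new SolrQuery("id:" + hit.getFieldValue("id"));
    detail.setFields("rawline");                           // only the large stored field
    detail.setRows(1);
    QueryResponse full = server.query(detail);
    // full.getResults().get(0).getFieldValue("rawline") holds the complete log line
}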
Re: Solr Cell
Found my own answer: use the literal parameter. Should have dug around before asking. Sorry. Thanks, Matt Weber eSr Technologies http://www.esr-technologies.com On Jul 23, 2009, at 2:26 PM, Matt Weber wrote: Is it possible to supply additional metadata along with the binary file when using Solr Cell? For example, I have a PDF called somefile.pdf and I have some external metadata related to that file. Such metadata might be things like author, publisher, source, date published, etc. I want to post the binary data for somefile.pdf to Solr Cell AND map my metadata into other fields in the same document that has the extracted text from the PDF. I know I could do this using Tika and SolrJ directly, but it would be much easier if Solr Cell can do it. Thanks, Matt Weber eSr Technologies http://www.esr-technologies.com
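For anyone landing on this thread later, a sketch of what the literal.* usage might look like from SolrJ when posting a file to the extraction handler. The id and metadata values are hypothetical, and the exact ContentStreamUpdateRequest method signatures can differ between SolrJ versions, so treat this as illustrative rather than definitive.

import java.io.File;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

// Post the PDF to the extracting handler; literal.<field> parameters become regular
// fields on the same document as the extracted text.
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
req.addFile(new File("somefile.pdf"));
req.setParam("literal.id", "somefile.pdf");
req.setParam("literal.author", "Jane Doe");           // hypothetical metadata
req.setParam("literal.publisher", "Example Press");   // hypothetical metadata
req.setParam("commit", "true");
server.request(req);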
Re: LocalSolr - order of fields on xml response
Hi Ryan, Thanks for the information. Is this expected to be implemented? Regards, -- Daniel Cassiano _ http://www.apontador.com.br/ http://www.maplink.com.br/ On Wed, Jul 22, 2009 at 10:08 PM, Ryan McKinley ryan...@gmail.com wrote: ya... 'expected', but perhaps not ideal. As is, LocalSolr munges the document on its way out the door to add the distance. When LocalSolr makes it into the source, it will likely use a method like: https://issues.apache.org/jira/browse/SOLR-705 to augment each document with the calculated distance. This will at least have consistent behavior. On Jul 22, 2009, at 10:47 AM, Daniel Cassiano wrote: Hi folks, When I do a query with LocalSolr to get the geo_distance, the order of the XML fields is different from a standard query. It's a simple query, like this: http://myhost.com:8088/solr/core/select?qt=geo&x=-46.01&y=-23.01&radius=15&sort=geo_distance+asc&q=*:* Is this an expected behavior of LocalSolr? Thanks! -- Daniel Cassiano _ http://www.apontador.com.br/ http://www.maplink.com.br/
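The same LocalSolr request expressed with SolrJ, for anyone reproducing the behaviour; the handler name and parameter names are taken from the URL above, and the query itself is just a placeholder.

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery query = new SolrQuery("*:*");
query.setQueryType("geo");                          // qt=geo, the LocalSolr handler
query.set("x", "-46.01");                           // centre point, as in the URL above
query.set("y", "-23.01");
query.set("radius", "15");
query.addSortField("geo_distance", SolrQuery.ORDER.asc);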
JDBC Import not exposing nested entities
Hi, I'm attempting to set up a simple joined index of some tables with the following structure...

EMPLOYEE: employee_id, first_name, last_name, edr_party_id, organization_id
ORGANIZATION: organization_id, organization_name

When running the import, I'm getting this WARNING...

Jul 23, 2009 2:17:41 PM org.apache.solr.handler.dataimport.SolrWriter upload
WARNING: Error creating document : SolrInputDocumnt[{id=id(1.0)={42078}, first_name=first_name(1.0)={Mike}, last_name=last_name(1.0)={Madlock}, edr_party_id=edr_party_id(1.0)={29131}, organization_id=organization_id(1.0)={138}}]
org.apache.solr.common.SolrException: Document [42078] missing required field: org
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:289)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:288)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)

As a result of this issue, no documents are searchable. If I flip the required flag to false in schema.xml, the WARNING goes away and the documents are searchable. However, the documents do not contain organization_name and they are not searchable by organization_name. Have I overlooked a flag somewhere that specifies that nested entities are indexed? Or is there an issue in my config? I've attached my full data-config and the fields section of schema.xml. Thanks in advance. Tim

schema.xml fields:

<fields>
  <field name="id" type="integer" indexed="true" stored="true" required="true" />
  <field name="first_name" type="string" indexed="true" stored="true" required="false" />
  <field name="last_name" type="string" indexed="true" stored="true" required="false" />
  <field name="edr_party_id" type="integer" indexed="true" stored="true" required="false" />
  <field name="org" type="string" indexed="true" stored="true" required="true" />
  <field name="organization_id" type="integer" indexed="true" stored="true" required="true" />
  <!-- <field name="city" type="string" indexed="true" stored="true" required="false" /> -->
</fields>

data-config.xml:

<dataConfig>
  <dataSource driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@hsrdb3:1521:hsint13" user="user" password="password" />
  <document name="agentDoc">
    <entity name="agent" query="SELECT e.employee_id, e.first_name, e.last_name, e.edr_party_id, e.organization_id FROM employee e WHERE e.disabled = 'N' AND rownum &lt; 1000">
      <field column="EMPLOYEE_ID" name="id" />
      <field column="FIRST_NAME" name="first_name" />
      <field column="LAST_NAME" name="last_name" />
      <field column="EDR_PARTY_ID" name="edr_party_id" />
      <field column="ORGANIZATION_ID" name="organization_id" />
      <entity name="organization" query="select o.organization_name from organizations o where o.organization_id = '${agent.ORGANIZATION_ID}'">
        <field name="org" column="organization_name" />
      </entity>
    </entity>
  </document>
</dataConfig>
RE: Exception searching PhoneticFilterFactory field with number
Sure Otis, and in fact I can narrow it down to exactly that query, but with user queries I don't think it is right for the phonetic filter factory to throw an exception if the user enters a number. What I am saying is: am I going to have to filter the user queries for numerics before using them to search the double-metaphone version of my titles? That doesn't seem good.

Jul 23, 2009 2:58:17 PM org.apache.solr.core.SolrCore execute
INFO: [10017] webapp=/solr path=/select/ params={debugQuery=true&rows=10&start=0&q=allDoublemetaphone:2343) ^0.5)))} hits=6873 status=500 QTime=3
Jul 23, 2009 2:58:17 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: java.lang.IllegalArgumentException: name and value cannot both be empty
at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:470)
at org.apache.solr.util.SolrPluginUtils.doStandardDebug(SolrPluginUtils.java:399)
at org.apache.solr.handler.component.DebugComponent.process(DebugComponent.java:54)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:177)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1205)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.IllegalArgumentException: name and value cannot both be empty
at org.apache.lucene.document.Field.<init>(Field.java:277)
at org.apache.lucene.document.Field.<init>(Field.java:251)
at org.apache.solr.search.QueryParsing.writeFieldVal(QueryParsing.java:307)
at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:320)
at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:467)
... 19 more

-Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Monday, July 20, 2009 6:45 PM To: solr-user@lucene.apache.org Subject: Re: Exception searching PhoneticFilterFactory field with number Robert, Can you narrow things down by simplifying the query? For example, I see allDoublemetaphone:2226, which looks suspicious in the "give me the phonetic version of the input" context, but if you could narrow it down, we would probably be able to help more.
Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Robert Petersen rober...@buy.com To: solr-user@lucene.apache.org Sent: Monday, July 20, 2009 12:11:38 PM Subject: Exception searching PhoneticFilterFactory field with number Reposting in hopes of an answer... Hello all, I am getting the following exception whenever a user includes a numeric term in their search, and the search includes a field defined with a PhoneticFilterFactory and further it occurs whether I use the DoubleMetaphone encoder or any other. Has this ever come up before? I can replicate this with no data in the index at all, but if I search the field by hand from the solr web interface there is no exception. I am running the lucid imagination 1.3 certified release in a multicore master/slaves configuration. I will include the field def and the search/exception below and let me know if I can include any more clues... seems like it's trying to make a field with no name/value: positionIncrementGap=100 class=solr.WhitespaceTokenizerFactory/ synonyms=index_synonyms.txt ignoreCase=true expand=false/ ignoreCase=true words=stopwords.txt/ protected=protwords.txt/
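Robert asks above whether he will have to strip numeric terms from user input before searching the double-metaphone field. A minimal sketch of that kind of pre-filtering, purely as an illustration - dropping all-digit terms is an assumption, not something the thread prescribes, and the sample input is made up.

import org.apache.solr.client.solrj.SolrQuery;

String userInput = "red 2343 widget";               // hypothetical user query
StringBuilder cleaned = new StringBuilder();
for (String term : userInput.trim().split("\\s+")) {
    if (!term.matches("\\d+")) {                    // keep only terms that are not all digits
        if (cleaned.length() > 0) cleaned.append(' ');
        cleaned.append(term);
    }
}
// Search only the phonetic field with the remaining terms.
SolrQuery query = new SolrQuery("allDoublemetaphone:(" + cleaned + ")");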
RE: Exception searching PhoneticFilterFactory field with number
Hey I just noticed that this only happens when I enable debug. If debugQuery=true is on the URL then it goes through the debugging component and that is throwing this exception. It must be getting an empty field object from the phonetic filter factory for numbers or something similar -Original Message- From: Robert Petersen [mailto:rober...@buy.com] Sent: Thursday, July 23, 2009 4:12 PM To: solr-user@lucene.apache.org Subject: RE: Exception searching PhoneticFilterFactory field with number Actually my first question should be, Is this a known bug or am I doing something wrong? The only one thing I can find on this topic is the following statement on the solr-dev group when discussing adding the maxCodeLength, see point two below: Ryan McKinley updated SOLR-813: --- Attachment: SOLR-813.patch Here is an update that adresses two concerns: 1. position increments -- this keeps the tokens in sync with the input 2. previous version would stop processing after a number. That is: aaa 1234 bbb would not process bbb 3. Token types... this changes it to DoubleMetaphone rather then ALPHANUM -Original Message- From: Robert Petersen [mailto:rober...@buy.com] Sent: Thursday, July 23, 2009 3:24 PM To: solr-user@lucene.apache.org Subject: RE: Exception searching PhoneticFilterFactory field with number Sure Otis, and in fact I can narrow it down to just exactly that query, but with user queries I don't think it is right to throw an exception out of phonetic filter factory if the user enters a number. What I am saying is am I going to have to filter the user queries for numerics before using it to search in my double metaphone version of my titles? That doesn't seem good. Jul 23, 2009 2:58:17 PM org.apache.solr.core.SolrCore execute INFO: [10017] webapp=/solr path=/select/ params={debugQuery=truerows=10start=0q=allDoublemetaphone:2343) ^0.5)))} hits=6873 status=500 QTime=3 Jul 23, 2009 2:58:17 PM org.apache.solr.common.SolrException log SEVERE: java.lang.RuntimeException: java.lang.IllegalArgumentException: name and value cannot both be empty at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:470) at org.apache.solr.util.SolrPluginUtils.doStandardDebug(SolrPluginUtils.jav a:399) at org.apache.solr.handler.component.DebugComponent.process(DebugComponent. java:54) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(Search Handler.java:177) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerB ase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1205) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.ja va:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.j ava:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applica tionFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilt erChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValv e.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValv e.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java :128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java :102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve. 
java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:2 86) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:84 5) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process( Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:619) Caused by: java.lang.IllegalArgumentException: name and value cannot both be empty at org.apache.lucene.document.Field.init(Field.java:277) at org.apache.lucene.document.Field.init(Field.java:251) at org.apache.solr.search.QueryParsing.writeFieldVal(QueryParsing.java:307) at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:320) at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:467) ... 19 more -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Monday, July 20, 2009 6:45 PM To: solr-user@lucene.apache.org Subject: Re: Exception searching PhoneticFilterFactory field with number Robert, Can you narrow things down by simplifying the query? For example, I see allDoublemetaphone:2226, which looks suspicious in the give me phonetic version of the input context, but if you could narrow it down, we could probably be able to help more. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message
RE: Exception searching PhoneticFilterFactory field with number
Actually my first question should be, Is this a known bug or am I doing something wrong? The only one thing I can find on this topic is the following statement on the solr-dev group when discussing adding the maxCodeLength, see point two below: Ryan McKinley updated SOLR-813: --- Attachment: SOLR-813.patch Here is an update that adresses two concerns: 1. position increments -- this keeps the tokens in sync with the input 2. previous version would stop processing after a number. That is: aaa 1234 bbb would not process bbb 3. Token types... this changes it to DoubleMetaphone rather then ALPHANUM -Original Message- From: Robert Petersen [mailto:rober...@buy.com] Sent: Thursday, July 23, 2009 3:24 PM To: solr-user@lucene.apache.org Subject: RE: Exception searching PhoneticFilterFactory field with number Sure Otis, and in fact I can narrow it down to just exactly that query, but with user queries I don't think it is right to throw an exception out of phonetic filter factory if the user enters a number. What I am saying is am I going to have to filter the user queries for numerics before using it to search in my double metaphone version of my titles? That doesn't seem good. Jul 23, 2009 2:58:17 PM org.apache.solr.core.SolrCore execute INFO: [10017] webapp=/solr path=/select/ params={debugQuery=truerows=10start=0q=allDoublemetaphone:2343) ^0.5)))} hits=6873 status=500 QTime=3 Jul 23, 2009 2:58:17 PM org.apache.solr.common.SolrException log SEVERE: java.lang.RuntimeException: java.lang.IllegalArgumentException: name and value cannot both be empty at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:470) at org.apache.solr.util.SolrPluginUtils.doStandardDebug(SolrPluginUtils.jav a:399) at org.apache.solr.handler.component.DebugComponent.process(DebugComponent. java:54) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(Search Handler.java:177) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerB ase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1205) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.ja va:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.j ava:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applica tionFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilt erChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValv e.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValv e.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java :128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java :102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve. 
java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:2 86) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:84 5) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process( Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:619) Caused by: java.lang.IllegalArgumentException: name and value cannot both be empty at org.apache.lucene.document.Field.init(Field.java:277) at org.apache.lucene.document.Field.init(Field.java:251) at org.apache.solr.search.QueryParsing.writeFieldVal(QueryParsing.java:307) at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:320) at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:467) ... 19 more -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Monday, July 20, 2009 6:45 PM To: solr-user@lucene.apache.org Subject: Re: Exception searching PhoneticFilterFactory field with number Robert, Can you narrow things down by simplifying the query? For example, I see allDoublemetaphone:2226, which looks suspicious in the give me phonetic version of the input context, but if you could narrow it down, we could probably be able to help more. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Robert Petersen rober...@buy.com To: solr-user@lucene.apache.org Sent: Monday, July 20, 2009 12:11:38 PM Subject: Exception searching PhoneticFilterFactory field with number Reposting in hopes of an answer... Hello all, I am getting the following exception whenever a user includes a numeric term in their search, and the search includes a field defined with a PhoneticFilterFactory and further it occurs whether I use the DoubleMetaphone encoder or any other. Has this
server won't start using configs from Drupal
I've downloaded solr-2009-07-21.tgz and followed the instructions at http://drupal.org/node/343467 including retrieving the solrconfig.xml and schema.xml files from the Drupal apachesolr module. The server seems to start properly with the original solrconfig.xml and schema.xml files. When I try to start up the server with the Drupal-supplied files, I get errors on the command line and a 500 error from the server. solrconfig.xml: http://pastebin.com/m23d14a2 schema.xml: http://pastebin.com/m2e79f304 output of http://localhost:8983/solr/admin/: http://pastebin.com/m410fa74d The following looks to me like the important bits, but I'm not a Java coder, so I could easily be wrong. Command line extract: 22/07/2009 5:58:54 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: analyzer without class or tokenizer filter list (plus lots of WARN messages) Extract from browser at http://localhost:8983/solr/admin/ org.apache.solr.common.SolrException: Unknown fieldtype 'text' specified on field title (snip lots of stuff) org.apache.solr.common.SolrException: analyzer without class or tokenizer filter list (snip lots of stuff) org.apache.solr.common.SolrException: Error loading class 'solr.CharStreamAwareWhitespaceTokenizerFactory' (snip lots of stuff) Caused by: java.lang.ClassNotFoundException: solr.CharStreamAwareWhitespaceTokenizerFactory Nothing in the Apache logs... the Solr logs contain this: 127.0.0.1 - - [22/07/2009:08:01:10 +] GET /solr/admin/ HTTP/1.1 500 10292 Any help greatly appreciated. David.
Re: server won't start using configs from Drupal
I think the problem is CharStreamAwareWhitespaceTokenizerFactory, which used to live in Solr (when Drupal schema.xml for Solr was made), but has since moved to Lucene. I'm half guessing. :) Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: david da...@kenpro.com.au To: solr-user@lucene.apache.org Sent: Thursday, July 23, 2009 9:59:53 PM Subject: server won't start using configs from Drupal I've downloaded solr-2009-07-21.tgz and followed the instructions at http://drupal.org/node/343467 including retrieving the solrconfig.xml and schema.xml files from the Drupal apachesolr module. The server seems to start properly with the original solrconfig.xml and schema.xml files When I try to start up the server with the Drupal supplied files, I get errors on the command line, and a 500 error from the server. solrconfig.xml http://pastebin.com/m23d14a2 schema.xml http://pastebin.com/m2e79f304 output of http://localhost:8983/solr/admin/: http://pastebin.com/m410fa74d Following looks to me like the important bits, but I'm not a java coder, so I could easily be wrong. command line extract: 22/07/2009 5:58:54 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: analyzer without class or tokenizer filter list (plus lots of WARN messages) extract from browser at http://localhost:8983/solr/admin/ org.apache.solr.common.SolrException: Unknown fieldtype 'text' specified on field title (snip lots of stuff) org.apache.solr.common.SolrException: analyzer without class or tokenizer filter list (snip lots of stuff) org.apache.solr.common.SolrException: Error loading class 'solr.CharStreamAwareWhitespaceTokenizerFactory' (snip lots of stuff) Caused by: java.lang.ClassNotFoundException: solr.CharStreamAwareWhitespaceTokenizerFactory Nothing in apache logs... solr logs contain this: 127.0.0.1 - - [22/07/2009:08:01:10 +] GET /solr/admin/ HTTP/1.1 500 10292 Any help greatly appreciated. David.