Number of terms in a SOLR field
Hi all, I am attempting to test some changes I made to my DIH-based indexing process. The changes only affect the way I describe my fields in data-config.xml; there should be no changes to the way the data is indexed or stored. As a QA check I wanted to compare the results from indexing the same data before/after the change. I was looking for a way of getting counts of terms in each field. I guess Luke etc. must allow this, but how?

Regards, Fergus.
Re: Number of terms in a SOLR field
Fergus McMenemie wrote:
> Hi all, I am attempting to test some changes I made to my DIH-based indexing process. The changes only affect the way I describe my fields in data-config.xml; there should be no changes to the way the data is indexed or stored. As a QA check I wanted to compare the results from indexing the same data before/after the change. I was looking for a way of getting counts of terms in each field. I guess Luke etc. must allow this, but how?

Luke uses a brute-force approach - it traverses all terms and counts terms per field. This is easy to implement yourself - just get the IndexReader.terms() enumeration and traverse it.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web; Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
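A minimal sketch of that traversal (using the Lucene 2.x-era API bundled with Solr 1.3; the index path is a placeholder, and you would run this against a copy of the index rather than a live one):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermEnum;

public class TermCounter {
    public static void main(String[] args) throws Exception {
        // Point this at your Solr core's data/index directory (placeholder path)
        IndexReader reader = IndexReader.open("/path/to/solr/data/index");
        Map<String, Integer> counts = new HashMap<String, Integer>();
        // reader.terms() is positioned before the first term; next() advances
        TermEnum terms = reader.terms();
        while (terms.next()) {
            String field = terms.term().field();
            Integer c = counts.get(field);
            counts.put(field, c == null ? 1 : c + 1);
        }
        terms.close();
        reader.close();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Running it before and after the data-config.xml change and diffing the two outputs would give the QA comparison described above.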
Re: Create new core on the fly
On Wed, Sep 30, 2009 at 3:48 AM, djain101 dharmveer_j...@yahoo.com wrote:
> Hi Shalin, Can you please elaborate, why we need to do unload after create?

No, you don't need to. You can unload if you want to, for your own reasons.

> So, if we do a create, will it modify the solr.xml everytime? Can it be avoided in subsequent requests for create?

No, solr.xml will be modified only if persist=true is passed as a request param. I don't understand your second question. Why would you want to issue create commands for the same core multiple times?

> Also, if we want to implement Load, can you please give some directions to implement load action?

I don't know what you want to do. Loading cores without restarting Solr is possible right now by using the create command.

--
Regards, Shalin Shekhar Mangar.
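For reference, a CREATE call against the CoreAdmin handler looks roughly like this (host, core name, and instanceDir are placeholders; adjust for your setup):

```
http://localhost:8983/solr/admin/cores?action=CREATE&name=core1&instanceDir=/path/to/core1&persist=true
```

With persist=true the new core is written into solr.xml so it survives a restart; without it the core exists only until Solr is restarted.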
delay while adding document to solr index
hi all, I have indexed 10 documents (daily around 5000 documents will be indexed, one at a time, to Solr). At the same time, daily a few (around 2000) indexed documents (added 30 days back) are deleted using DeleteByQuery of SolrJ. Previously each document used to be indexed within 5ms, but recently I am facing a delay (sometimes 2min to 10min) while adding a document to the index. And my index (folder) size has also increased to 625MB, which is very large; previously it was around 230MB.

My questions are:
1) Is Solr not deleting the older documents (added 30 days back) permanently from the index even after committing?
2) Why has the index size increased?
3) What is the reason for the delay (2min to 10min) while adding documents one at a time to the index?

Help is appreciated. Thanks in advance.
--
View this message in context: http://www.nabble.com/delay-while-adding-document-to-solr-index-tp25676777p25676777.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Number of terms in a SOLR field
Fergus McMenemie wrote:
>> [...] I was looking for a way of getting counts of terms in each field. I guess Luke etc. must allow this, but how?
> Luke uses a brute-force approach - it traverses all terms and counts terms per field. This is easy to implement yourself - just get the IndexReader.terms() enumeration and traverse it.

Thanks Andrzej. This is just a one-off QA check. How do I get Luke to display terms and counts?

Fergus.
Re: Number of terms in a SOLR field
Fergus McMenemie wrote:
> [...] Thanks Andrzej. This is just a one-off QA check. How do I get Luke to display terms and counts?

1. Get Luke 0.9.9.
2. Open the index with Luke.
3. Look at the Overview panel; you will see the list titled "Available fields and term counts per field".

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web; Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
Re: Problem getting Solr home from JNDI in Tomcat
hossman wrote:
> : Hi all, I'm having problems getting Solr to start on Tomcat 6.
> which version of Solr?

Sorry -- a nightly build from about a month ago. Re. your other message, I was sure the two machines had the same version on, but maybe not -- when I'm back in the office tomorrow I'll upgrade them both to a fresh nightly.

hossman wrote:
> : Tomcat is installed in /opt/apache-tomcat , solr is in /opt/apache-tomcat/webapps/solr , and my Solr home directory is /opt/solr .
> if solr is in /opt/apache-tomcat/webapps/solr -- meaning that you put the solr.war in /opt/apache-tomcat/webapps/ and tomcat expanded it into /opt/apache-tomcat/webapps/solr -- then that is your problem: tomcat isn't even looking at your context file (it only looks at the context files to resolve URLs that it can't resolve by looking in the webapps directory).

Yes, it's auto-expanded from a war in webapps. I have to admit to being a bit baffled though -- I can't find this rule anywhere in the Tomcat docs, but I'm a beginner really and they're not the clearest :-)

hossman wrote:
> This is why the examples of using context files on the wiki talk about keeping the war *outside* of the webapps directory, and using docBase in your Context declaration... http://wiki.apache.org/solr/SolrTomcat

Great, I'll try it this way and see if it clears up. Is it okay to keep the war file *inside* the Solr home directory (/opt/solr in my case) so it's all self-contained?

Many thanks, Andrew.
Invalid response with search key having numbers
Hi all, I am getting incorrect results when I search with numbers only, or with a string containing numbers. When such a search is done, all the results in the index are returned, irrespective of the search key. For example, the phone number field is mapped to TextField; it can contain values like 653-23345. Also, a search string like john25, searched against name, will show all the results. My analyzer looks like:

<fieldType name="mytype" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1"/>
  </analyzer>
</fieldType>

Anything wrong in the analyzer? Do I need to use any other filters instead of catenateAll?

Thanks, C
Re: search for non empty field
Hi, I'm not getting the expected results when using [* TO *]; the results include empty fields. Here is my configuration:

schema.xml:
<field name="refFaseExp" type="string" indexed="true" stored="true" multiValued="true"/>

bean:
@Field private List<String> refFaseExp = new ArrayList<String>();

query:
http://host.com/select?rows=0&facet=true&facet.field=refFaseExp&q=*:* AND refFaseExp:[* TO *]

query results:
(...)
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="refFaseExp">
      <int name="">32</int>
(...)

I tried changing type="string" to long and nothing changed. When I use -refFaseExp:[* TO *], it returns 0 documents. Any idea? Thx in advance.

On Mon, Mar 31, 2008 at 2:07 PM, Matt Mitchell goodie...@gmail.com wrote:
> Thanks Erik. I think this is the thread here: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200709.mbox/%3c67117a73-2208-401f-ab5d-148634c77...@variogr.am%3e
> Matt

On Sun, Mar 30, 2008 at 9:50 PM, Erik Hatcher e...@ehatchersolutions.com wrote:
> Documents with a particular field can be matched using: field:[* TO *]
> Or documents without a particular field with: -field:[* TO *]
> An empty field? Meaning one that was indexed but with no terms? I'm not sure about that one. Seems like Hoss replied to something similar on this last week or so though - check the archives.
> Erik

On Mar 30, 2008, at 9:43 PM, Matt Mitchell wrote:
> I'm looking for the exact same thing.

On Sun, Mar 30, 2008 at 8:45 PM, Ismail Siddiqui ism...@gmail.com wrote:
> Hi all, I have a situation where I have to filter results on a non-empty field. A wildcard won't work as it will have to match with a letter. How can I form a query to return results where a particular field is non-empty?
> Ismail
Re: delay while adding document to solr index
Swapna, my answers are inline.

2009/9/30 swapna_here swapna.here...@gmail.com:
> 1) is solr not deleting the older documents (added 30 days back) permanently from index even after committing

Have you run optimize?

> 2) Why the index size is increased

If 5000 docs are added daily and only 2000 deleted, the index size would increase because of the remaining 3000 documents.

> 3) reason for delay (2min to 10 mins) while adding the document one at a time to index

I don't know why this would happen. Is your disk nearly full? Which OS are you running on? What is the configuration of Solr?

Hope this helps,
Pravin
Solr Porting to .Net
Hi All, I'm wondering if a Solr version for .Net is already available, or if it is still under development/planning. I've searched on the Solr website but I've found only info on the Lucene.Net project.

Best Regards, Antonio
--
Antonio Calò -- Software Developer Engineer @ Intellisemantic
Mail anton.c...@gmail.com Tel. 011-56.90.429
--
Re: Solr Porting to .Net
You may want to check out http://code.google.com/p/solrnet/

2009/9/30 Antonio Calò anton.c...@gmail.com:
> Hi All, I'm wondering if a Solr version for .Net is already available, or if it is still under development/planning. [...]
Re: delay while adding document to solr index
Also, what is your merge factor set to?

Pravin

2009/9/30 Pravin Paratey prav...@gmail.com:
> Swapna, my answers are inline. [...]
Re: delay while adding document to solr index
Thanks for your reply. I have not optimized at all. My understanding is that optimize improves query performance but takes more disk space; beyond that, I have no idea how to use it. Previously, for 10 documents, the size occupied was around 250MB, but after 2 months it is 625MB. Why did this happen? Is it because I have not optimized the index? Can anybody tell me when and how to optimize the index (with configuration details)?
Re: delay while adding document to solr index
Swapna,

While the disk space does increase during the process of optimization, it should almost always return to the original size or slightly less.

This is a silly question, but off the top of my head I can't think of any other reason why the index size would increase - are you running a <commit/> after adding documents? If you are, you might want to compare the size of each document being currently indexed with the ones you indexed a few months back.

To optimize the index, simply post <optimize/> to Solr. Or read [http://wiki.apache.org/solr/SolrOperationsTools]

Pravin

2009/9/30 swapna_here swapna.here...@gmail.com:
> thanks for your reply i have not optimized at all [...]
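For example, with curl (assuming the default example port and the standard /update handler; adjust the URL for your deployment):

```shell
curl 'http://localhost:8983/solr/update' \
     -H 'Content-Type: text/xml' \
     --data-binary '<optimize/>'
```

Optimizing merges the index down to a single segment and expunges deleted documents, which is why the on-disk size usually shrinks once it finishes.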
Re: ${dataimporter.last_index_time} as an argument to newerThan in FileListEntityProcessor?
On Tue, Sep 29, 2009 at 11:43 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
> On Tue, Sep 29, 2009 at 8:14 PM, Bill Dueber b...@dueber.com wrote:
>> Is this possible? I can't figure out a syntax that works, and all the examples show using last_index_time as an argument to an SQL query.
> It is possible but it doesn't work right now. I've created an issue and I will give a patch shortly. https://issues.apache.org/jira/browse/SOLR-1473

Bill, this fix is now available in trunk. A sample usage would look like the following:

<dataConfig>
  <document>
    <entity name="x" processor="FileListEntityProcessor"
            fileName=".*" newerThan="${dih.last_index_time}"
            baseDir="/data" transformer="TemplateTransformer">
      <field column="id" template="${x.file}"/>
    </entity>
  </document>
</dataConfig>

Thanks for reporting this!
--
Regards, Shalin Shekhar Mangar.
Re: delay while adding document to solr index
Thanks again for your immediate response. Yes, I am running a commit after each document is indexed. I don't understand why my index size has increased to 625MB (for the 10 documents) when it was previously 250MB. Is this because I have not optimized my index at all, or because I am adding documents individually? I need a solution for this urgently. Thanks a lot.
Re: search for non empty field
field:[* TO *] matches documents that have one or more terms in that field. If your indexer is sending a value, it'll end up with a term. Note that changing from string to long requires reindexing, though that isn't the issue here.

Erik

On Sep 30, 2009, at 2:39 AM, Jorge Agudo Praena wrote:
> Hi, I'm not getting the expected results when using [* TO *]; the results include empty fields. Here is my configuration:
> schema.xml: <field name="refFaseExp" type="string" indexed="true" stored="true" multiValued="true"/>
> bean: @Field private List<String> refFaseExp = new ArrayList<String>();
> query: http://host.com/select?rows=0&facet=true&facet.field=refFaseExp&q=*:* AND refFaseExp:[* TO *]
> I tried changing type="string" to long and nothing changed. When I use -refFaseExp:[* TO *], it returns 0 documents. Any idea? [...]
init parameters for queryParser
Hi all,

I've got my own query parser plugin defined thanks to the queryParser tag:

<queryParser name="myqueryparser" class="my.package.MyQueryParserPlugin"/>

The QParserPlugin class has got an init method like this:

public void init(NamedList args);

Where and how do I put my args to be passed to init for my query parser plugin? I'm trying:

<queryParser name="myqueryparser" class="my.package.MyQueryParserPlugin">
  <lst name="defaults">
    <str name="param1">value1</str>
    <str name="param1">value1</str>
  </lst>
</queryParser>

But I'm not sure if it's the right way. Could we also update the wiki about this? http://wiki.apache.org/solr/SolrPlugins#QParserPlugin

Jerome.
--
Jerome Eteve. http://www.eteve.net jer...@eteve.net
Re: Solr Porting to .Net
SolrNet is only an HTTP client to Solr.

I've been experimenting with IKVM but wasn't very successful... There seem to be some issues with class loading, but unfortunately I don't have much time to continue these experiments right now. In case you're interested in continuing this, here's the repository: http://code.google.com/p/mausch/source/browse/trunk/SolrIKVM

Also, recently someone registered a project on Google Code with the same intentions, but no commits yet: http://code.google.com/p/solrwin/

Cheers, Mauricio

On Wed, Sep 30, 2009 at 7:09 AM, Pravin Paratey prav...@gmail.com wrote:
> You may want to check out http://code.google.com/p/solrnet/ [...]
Re: delay while adding document to solr index
Hi,

- Try to let Solr do the commits for you (by setting up the autocommit feature), and stop committing after inserting each document. This should greatly improve the delays you're experiencing.
- If you do not optimize, it's normal that your index size only grows. Optimize regularly, when your load is minimal.

Jerome.

2009/9/30 swapna_here swapna.here...@gmail.com:
> thanks again for your immediate response [...]
--
Jerome Eteve. http://www.eteve.net jer...@eteve.net
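For reference, autocommit is configured in the updateHandler section of solrconfig.xml; a sketch with illustrative thresholds (tune both values for your load):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>1000</maxDocs>   <!-- commit after 1000 pending documents -->
    <maxTime>60000</maxTime>  <!-- or after 60 seconds, whichever comes first -->
  </autoCommit>
</updateHandler>
```

With this in place the client just adds documents and Solr batches the commits, instead of paying the commit cost (reopening searchers, flushing segments) on every single add.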
Re: init parameters for queryParser
On Wed, Sep 30, 2009 at 7:14 PM, Jérôme Etévé jerome.et...@gmail.com wrote:
> Hi all, I've got my own query parser plugin defined thanks to the queryParser tag. [...] Where and how do I put my args to be passed to init for my query parser plugin? [...] But I'm not sure if it's the right way.

You don't need to put <lst name="defaults"> - defaults, appends, and invariants are keys used by RequestHandlers. Just put all the params you need directly:

<queryParser name="myqueryparser" class="my.package.MyQueryParserPlugin">
  <str name="param1">value1</str>
  <bool name="param2">true</bool>
</queryParser>

--
Regards, Shalin Shekhar Mangar.
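On the plugin side, those values then arrive in init() as entries of the NamedList. A rough sketch (the param names and types are taken from the config example above; createParser is stubbed out):

```java
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class MyQueryParserPlugin extends QParserPlugin {
    private String param1;
    private boolean param2;

    public void init(NamedList args) {
        // <str name="param1">value1</str> arrives as a String
        param1 = (String) args.get("param1");
        // <bool name="param2">true</bool> arrives as a Boolean
        Boolean b = (Boolean) args.get("param2");
        param2 = (b != null && b.booleanValue());
    }

    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
        // ... construct and return your QParser here, using param1/param2 ...
        return null;
    }
}
```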
Where do I need to install Solr
Does Solr have to be installed on the web server, or can I install Solr on a different server and access it from my web server? Kevin Miller Web Services
Re: Where do I need to install Solr
Kevin Miller wrote:
> Does Solr have to be installed on the web server, or can I install Solr on a different server and access it from my web server?

You can access it from your webserver (or browser) via HTTP/XML requests and responses. Have a look at the Solr tutorial: http://lucene.apache.org/solr/tutorial.html and this one: http://www.xml.com/lpt/a/1668

--
Claudio Martella
Digital Technologies Unit Research & Development - Engineer
TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123 Fax +39 0471 068 129
claudio.marte...@tis.bz.it http://www.tis.bz.it

Short information regarding use of personal data. According to Section 13 of Italian Legislative Decree no. 196 of 30 June 2003, we inform you that we process your personal data in order to fulfil contractual and fiscal obligations and also to send you information regarding our services and events. Your personal data are processed with and without electronic means and by respecting data subjects' rights, fundamental freedoms and dignity, particularly with regard to confidentiality, personal identity and the right to personal data protection. At any time and without formalities you can write an e-mail to priv...@tis.bz.it in order to object the processing of your personal data for the purpose of sending advertising materials and also to exercise the right to access personal data and other rights referred to in Section 7 of Decree 196/2003. The data controller is TIS Techno Innovation Alto Adige, Siemens Street n. 19, Bolzano. You can find the complete information on the web site www.tis.bz.it.
Re: Solr Porting to .Net
Hi guys, thanks for your prompt feedback. So, are you saying that SolrNet is just a wrapper written in C# that connects to Solr (still written in Java, running on IKVM)? Is my understanding correct?

Regards, Antonio

2009/9/30 Mauricio Scheffer mauricioschef...@gmail.com:
> SolrNet is only an HTTP client to Solr. I've been experimenting with IKVM but wasn't very successful... [...]

--
Antonio Calò -- Software Developer Engineer @ Intellisemantic
Mail anton.c...@gmail.com Tel. 011-56.90.429
--
Re: Where do I need to install Solr
Solr is a separate service, in the same way an RDBMS is a separate service. Whether you install it on the same machine as your webserver or not, it's logically separated from your server.

Jerome.

2009/9/30 Claudio Martella claudio.marte...@tis.bz.it:
> You can access it from your webserver (or browser) via HTTP/XML requests and responses. Have a look at the Solr tutorial: http://lucene.apache.org/solr/tutorial.html and this one: http://www.xml.com/lpt/a/1668 [...]

--
Jerome Eteve. http://www.eteve.net jer...@eteve.net
Re: Solr Porting to .Net
Solr is a server that runs on Java, and it exposes an HTTP interface. SolrNet is a client library for .Net that connects to a Solr instance via its HTTP interface. My experiment (let's call it SolrIKVM) is an attempt to run Solr itself on .Net.

Hope that clears things up.

On Wed, Sep 30, 2009 at 11:50 AM, Antonio Calò anton.c...@gmail.com wrote:
> Hi guys, thanks for your prompt feedback. So, are you saying that SolrNet is just a wrapper written in C# that connects to Solr (still written in Java, running on IKVM)? Is my understanding correct? [...]
Questions about synonyms and highlighting
Hi,

Can you please give me some answers to these questions:

1 - How can I get the synonyms found for a keyword? I mean, I search for foo and I have in my synonyms.txt file the following tokens: foo, foobar, fee (with expand=true). My index contains foo and foobar. I want to display a message on the results page (in the header, for example) with only the 2 matched tokens and not fee, like "Results found for foo and foobar".

2 - Can Solr run analysis on an index to extract associations between tokens? For example, if foo often appears with fee in a field, it will associate the 2 tokens.

3 - Is it possible, and if so how, to configure Solr to enable or disable highlighting for tokens with diacritics?
Settings for vélo (all highlighted) == the two words <em>vélo</em> and <em>velo</em> are highlighted
Settings for vélo == the first word <em>vélo</em> is highlighted but not the second: velo

4 - The same question for highlighting with lemmatisation:
Settings for manage (all highlighted) == the two words <em>manage</em> and <em>management</em> are highlighted
Settings for manage == the first word <em>manage</em> is highlighted but not the second: management

Thanks in advance. Regards, Nourredine.
Multi-valued field cache
I want to build a FunctionQuery that scores documents based on a multi-valued field. My intention was to use the field cache, but that doesn't get me multiple values per document. I saw other posts suggesting UnInvertedField as the solution, but I don't see a method in the UnInvertedField class that will give me a list of field values per document; I only see methods that give values per document set. Should I use one of those methods and create document sets of size 1 for each document?

Thanks, Wojtek
Adding data from nutch to a Solr index
Alright, first post to this list and I hope the question is not too stupid or misplaced ...

What I currently have:
- a nicely working Solr 1.3 index with information about some entities, e.g. organisations, indexed from an RDBMS. Many of these entities have a URL pointing at further information, e.g. the website of an institute or company.
- an installation of nutch 0.9 with which I can crawl the URLs that I can extract from the RDBMS mentioned above and put into a seed file
- tutorials about how to put crawled and indexed data from nutch 1.0 (which I could install w/o problems) into a separate Solr index

What I want:
- to combine the indexed information from the RDBMS and the website in one Solr index, so that I can search both at once and with the capability of using all the Solr features. E.g. having the following (example) fields in one document:

<doc>
  name-from-RDBMS
  indexed-content-from-RDBMS
  indexed-content-from-website
  URL
  ...
</doc>

Any input appreciated! Cheers, Sönke
NGramTokenFilter behaviour
If I index the following text: I live in Dublin Ireland where Guinness is brewed, then search for duvlin, should Solr return a match? In the admin interface, under the analysis section, Solr highlights some NGram matches. But when I enter the following query string into my browser address bar, I get 0 results: http://localhost:8983/solr/select/?q=duvlin&debugQuery=true Nor do I get results for dub, dubli, ublin, dublin (du does return a result). I also notice, when I use debugQuery=true, that the parsed query is a PhraseQuery. This doesn't make sense to me, as surely the point of the NGram is to use a Boolean OR between each gram? However, if I don't use an NGramFilterFactory at query time, I can get results for dub, ublin, du, but not duvlin. My field type is:

<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>

Can someone please clarify what the purpose of the NGram filter/tokenizer is, if not to allow for misspellings/morphological variation, and also what the correct configuration is in terms of use at index/query time? Any help appreciated! Aodh. Solr 1.3, JDK 1.6
Re: Adding data from nutch to a Solr index
Sönke Goldbeck wrote: ... what I want: combine the indexed information from the RDBMS and the website in one Solr index so that I can search both in one and with the capability of using all the Solr features. I believe that this kind of document merging is not possible (at least not easily) - you have to assemble the whole document before you index it in Solr. If these documents use the same primary key (I guess they do, otherwise how would you merge them...), then you can do the merging in your front-end application: it would submit the main query to Solr, and then for each Solr document in the list of results it would retrieve the corresponding Nutch document (using the NutchBean API). (The not-so-easy way involves writing a SearchComponent that does the latter part of that process on the Solr side.) -- Best regards, Andrzej Bialecki http://www.sigram.com Contact: info at sigram dot com
RE: NGramTokenFilter behaviour
My understanding of n-gram tokenizing is that it helps with languages that don't necessarily use spaces as word delimiters (Japanese et al.). In that case bi-gramming is used to find words contained within a stream of unbroken characters, and you want to match all of the bi-grams of your input search query; an OR wouldn't work as well, as you would get tons of hits. -Todd Feak
Re: n-Gram, only works with queries of 2 letters
Has this issue been fixed yet? Can anyone shed some light on what's going on here, please? N-gramming is critical to my app; I will have to look at something other than Solr if it's not possible to do :(
Re: Number of terms in a SOLR field
Fergus McMenemie wrote: Thanks Andrzej. This is just a one-off QA check. How do I get Luke to display terms and counts? Andrzej Bialecki wrote: 1. Get Luke 0.9.9. 2. Open the index with Luke. 3. Look at the Overview panel; you will see the list titled Available fields and term counts per field. Thanks, that got me going, and I felt a little stupid after stumbling across http://wiki.apache.org/solr/LukeRequestHandler Regards Fergus
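For anyone wanting to script this QA check instead of clicking through Luke, here is a minimal sketch of the brute-force traversal Andrzej describes. The tallying logic is shown against a plain list of field names standing in for the field of each term returned by IndexReader.terms(); the Lucene wiring (opening the reader, walking the TermEnum) is sketched only in a comment and is an assumption about your setup.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldTermCounter {
    // Count how many (distinct) terms each field has. In a real run the
    // input would be the field() of every term from IndexReader.terms()
    // (Lucene 2.x-era API), roughly:
    //   TermEnum te = reader.terms();
    //   while (te.next()) { fields.add(te.term().field()); }
    public static Map<String, Integer> countTermsPerField(List<String> termFields) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String field : termFields) {
            Integer c = counts.get(field);
            counts.put(field, c == null ? 1 : c + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Five terms spread over two fields; map iteration order may vary.
        Map<String, Integer> counts = countTermsPerField(
            Arrays.asList("title", "body", "body", "title", "body"));
        System.out.println(counts);
    }
}
```

Running this before and after the data-config.xml change and diffing the two maps should show whether the per-field term counts moved.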
Re: NGramTokenFilter behaviour
On Wed, Sep 30, 2009 at 11:24 PM, aod...@gmail.com wrote: Can someone please clarify what the purpose of the NGramFilter/tokenizer is, if not to allow for misspellings/morphological variation and also, what the correct configuration is in terms of use at index/query time. If it is spellcheck you are interested in, take a look at http://wiki.apache.org/solr/SpellCheckComponent -- Regards, Shalin Shekhar Mangar.
Re: NGramTokenFilter behaviour
On Wed, Sep 30, 2009 at 11:24 PM, aod...@gmail.com wrote: When I enter the following query string into my browser address bar, I get 0 results: http://localhost:8983/solr/select/?q=duvlin&debugQuery=true Is the n-grammed field specified as the defaultSearchField in your schema.xml? If not, then you will have to specify the field name during querying, e.g. field_name:duvlin. You can see exactly how your query is being parsed if you add debugQuery=on as a request parameter. -- Regards, Shalin Shekhar Mangar.
Conditional deduplication
If I index a bunch of email documents, is there a way to say: show me all email documents, but only one per To: email address, so that if there are a total of 10 distinct To: fields in the corpus, I get back 10 email documents? I'm aware of http://wiki.apache.org/solr/Deduplication but I want to retain the ability to search across all of my email documents most of the time, and only occasionally search for the distinct ones. Essentially I want to do a SELECT DISTINCT to_field FROM documents, where a normal search is a SELECT * FROM documents. Thanks for any pointers.
Re: Conditional deduplication
See http://wiki.apache.org/solr/FieldCollapsing On Wed, Sep 30, 2009 at 4:41 PM, Michael solrco...@gmail.com wrote: Essentially I want to do a SELECT DISTINCT to_field FROM documents, where a normal search is a SELECT * FROM documents.
field collapsing sums
hello all, i have a question on the field collapsing patch. say i have an integer field called num_in_stock and i collapse by some other column; is it possible to sum up that integer field and return the total in the output? if not, how would i go about extending the collapsing component to support that? thx much --joe
mergefactor=1 questions
In order to make maximal use of our storage by avoiding the dead 2x overhead needed to optimize the index we are considering setting mergefactor=1 and living with the slow indexing performance which is not a problem in our use case. Some questions: 1) Does mergefactor=1 mean that the size of the index on disk increases only due to add/s or is there some sort of merging that happens that temporarily inflates disk usage? 2) It was mentioned that, with per-segment readers, an optimized index may not be the best option. What are per-segment readers? Is this configurable or some sort of default? What are the cases where an optimized index (one segment) might not be the best option? Thanks! Phil
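For reference, here is where the merge factor is set in a Solr 1.3-era solrconfig.xml; this fragment is an illustrative sketch, not the original poster's config. Note that, as far as I know, Lucene's merge policy rejects merge factors below 2, so 2 is the lowest legal setting rather than 1, and even at the lowest setting a merge still needs transient extra disk, because the merged segment is written out before the old segments are deleted.

```xml
<!-- solrconfig.xml: indexDefaults applies when a new index is created -->
<indexDefaults>
  <!-- Lowest legal value is 2; values below that are rejected by Lucene -->
  <mergeFactor>2</mergeFactor>
</indexDefaults>
```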
Seeking Solr/Nutch consultant in San Jose, CA
Hi, I am working with a SaaS vendor who is integrated with Nutch 0.9 and SOLR. We are looking for some help to migrate this to Nutch 1.0. The work involves: 1) We made changes to Nutch 0.9; these need to be ported to Nutch 1.0. 2) Configure SOLR integration with Nutch 1.0 3) Configure SOLR to do Japanese indexing; expose this configuration as part of Baynote configuration. 4) Check if indexes are portable between Nutch 0.9 and Nutch 1.0 - should we re-index? Please email me if there is interest. The work is in San Jose, CA. Duration and rate are not yet known. Best regards, Leann Leann Pereira | o: +1 650.425.7950 | le...@1sourcestaffing.com | Senior Technical Recruiter
Re: Seattle / PNW Hadoop/Lucene/HBase Meetup, Wed Sep 30th
As Bradford is out of town this evening, I will take up the mantle of Person-on-Point. Contact me with questions re: tonight's gathering. See you tonight! -Nick 614.657.0267 On Mon, Sep 28, 2009 at 4:33 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: Hello everyone! Don't forget that the Meetup is THIS Wednesday! I'm looking forward to hearing about Hive from the Facebook team ... and there might be a few other interesting talks as well. Here's the details in the wiki: http://wiki.apache.org/hadoop/PNW_Hadoop_%2B_Apache_Cloud_Stack_User_Group Cheers, Bradford On Mon, Sep 14, 2009 at 11:35 AM, Bradford Stephens bradfordsteph...@gmail.com wrote: Greetings, It's time for another Hadoop/Lucene/Apache Cloud Stack meetup! This month it'll be on Wednesday, the 30th, at 6:45 pm. We should have a few interesting guests this time around -- someone from Facebook may be stopping by to talk about Hive :) We've had great attendance in the past few months, let's keep it up! I'm always amazed by the things I learn from everyone. We're back at the University of Washington, Allen Computer Science Center (not Computer Engineering). Map: http://www.washington.edu/home/maps/?CSE Room: 303 -or- the Entry level. If there are changes, signs will be posted. More Info: The meetup is about 2 hours (and there's usually food): we'll have two in-depth talks of 15-20 minutes each, and then several lightning talks of 5 minutes. If no one offers to talk, we'll just have general discussion and 'social time'. Let me know if you're interested in speaking or attending. We'd like to focus on education, so every presentation *needs* to ask some questions at the end. We can talk about these after the presentations, and I'll record what we've learned in a wiki and share it with the rest of us.
Contact: Bradford Stephens, 904-415-3009, bradfordsteph...@gmail.com Cheers, Bradford -- http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
Re: Writing optimized index to different storage?
Sorry, I should have given more background. We have, at the moment, 3.8 million documents averaging 0.7MB/doc, so we have extremely large shards. We build about 400,000 documents to a shard, resulting in 200GB/shard. We are also using LVM snapshots to manage a snapshot of the shard, which we serve while we continue to build. In order to optimize the building shard of around 200GB, we need 400GB of disk space to allow for the 2x size increase. Due to the nature of snapshotting, the volume containing the snapshot has to be as large as the build volume, i.e. 400GB. If we could write the optimized build shard elsewhere instead of in place, we could avoid the need for the serving volume to match the size of the building volume; we'd like to avoid having 200GB+ hanging around just to optimize. The responses we got make it clear that writing the optimized index elsewhere is not a solution. I posted another question to the list just a bit ago asking whether mergeFactor=1 would give us a single-segment index that is always optimized, so that we don't have the 2x overhead. However, running a build with mergeFactor=1 shows that lots of segments get created/merged, and that the index grows in size but also shrinks at intervals; it is not clear how big the index is at any point in time. Chris Hostetter wrote: : Is it possible to tell Solr or Lucene, when optimizing, to write the files : that constitute the optimized index to somewhere other than : SOLR_HOME/data/index, or is there something about the optimize that requires : the final segment to be created in SOLR_HOME/data/index? For what purpose? http://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
Webinar: Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Excuse the cross-posting and gratuitous marketing :) Erik My company, Lucid Imagination, is sponsoring a free and in-depth technical webinar with Erik Hatcher, one of our co-founders at Lucid Imagination, as well as co-author of Lucene in Action and a Lucene/Solr PMC member and committer. Sign up here: http://www.eventsvc.com/lucidimagination/100909?trk=WR-OCT2009-AP Friday, October 9th 2009, 10:00AM – 11:00AM PDT / 1:00 – 2:00PM EDT. If you’ve got a lot of data to tame in a variety of formats, there’s no better, deeper, faster platform to build your search application with than Solr. Apache Solr 1.4 expands the power and versatility of the leading open source search server, with its convenient web-services interfaces and well-packaged server implementation. Erik will present and discuss key features and innovations of Solr 1.4, covering, among others: * Faster, more streamlined document and query processing * New powerful search methods including multi-select faceting, deduplication and numeric range handling * Simplified, powerful, highly-scalable deployment improvements with new Java server infrastructure Sign up for the free webinar at http://www.eventsvc.com/lucidimagination/100909?trk=WR-OCT2009-AP About the presenter: Erik Hatcher is the co-author of “Lucene in Action” as well as co-author of “Java Development with Ant”. Erik has been an active member of the Lucene community – a leading Lucene and Solr committer, member of the Lucene Project Management Committee, member of the Apache Software Foundation, and a frequent invited speaker at various industry events.
changing dismax parser to not treat symbols differently
how would i go about modifying the dismax parser to treat +/- as regular text?
Re: changing dismax parser to not treat symbols differently
Joe Calderon wrote: how would i go about modifying the dismax parser to treat +/- as regular text? Would be nice if there was a tiny simple method you could override for this, but: You should extend the dismax parser and override addMainQuery Where it calls SolrPluginUtils.partialEscape, call your own escape method that does what that one does, but also escapes + and -. I think that should work alright. -- - Mark http://www.lucidimagination.com
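A minimal sketch of the escape step Mark describes. The class and method names here are made up for illustration, and the exact integration point inside addMainQuery is an assumption; the stock SolrPluginUtils.partialEscape already handles the other query-syntax characters, so this only adds '+' and '-'.

```java
public class DismaxEscapeUtil {
    // Backslash-escape '+' and '-' so the dismax parser treats them as
    // literal text instead of mandatory/prohibited operators. In a
    // DisMaxQParser subclass you would apply this to the user query where
    // the stock parser calls SolrPluginUtils.partialEscape.
    public static String escapePlusMinus(CharSequence userQuery) {
        StringBuilder sb = new StringBuilder(userQuery.length());
        for (int i = 0; i < userQuery.length(); i++) {
            char c = userQuery.charAt(i);
            if (c == '+' || c == '-') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: c\+\+ \-fast
        System.out.println(escapePlusMinus("c++ -fast"));
    }
}
```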
Re: field collapsing sums
Hi, At the moment I think the most appropriate place to put it is in the AbstractDocumentCollapser (in the getCollapseInfo method). Though, it might not be the most efficient. Cheers, Uri Joe Calderon wrote: hello all, i have a question on the field collapsing patch, say i have an integer field called num_in_stock and i collapse by some other column, is it possible to sum up that integer field and return the total in the output, if not how would i go about extending the collapsing component to support that? thx much --joe
Re: Create new core on the fly
So, if we do a create, will it modify the solr.xml every time? Can it be avoided in subsequent requests for create? No, solr.xml will be modified only if persist=true is passed as a request param. I don't understand your second question. Why would you want to issue create commands for the same core multiple times? Shalin, persist=true does not work with the create action. I am creating the core using the URL below, and every time it modifies solr.xml: http://localhost:8080/app/solr/admin/cores?action=CREATE&name=core1&instanceDir=core1&persist=false Our requirement is that we have multiple Solr app servers behind a load balancer. If we hit a URL to create a Solr core, it will hit any one of the app servers and the core will be loaded on to that app server only. All the other app servers will not be aware of the new Solr core, and search requests will fail if they hit an app server on which the core is not loaded. That's the reason we need to CREATE the core on each of the app servers, but we don't necessarily want solr.xml to be modified. Thanks, Dharmveer
Re: field collapsing sums
You might want to see how the stats component works with field collapsing. Thanks, Matt Weber On Sep 30, 2009, at 5:16 PM, Uri Boness wrote: At the moment I think the most appropriate place to put it is in the AbstractDocumentCollapser (in the getCollapseInfo method), though it might not be the most efficient.