Re: How to group result when search on multiple fields
On Thu, Jan 27, 2011 at 1:25 AM, cyang2010 ysxsu...@hotmail.com wrote: Is Field Collapsing a new feature for Solr 4.0 (not yet released)? That's at least what the wiki tells you, yes.
Question About Writing Custom Query Parser Plugin
Hi All, I want to integrate the Lucene Surround query parser with Solr 1.4.1, and for that I am writing a custom query parser plugin. To accomplish this task I should write a subclass of org.apache.solr.search.QParserPlugin and implement its two methods: public void init(NamedList nl) and public QParser createParser(String string, SolrParams sp, SolrParams sp1, SolrQueryRequest sqr). Now, createParser should return an object of a subclass of org.apache.solr.search.QParser, but I need a parser of type org.apache.lucene.queryParser.surround.parser.QueryParser, which is not a subclass of org.apache.solr.search.QParser. My question is: should I write a subclass of org.apache.solr.search.QParser and internally create an object of org.apache.lucene.queryParser.surround.parser.QueryParser and call its parse method? If so, how would the org.apache.lucene.queryParser.surround.query.SrndQuery (returned by org.apache.lucene.queryParser.surround.parser.QueryParser) be mapped to an org.apache.lucene.search.Query (which should be returned from the parse method of a query parser of type org.apache.solr.search.QParser)? Thanx Ahsan
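For what it's worth, one possible shape for such a plugin is sketched below. This is an untested sketch against the Solr 1.4 / Lucene 2.9 APIs, not working code: it assumes the surround parser's static parse() method and SrndQuery.makeLuceneQueryField(field, BasicQueryFactory) for the SrndQuery-to-Query mapping, and the default-search-field lookup is an assumption that should be checked against your Solr version.

```java
// Untested sketch: wraps the Lucene surround parser in a Solr QParser.
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.surround.query.BasicQueryFactory;
import org.apache.lucene.queryParser.surround.query.SrndQuery;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class SurroundQParserPlugin extends QParserPlugin {
    public void init(NamedList args) { }

    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
            public Query parse() throws ParseException {
                try {
                    // Step 1: parse the surround syntax into a SrndQuery.
                    SrndQuery sq = org.apache.lucene.queryParser
                            .surround.parser.QueryParser.parse(getString());
                    // Step 2: map the SrndQuery to a plain Lucene Query
                    // against a concrete field (assumption here: the
                    // schema's default search field).
                    String field = req.getSchema().getDefaultSearchFieldName();
                    return sq.makeLuceneQueryField(field, new BasicQueryFactory());
                } catch (org.apache.lucene.queryParser.surround.parser.ParseException e) {
                    throw new ParseException(e.getMessage());
                }
            }
        };
    }
}
```

The key point is that SrndQuery is not itself a Lucene Query; the makeLuceneQueryField call performs the translation, so the Solr-side QParser only ever returns an org.apache.lucene.search.Query.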
Re: Does solr supports indexing of files other than UTF-8
Why is converting documents to UTF-8 not feasible? Nowadays any platform offers such services. Can you give a detailed failure description (maybe with the URL to a sample document you post)? paul On 27 Jan 2011 at 07:31, prasad deshpande wrote: I am able to successfully index/search non-English data (like Hebrew, Japanese) which was encoded in UTF-8. However, when I tried to index data which was encoded in a local encoding like Big5 for Japanese, I could not see the desired results. The contents after indexing looked garbled for the Big5-encoded document when I searched for all indexed documents. Converting a complete document to UTF-8 is not feasible. I am not very clear about how Solr supports these localizations with encodings other than UTF-8. I verified the links below: 1. http://lucene.apache.org/java/3_0_3/api/all/index.html 2. http://wiki.apache.org/solr/LanguageAnalysis Thanks and Regards, Prasad
Re: A Maven archetype that helps packaging Solr as a standalone application embedded in Apache Tomcat
Simone, It's good that you did so! I had found this three days ago while googling. And I am starting to make sense of it. It works well. Two little comments: - you are saying that it packages a standalone multicore and a standalone app. But it actually also packs a webapp. At first, I had rejected using that option because of the standalone output. I think a webapp is more usable. Just a matter of formulation - I have found how to configure my schema and config, could add the velocity contrib to it, but I haven't yet found out how to add further resources. Both src/main/webapp and src/main/resources are ignored. Help for the latter would be nice. paul Le 27 janv. 2011 à 07:58, Simone Tripodi a écrit : Hi all guys, this short mail just to make the Maven/Solr communities aware that we published an Apache Maven archetype[1] (that we lazily called 'solr-packager' :P) that helps Apache Solr developers creating complete standalone Solr-based applications, embedded in Apache Tomcat, with few operations. We started developing it internally to reduce and help the `ops` tasks, since it has been useful we hope it could be also for you, so decided to publish it as oss. Questions, feedbacks, constructive criticisms, ideas... are more than welcome, if interested visit the github[2] page. Have a nice day, all the best Simo [1] http://sourcesense.github.com/solr-packager/ [2] https://github.com/sourcesense/solr-packager http://people.apache.org/~simonetripodi/ http://www.99soft.org/
Re: Does solr supports indexing of files other than UTF-8
The size of the docs can be huge; suppose there is an 800MB PDF file to index: I would need to translate it to UTF-8 and then send the file for indexing. Also, there can be any number of clients who can upload files, and at that point it will affect performance. And our product already supports localization with local encodings. Thanks, Prasad On Thu, Jan 27, 2011 at 2:04 PM, Paul Libbrecht p...@hoplahup.net wrote: Why is converting documents to UTF-8 not feasible? Nowadays any platform offers such services. Can you give a detailed failure description (maybe with the URL to a sample document you post)? paul On 27 Jan 2011 at 07:31, prasad deshpande wrote: I am able to successfully index/search non-English data (like Hebrew, Japanese) which was encoded in UTF-8. However, when I tried to index data which was encoded in a local encoding like Big5 for Japanese, I could not see the desired results. The contents after indexing looked garbled for the Big5-encoded document when I searched for all indexed documents. Converting a complete document to UTF-8 is not feasible. I am not very clear about how Solr supports these localizations with encodings other than UTF-8. I verified the links below: 1. http://lucene.apache.org/java/3_0_3/api/all/index.html 2. http://wiki.apache.org/solr/LanguageAnalysis Thanks and Regards, Prasad
DismaxParser Query
Hi all, The query for the standard request handler is as follows: field1:(keyword1 OR keyword2) OR field2:(keyword1 OR keyword2) OR field3:(keyword1 OR keyword2) AND field4:(keyword3 OR keyword4) AND field5:(keyword5) How can the same query be written for the dismax request handler? -- Thanks Regards, Isan Fulia.
Re: Does solr supports indexing of files other than UTF-8
At least in Java, UTF-8 transcoding is done on a stream basis. No issue there. paul On 27 Jan 2011 at 09:51, prasad deshpande wrote: The size of the docs can be huge; suppose there is an 800MB PDF file to index: I would need to translate it to UTF-8 and then send the file for indexing. Also, there can be any number of clients who can upload files, and at that point it will affect performance. And our product already supports localization with local encodings. Thanks, Prasad On Thu, Jan 27, 2011 at 2:04 PM, Paul Libbrecht p...@hoplahup.net wrote: Why is converting documents to UTF-8 not feasible? Nowadays any platform offers such services. Can you give a detailed failure description (maybe with the URL to a sample document you post)? paul On 27 Jan 2011 at 07:31, prasad deshpande wrote: I am able to successfully index/search non-English data (like Hebrew, Japanese) which was encoded in UTF-8. However, when I tried to index data which was encoded in a local encoding like Big5 for Japanese, I could not see the desired results. The contents after indexing looked garbled for the Big5-encoded document when I searched for all indexed documents. Converting a complete document to UTF-8 is not feasible. I am not very clear about how Solr supports these localizations with encodings other than UTF-8. I verified the links below: 1. http://lucene.apache.org/java/3_0_3/api/all/index.html 2. http://wiki.apache.org/solr/LanguageAnalysis Thanks and Regards, Prasad
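To illustrate the stream-based point: a converter along these lines (plain JDK, no Solr involved; the charset name and buffer size are arbitrary choices) keeps memory use constant regardless of document size, so even an 800MB file passes through a small fixed buffer rather than being loaded whole:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

public class StreamTranscoder {
    /** Copies 'in' (bytes in sourceCharset, e.g. "Big5") to 'out' as
     *  UTF-8 bytes, one small buffer at a time - the whole document is
     *  never held in memory. */
    public static void toUtf8(InputStream in, String sourceCharset, OutputStream out)
            throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, Charset.forName(sourceCharset)));
        Writer writer = new OutputStreamWriter(out, Charset.forName("UTF-8"));
        char[] buf = new char[8192];          // fixed-size working buffer
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush();
    }
}
```

Hooking something like this in front of the upload means the indexing side only ever sees UTF-8, without a per-file conversion step proportional to document size in memory.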
Tika config in ExtractingRequestHandler
The wiki page for the ExtractingRequestHandler says that I can add the following configuration: <str name="tika.config">/my/path/to/tika.config</str> I have tried to google for an example of such a Tika config file, but haven't found anything. Erlend -- Erlend Garåsen Center for Information Technology Services University of Oslo P.O. Box 1086 Blindern, N-0317 OSLO, Norway Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
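For reference, and with the caveat that this is unverified against the Tika version bundled with any particular Solr release: Tika's own default config file (tika-config.xml, shipped inside the tika-core jar) is the best template to copy and edit, and in the Tika 0.x era it followed roughly this shape, mapping parser classes to MIME types:

```xml
<!-- Rough, unverified sketch of the tika.config format; extract the
     real tika-config.xml from your tika-core jar and edit that instead. -->
<properties>
  <parsers>
    <parser name="parse-pdf" class="org.apache.tika.parser.pdf.PDFParser">
      <mime>application/pdf</mime>
    </parser>
  </parsers>
</properties>
```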
Post PDF to solr with asp.net
Hi, We are trying to post some PDF documents to Solr for indexing using ASP.net, but cannot find any documentation or a library that will allow posting of binary data. Has anyone done this, and if so, how? Regards Andrew McCombe iWeb Solutions Ltd.
query range in multivalued date field
hi all. My range query on a multivalued date field works incorrectly. My schema: there is a field requestDate that has the multiValued attribute: <fields> <field name="id" type="string" indexed="true" stored="true" required="true"/> <field name="keyword" type="text" indexed="true" stored="true"/> <field name="count" type="float" indexed="true" stored="true"/> <field name="isResult" type="int" indexed="true" stored="true" default="0" multiValued="true"/> <field name="requestDate" type="date" indexed="true" stored="true" multiValued="true"/> </fields> Some data from the index: <doc> <float name="count">2.0</float> <str name="id">sale</str> <arr name="isResult"><int>1</int><int>1</int></arr> <str name="keyword">sale</str> <arr name="requestDate"><date>2011-01-26T08:18:35Z</date><date>2011-01-27T01:31:28Z</date></arr> </doc> <doc> <float name="count">3.0</float> <str name="id">coldpop</str> <arr name="isResult"><int>1</int><int>1</int><int>1</int></arr> <str name="keyword">cold pop</str> <arr name="requestDate"><date>2011-01-27T01:30:01Z</date><date>2011-01-27T01:32:01Z</date><date>2011-01-27T01:32:18Z</date></arr> </doc> I try to search for some docs where the date is in some range, for example: http://localhost:8983/request/select?q=requestDate:[NOW/HOUR-1HOUR TO NOW/HOUR] There are no results. After some analyzing, I saw that this range works only for the first item in the requestDate field, but is not filtered on the other items. Where is my mistake? Or can Solr not filter multivalued date fields? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/query-range-in-multivalued-date-field-tp2361292p2361292.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DismaxParser Query
use dismax q for the first three fields and a filter query for the 4th and 5th fields, so: q=keyword1 keyword2 qf=field1,field2,field3 pf=field1,field2,field3 mm=something sensible for you defType=dismax fq=field4:(keyword3 OR keyword4) AND field5:(keyword5) take a look at the dismax docs for extra params On 27 January 2011 08:52, Isan Fulia isan.fu...@germinait.com wrote: Hi all, The query for the standard request handler is as follows: field1:(keyword1 OR keyword2) OR field2:(keyword1 OR keyword2) OR field3:(keyword1 OR keyword2) AND field4:(keyword3 OR keyword4) AND field5:(keyword5) How can the same query be written for the dismax request handler? -- Thanks Regards, Isan Fulia.
Re: DismaxParser Query
but q=keyword1 keyword2 does an AND operation, not OR On 27 January 2011 16:22, lee carroll lee.a.carr...@googlemail.com wrote: use dismax q for the first three fields and a filter query for the 4th and 5th fields, so: q=keyword1 keyword2 qf=field1,field2,field3 pf=field1,field2,field3 mm=something sensible for you defType=dismax fq=field4:(keyword3 OR keyword4) AND field5:(keyword5) take a look at the dismax docs for extra params On 27 January 2011 08:52, Isan Fulia isan.fu...@germinait.com wrote: Hi all, The query for the standard request handler is as follows: field1:(keyword1 OR keyword2) OR field2:(keyword1 OR keyword2) OR field3:(keyword1 OR keyword2) AND field4:(keyword3 OR keyword4) AND field5:(keyword5) How can the same query be written for the dismax request handler? -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia.
DIH and duplicate content
Hi, Is there a way to avoid duplicate content in an index at the moment I'm uploading my xml feed via DIH? I would like to have only one entry for a given description. I mean, if the description of one product already exists in the index, do not import the new product. Is there a built-in function? Or any hack? thanks for your help Rosa
Re: DismaxParser Query
the default operator can be set in your config to be OR, or on the query with something like q.op=OR On 27 January 2011 11:26, Isan Fulia isan.fu...@germinait.com wrote: but q=keyword1 keyword2 does an AND operation, not OR On 27 January 2011 16:22, lee carroll lee.a.carr...@googlemail.com wrote: use dismax q for the first three fields and a filter query for the 4th and 5th fields, so: q=keyword1 keyword2 qf=field1,field2,field3 pf=field1,field2,field3 mm=something sensible for you defType=dismax fq=field4:(keyword3 OR keyword4) AND field5:(keyword5) take a look at the dismax docs for extra params On 27 January 2011 08:52, Isan Fulia isan.fu...@germinait.com wrote: Hi all, The query for the standard request handler is as follows: field1:(keyword1 OR keyword2) OR field2:(keyword1 OR keyword2) OR field3:(keyword1 OR keyword2) AND field4:(keyword3 OR keyword4) AND field5:(keyword5) How can the same query be written for the dismax request handler? -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia.
Re: DismaxParser Query
The DisMax query parser internally hard-codes its operator to OR. This is quite unlike the Lucene query parser, for which the default operator can be configured using the <solrQueryParser> element in schema.xml. Regards, Bijeet Singh On Thu, Jan 27, 2011 at 4:56 PM, Isan Fulia isan.fu...@germinait.com wrote: but q=keyword1 keyword2 does an AND operation, not OR On 27 January 2011 16:22, lee carroll lee.a.carr...@googlemail.com wrote: use dismax q for the first three fields and a filter query for the 4th and 5th fields, so: q=keyword1 keyword2 qf=field1,field2,field3 pf=field1,field2,field3 mm=something sensible for you defType=dismax fq=field4:(keyword3 OR keyword4) AND field5:(keyword5) take a look at the dismax docs for extra params On 27 January 2011 08:52, Isan Fulia isan.fu...@germinait.com wrote: Hi all, The query for the standard request handler is as follows: field1:(keyword1 OR keyword2) OR field2:(keyword1 OR keyword2) OR field3:(keyword1 OR keyword2) AND field4:(keyword3 OR keyword4) AND field5:(keyword5) How can the same query be written for the dismax request handler? -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia.
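For the Lucene (standard) query parser, the setting referred to is this one-liner in schema.xml; it has no effect on dismax, which keeps its hard-coded behavior:

```xml
<!-- schema.xml: default operator for the Lucene/standard query parser only -->
<solrQueryParser defaultOperator="OR"/>
```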
Re: DismaxParser Query
sorry, ignore that - we are on dismax here. Look at the mm param in the docs; you can set this to achieve what you need. On 27 January 2011 11:34, lee carroll lee.a.carr...@googlemail.com wrote: the default operator can be set in your config to be OR, or on the query with something like q.op=OR On 27 January 2011 11:26, Isan Fulia isan.fu...@germinait.com wrote: but q=keyword1 keyword2 does an AND operation, not OR On 27 January 2011 16:22, lee carroll lee.a.carr...@googlemail.com wrote: use dismax q for the first three fields and a filter query for the 4th and 5th fields, so: q=keyword1 keyword2 qf=field1,field2,field3 pf=field1,field2,field3 mm=something sensible for you defType=dismax fq=field4:(keyword3 OR keyword4) AND field5:(keyword5) take a look at the dismax docs for extra params On 27 January 2011 08:52, Isan Fulia isan.fu...@germinait.com wrote: Hi all, The query for the standard request handler is as follows: field1:(keyword1 OR keyword2) OR field2:(keyword1 OR keyword2) OR field3:(keyword1 OR keyword2) AND field4:(keyword3 OR keyword4) AND field5:(keyword5) How can the same query be written for the dismax request handler? -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia.
Re: DIH and duplicate content
http://wiki.apache.org/solr/Deduplication On Thursday 27 January 2011 12:32:29 Rosa (Anuncios) wrote: Is there a way to avoid duplicate content in an index at the moment I'm uploading my xml feed via DIH? I would like to have only one entry for a given description. I mean, if the description of one product already exists in the index, do not import the new product. -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
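From that wiki page, the idea is a SignatureUpdateProcessorFactory chain in solrconfig.xml that hashes the chosen field(s) and overwrites documents carrying the same signature. A minimal sketch keyed on the description field might look like this (field names are placeholders for your schema, and the exact wiring of the chain into the DIH request should be checked against your Solr version):

```xml
<!-- solrconfig.xml: deduplicate on the description field.
     'signatureField' must also be declared in schema.xml as an indexed field. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signatureField</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">description</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

With overwriteDupes=true, a later product whose description hashes to an existing signature replaces the earlier one rather than creating a second entry.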
Re: A Maven archetype that helps packaging Solr as a standalone application embedded in Apache Tomcat
Hi Paul, thanks a lot for your feedback, much more than appreciated! :) Going through your comments: * Yes, it also packs a Solr webapp; it is needed to embed it in Tomcat. Do you think it could be a useful feature to have the webapp .war as output too? If it helps, I'm open to adding it as well. * src/main/webapp and src/main/resources are ignored because I didn't use the war plugin; everything is configured in the assembly descriptor ATM. As a workaround, you can add resources in the src/solr/* subdirectory and they will be included in the webapp; when the war plugin is plugged in (previous comment), that issue should be solved. Can you tell me a little more about the velocity contrib, please? In the multicore, I'd like solr.xml to be generated at build time by analyzing the dependencies, but I didn't figure out how to do it. Many thanks in advance! http://people.apache.org/~simonetripodi/ http://www.99soft.org/ On Thu, Jan 27, 2011 at 9:49 AM, Paul Libbrecht p...@hoplahup.net wrote: Simone, It's good that you did so! I had found this three days ago while googling. And I am starting to make sense of it. It works well. Two little comments: - you are saying that it packages a standalone multicore and a standalone app. But it actually also packs a webapp. At first, I had rejected using that option because of the standalone output. I think a webapp is more usable. Just a matter of formulation - I have found how to configure my schema and config, and could add the velocity contrib to it, but I haven't yet found out how to add further resources. Both src/main/webapp and src/main/resources are ignored. Help for the latter would be nice. paul On 27 Jan 2011 at 07:58, Simone Tripodi wrote: Hi all guys, this short mail is just to make the Maven/Solr communities aware that we published an Apache Maven archetype[1] (that we lazily called 'solr-packager' :P) that helps Apache Solr developers create complete standalone Solr-based applications, embedded in Apache Tomcat, with few operations. We started developing it internally to reduce and help the `ops` tasks; since it has been useful, we hope it could be for you too, so we decided to publish it as oss. Questions, feedback, constructive criticisms, ideas... are more than welcome; if interested, visit the github[2] page. Have a nice day, all the best Simo [1] http://sourcesense.github.com/solr-packager/ [2] https://github.com/sourcesense/solr-packager http://people.apache.org/~simonetripodi/ http://www.99soft.org/
Re: configure httpclient to access solr with user credential on third party host
Looks like you are connecting to Tomcat's AJP port, not the HTTP one. Connect to the Tomcat HTTP port and I suspect you'll have greater success. Upayavira On Wed, 26 Jan 2011 22:45 -0800, Darniz rnizamud...@edmunds.com wrote: Hello, I uploaded the solr.war file on my hosting provider and added a security constraint in the web.xml file of my solr war so that only a specific user with a certain role can issue GET and POST requests. When I open a browser and type www.maydomainname.com/solr I get a dialog box to enter userid and password. No issues until now. Now the issue is that I have one more app on the same Tomcat container which will index documents into Solr. In order for this app to issue POST requests it has to configure the HTTP client credentials. I checked with my hosting service and they told me that Tomcat is running on port 8834 since Apache is sitting in front. Below is the code snippet I use to set the HTTP credentials: CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8834/solr"); Credentials defaultcreds = new UsernamePasswordCredentials("solr", "solr"); server.getHttpClient().getState().setCredentials(new AuthScope("localhost", 8834, AuthScope.ANY_REALM), defaultcreds); I am getting the following error, any help will be appreciated.
ERROR TP-Processor9 org.apache.jk.common.MsgAjp - BAD packet signature 20559 ERROR TP-Processor9 org.apache.jk.common.ChannelSocket - Error, processing connection java.lang.IndexOutOfBoundsException at java.io.BufferedInputStream.read(BufferedInputStream.java:310) at org.apache.jk.common.ChannelSocket.read(ChannelSocket.java:621) at org.apache.jk.common.ChannelSocket.receive(ChannelSocket.java:578) at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:686) at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:891) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690) at java.lang.Thread.run(Thread.java:619) ERROR TP-Processor9 org.apache.jk.common.MsgAjp - BAD packet signature 20559 ERROR TP-Processor9 org.apache.jk.common.ChannelSocket - Error, processing connection java.lang.IndexOutOfBoundsException at java.io.BufferedInputStream.read(BufferedInputStream.java:310) at org.apache.jk.common.ChannelSocket.read(ChannelSocket.java:621) at org.apache.jk.common.ChannelSocket.receive(ChannelSocket.java:578) at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:686) at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:891) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690) at java.lang.Thread.run(Thread.java:619) -- View this message in context: http://lucene.472066.n3.nabble.com/configure-httpclient-to-access-solr-with-user-credential-on-third-party-host-tp2360364p2360364.html Sent from the Solr - User mailing list archive at Nabble.com. --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
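The two connector types sit side by side in Tomcat's conf/server.xml; SolrJ must be pointed at the HTTP/1.1 one, while the AJP/1.3 one (which Apache httpd fronts, and apparently what port 8834 is here) speaks a binary protocol that produces exactly these "BAD packet signature" errors when it receives plain HTTP. The port numbers below are Tomcat's defaults, not necessarily this host's:

```xml
<!-- conf/server.xml (default ports shown) -->
<Connector port="8080" protocol="HTTP/1.1"/>   <!-- point CommonsHttpSolrServer here -->
<Connector port="8009" protocol="AJP/1.3"/>    <!-- binary AJP, for mod_jk/mod_proxy_ajp only -->
```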
Re: A Maven archetype that helps packaging Solr as a standalone application embedded in Apache Tomcat
On 27 Jan 2011 at 12:42, Simone Tripodi wrote: thanks a lot for your feedback, much more than appreciated! :) Good time sync. I need it right now. * Yes, it also packs a Solr webapp; it is needed to embed it in Tomcat. Do you think it could be a useful feature to have the webapp .war as output too? If it helps, I'm open to adding it as well. I feel so. Or at least say that it's a side production even if it's not an individual goal. * src/main/webapp and src/main/resources are ignored because I didn't use the war plugin; everything is configured in the assembly descriptor ATM. As a workaround, you can add resources in the src/solr/* subdirectory and they will be included in the webapp; But only in WEB-INF/classes... that doesn't seem right to be served as a static resource (I'm looking at css or js files). when the war plugin is plugged in (previous comment), that issue should be solved. Any time estimate? Can you tell me a little more about the velocity contrib, please? I added the dependency. I copied the velocity config files into src/main/solr/commons. I note that I had to deactivate the query elevation, which seems to expect a solr-home. In the multicore, I'd like solr.xml to be generated at build time by analyzing the dependencies, but I didn't figure out how to do it. Many thanks in advance! I should also say: at first I tried the multicore one and it failed on me... not too sure why, but it did not have sufficient output. paul
Re: query range in multivalued date field
Range queries work on multivalued fields. I suspect the date math conversion is fooling you. For instance, NOW/HOUR first rounds down to the current hour, *then* subtracts one hour. If you attach debugQuery=on (or check the debug checkbox in the admin full search page), you'll see the exact results of the conversion; that may help. Best Erick On Thu, Jan 27, 2011 at 5:15 AM, ramzesua michaelnaza...@gmail.com wrote: hi all. My range query on a multivalued date field works incorrectly. My schema: there is a field requestDate that has the multiValued attribute: <fields> <field name="id" type="string" indexed="true" stored="true" required="true"/> <field name="keyword" type="text" indexed="true" stored="true"/> <field name="count" type="float" indexed="true" stored="true"/> <field name="isResult" type="int" indexed="true" stored="true" default="0" multiValued="true"/> <field name="requestDate" type="date" indexed="true" stored="true" multiValued="true"/> </fields> Some data from the index: <doc> <float name="count">2.0</float> <str name="id">sale</str> <arr name="isResult"><int>1</int><int>1</int></arr> <str name="keyword">sale</str> <arr name="requestDate"><date>2011-01-26T08:18:35Z</date><date>2011-01-27T01:31:28Z</date></arr> </doc> <doc> <float name="count">3.0</float> <str name="id">coldpop</str> <arr name="isResult"><int>1</int><int>1</int><int>1</int></arr> <str name="keyword">cold pop</str> <arr name="requestDate"><date>2011-01-27T01:30:01Z</date><date>2011-01-27T01:32:01Z</date><date>2011-01-27T01:32:18Z</date></arr> </doc> I try to search for some docs where the date is in some range, for example: http://localhost:8983/request/select?q=requestDate:[NOW/HOUR-1HOUR TO NOW/HOUR] There are no results. After some analyzing, I saw that this range works only for the first item in the requestDate field, but is not filtered on the other items. Where is my mistake? Or can Solr not filter multivalued date fields? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/query-range-in-multivalued-date-field-tp2361292p2361292.html Sent from the Solr - User mailing list archive at Nabble.com.
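The left-to-right evaluation described above can be mimicked with plain JDK code (illustration only; Solr performs this date math server-side):

```java
import java.util.Calendar;
import java.util.TimeZone;

public class DateMathDemo {
    /** Mimics Solr date math "NOW/HOUR-1HOUR": round down to the hour
     *  first, then subtract one hour from the rounded value. */
    public static Calendar hourWindowStart(Calendar now) {
        Calendar c = (Calendar) now.clone();
        c.set(Calendar.MINUTE, 0);            // NOW/HOUR: truncate...
        c.set(Calendar.SECOND, 0);
        c.set(Calendar.MILLISECOND, 0);
        c.add(Calendar.HOUR_OF_DAY, -1);      // ...then -1HOUR
        return c;
    }
}
```

So at 10:37 UTC the query [NOW/HOUR-1HOUR TO NOW/HOUR] covers 09:00:00 to 10:00:00. A document whose dates all fall after 10:00 lies outside that window, which can look like multivalued matching is broken when it is really the window.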
Re: DismaxParser Query
It worked by making mm=0 (it acted as an OR operator), but how to handle this: field1:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) OR field2:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) OR field3:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) On 27 January 2011 17:06, lee carroll lee.a.carr...@googlemail.com wrote: sorry, ignore that - we are on dismax here. Look at the mm param in the docs; you can set this to achieve what you need. On 27 January 2011 11:34, lee carroll lee.a.carr...@googlemail.com wrote: the default operator can be set in your config to be OR, or on the query with something like q.op=OR On 27 January 2011 11:26, Isan Fulia isan.fu...@germinait.com wrote: but q=keyword1 keyword2 does an AND operation, not OR On 27 January 2011 16:22, lee carroll lee.a.carr...@googlemail.com wrote: use dismax q for the first three fields and a filter query for the 4th and 5th fields, so: q=keyword1 keyword2 qf=field1,field2,field3 pf=field1,field2,field3 mm=something sensible for you defType=dismax fq=field4:(keyword3 OR keyword4) AND field5:(keyword5) take a look at the dismax docs for extra params On 27 January 2011 08:52, Isan Fulia isan.fu...@germinait.com wrote: Hi all, The query for the standard request handler is as follows: field1:(keyword1 OR keyword2) OR field2:(keyword1 OR keyword2) OR field3:(keyword1 OR keyword2) AND field4:(keyword3 OR keyword4) AND field5:(keyword5) How can the same query be written for the dismax request handler? -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia.
Re: How to find Master Slave are in sync
Markus, The problem here is that if I call the two URLs below immediately after replication, I get the same index version from both. In my python script I have added code to swap the online core on the master with the offline core on the master, and the online core on the slave with the offline core on the slave, if both versions are the same. After calling swap, I am getting an error in the slave's log like below. So I am confused about why this is happening. Can you please help me with this? http://master_host:port/solr/replication?command=indexversion http://slave_host:port/solr/replication?command=details 2011-01-27 07:45:26,713 WARN [org.apache.solr.handler.SnapPuller] (Thread-59) No content recieved for file: {size=154098810, name=_e3.cfx, lastmodified=1296132092000} 2011-01-27 07:45:27,396 ERROR [org.apache.solr.handler.ReplicationHandler] (Thread-59) SnapPull failed org.apache.solr.common.SolrException: Unable to download _e3.cfx completely. Downloaded 0!=154098810 at org.apache.solr.handler.SnapPuller$FileFetcher.cleanup(SnapPuller.java:1026) at org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:906) at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:541) at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:294) -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-find-Master-Slave-are-in-sync-tp2287014p2362679.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Post PDF to solr with asp.net
On Thu, Jan 27, 2011 at 3:44 PM, Andrew McCombe eupe...@gmail.com wrote: Hi, We are trying to post some PDF documents to Solr for indexing using ASP.net, but cannot find any documentation or a library that will allow posting of binary data. [...] I do not have much idea of ASP.net, but SolrNet ( http://code.google.com/p/solrnet/ ) seems to be one such library. Also, one can use Solr's web interface to POST documents. Please see http://wiki.apache.org/solr/UpdateXmlMessages and the shell script example included as example/exampledocs/post.sh in the Solr source code. Regards, Gora
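For PDFs specifically, the ExtractingRequestHandler (Solr Cell) endpoint accepts the raw binary as a multipart POST, so any HTTP client capable of a multipart upload works from ASP.net as well. Assuming the handler is mapped at /update/extract as in the example solrconfig.xml, and with the id and filename below as placeholders:

```
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" \
     -F "myfile=@document.pdf"
```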
Re: DismaxParser Query
with dismax you get to say things like "match all terms if fewer than 3 terms are entered, else match term-x". It produces highly flexible and relevant matches and works very well in lots of common search use cases. Field boosting allows further tuning. If you have rigid rules like the last one you quote, I don't think dismax is for you, although I might be wrong and someone might be able to help. On 27 January 2011 13:32, Isan Fulia isan.fu...@germinait.com wrote: It worked by making mm=0 (it acted as an OR operator), but how to handle this: field1:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) OR field2:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) OR field3:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) On 27 January 2011 17:06, lee carroll lee.a.carr...@googlemail.com wrote: sorry, ignore that - we are on dismax here. Look at the mm param in the docs; you can set this to achieve what you need. On 27 January 2011 11:34, lee carroll lee.a.carr...@googlemail.com wrote: the default operator can be set in your config to be OR, or on the query with something like q.op=OR On 27 January 2011 11:26, Isan Fulia isan.fu...@germinait.com wrote: but q=keyword1 keyword2 does an AND operation, not OR On 27 January 2011 16:22, lee carroll lee.a.carr...@googlemail.com wrote: use dismax q for the first three fields and a filter query for the 4th and 5th fields, so: q=keyword1 keyword2 qf=field1,field2,field3 pf=field1,field2,field3 mm=something sensible for you defType=dismax fq=field4:(keyword3 OR keyword4) AND field5:(keyword5) take a look at the dismax docs for extra params On 27 January 2011 08:52, Isan Fulia isan.fu...@germinait.com wrote: Hi all, The query for the standard request handler is as follows: field1:(keyword1 OR keyword2) OR field2:(keyword1 OR keyword2) OR field3:(keyword1 OR keyword2) AND field4:(keyword3 OR keyword4) AND field5:(keyword5) How can the same query be written for the dismax request handler? -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia.
-- Thanks Regards, Isan Fulia.
Import Handler for tokenizing facet string into multi-valued solr.StrField..
Hi, Pretty novice at Solr coding, but looking for hints about how (if not already done) to implement a PatternTokenizer that would index this into multi-valued fields of solr.StrField for faceting. Ex. "Water -- Irrigation ; Water -- Sewage" should be tokenized into Water, Irrigation, Sewage in multi-valued non-tokenized fields, due to performance. I could do it from the outside, but I would see this as an opportunity to learn about Solr. It works as I want with the PatternTokenizerFactory when I am using solr.TextField, but not when I am using the non-tokenized solr.StrField. But according to my reading, facet performance is better on non-tokenized fields. We need better performance on our faceted searches on these multi-valued fields (25 million documents, three multi-valued facets). I would also need a filter that filters out identical values, as the feeds have redundant data as shown above. Can anyone point me in the right direction.. cheers, :-Dennis
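Faceting operates on the indexed terms, so a tokenized solr.TextField whose analyzer emits exactly the final facet values behaves like a multi-valued string field for faceting purposes. A hedged sketch of such a type follows; the pattern is a guess at the delimiters above, and note that Solr's RemoveDuplicatesTokenFilterFactory only drops duplicates at the same token position, so exact cross-position dedup (the repeated "Water") may still need to happen on the feed side:

```xml
<!-- schema.xml sketch: splits "Water -- Irrigation ; Water -- Sewage"
     into the terms Water / Irrigation / Water / Sewage -->
<fieldType name="facetPath" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*(--|;)\s*"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
```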
Re: How to find Master Slave are in sync
Let's back up a moment and ask why you are doing this from scripts, because this feels like an XY problem, see: http://people.apache.org/~hossman/#xyproblem http://people.apache.org/~hossman/#xyproblem What are you trying to accomplish by swapping cores on the master and slave? Solr 1.4 has configuration-based replication, are you using 1.4? This version of Solr automatically, upon replication, switches to the updated index. You can trigger a replication either by configuring the polling interval on the slave or by sending the proper HTTP request to the slave. See: http://wiki.apache.org/solr/SolrReplication#HTTP_API So, it seems like taking charge of swapping cores may be more work than you really need to do. Of course, if you're on a different version of Solr, this is irrelevant. Best Erick On Thu, Jan 27, 2011 at 8:38 AM, Shanmugavel SRD srdshanmuga...@gmail.comwrote: Markus, The problem here is if I call the below two URLs immediately after replication then I am getting both the index versions as same. In my python script I have added code to swap the online core on master with offline core on master and online core on slave with offline core on slave, if both the versions are same. After calling swap, I am getting error in slave's log like below. So I am confused why this is happening. Can you please help me on this? http://master_host:port/solr/replication?command=indexversion http://slave_host:port/solr/replication?command=details 2011-01-27 07:45:26,713 WARN [org.apache.solr.handler.SnapPuller] (Thread-59) No content recieved for file: {size=154098810, name=_e3.cfx, lastmodified=1296132092000} 2011-01-27 07:45:27,396 ERROR [org.apache.solr.handler.ReplicationHandler] (Thread-59) SnapPull failed org.apache.solr.common.SolrException: Unable to download _e3.cfx completely. 
Downloaded 0!=154098810 at org.apache.solr.handler.SnapPuller$FileFetcher.cleanup(SnapPuller.java:1026) at org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:906) at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:541) at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:294) -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-find-Master-Slave-are-in-sync-tp2287014p2362679.html Sent from the Solr - User mailing list archive at Nabble.com.
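For a script-side sync check, the indexversion values that the replication handler returns can be extracted and compared. A minimal sketch, assuming the response shape described on the SolrReplication wiki page; the XML payloads below are invented for illustration, not captured from a real server:

```python
# Sketch: decide whether master and slave look in sync by comparing the
# "indexversion" value each returns from /replication?command=indexversion.
import xml.etree.ElementTree as ET

def index_version(response_xml):
    """Extract the indexversion long from a replication-handler response."""
    root = ET.fromstring(response_xml)
    node = root.find(".//long[@name='indexversion']")
    return int(node.text) if node is not None else None

def in_sync(master_xml, slave_xml):
    mv, sv = index_version(master_xml), index_version(slave_xml)
    return mv is not None and mv == sv

master = '<response><long name="indexversion">1296132092000</long></response>'
slave = '<response><long name="indexversion">1296132092000</long></response>'
print(in_sync(master, slave))  # True only when both report the same version
```

As the thread shows, the two versions can read equal immediately after a replication request even though the slave has not finished pulling files, so a script should also allow for the pull still being in progress.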
Re: DismaxParser Query
What version of Solr are you using, and could you consider either 3x or applying a patch to 1.4.1? Because eDismax (extended dismax) handles the full Lucene query language and probably works here. See the Solr JIRA 1553 at https://issues.apache.org/jira/browse/SOLR-1553 Best Erick On Thu, Jan 27, 2011 at 8:32 AM, Isan Fulia isan.fu...@germinait.comwrote: It worked by making mm=0 (it acted as OR operator) but how to handle this field1:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) OR field2:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) OR field3:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) On 27 January 2011 17:06, lee carroll lee.a.carr...@googlemail.com wrote: sorry ignore that - we are on dismax here - look at mm param in the docs you can set this to achieve what you need On 27 January 2011 11:34, lee carroll lee.a.carr...@googlemail.com wrote: the default operation can be set in your config to be or or on the query something like q.op=OR On 27 January 2011 11:26, Isan Fulia isan.fu...@germinait.com wrote: but q=keyword1 keyword2 does AND operation not OR On 27 January 2011 16:22, lee carroll lee.a.carr...@googlemail.com wrote: use dismax q for first three fields and a filter query for the 4th and 5th fields so q=keyword1 keyword 2 qf = field1,feild2,field3 pf = field1,feild2,field3 mm=something sensible for you defType=dismax fq= field4:(keyword3 OR keyword4) AND field5:(keyword5) take a look at the dismax docs for extra params On 27 January 2011 08:52, Isan Fulia isan.fu...@germinait.com wrote: Hi all, The query for standard request handler is as follows field1:(keyword1 OR keyword2) OR field2:(keyword1 OR keyword2) OR field3:(keyword1 OR keyword2) AND field4:(keyword3 OR keyword4) AND field5:(keyword5) How the same above query can be written for dismax request handler -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia.
AW: DismaxParser Query
It may also be an option to mix the query parsers? Something like this (not tested): q={!lucene}field1:test OR field2:test2 _query_:{!dismax qf=fields}+my dismax -bad So you have the benefits of lucene and dismax parser -Ursprüngliche Nachricht- Von: Erick Erickson [mailto:erickerick...@gmail.com] Gesendet: Donnerstag, 27. Januar 2011 15:15 An: solr-user@lucene.apache.org Betreff: Re: DismaxParser Query What version of Solr are you using, and could you consider either 3x or applying a patch to 1.4.1? Because eDismax (extended dismax) handles the full Lucene query language and probably works here. See the Solr JIRA 1553 at https://issues.apache.org/jira/browse/SOLR-1553 Best Erick On Thu, Jan 27, 2011 at 8:32 AM, Isan Fulia isan.fu...@germinait.comwrote: It worked by making mm=0 (it acted as OR operator) but how to handle this field1:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) OR field2:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) OR field3:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) On 27 January 2011 17:06, lee carroll lee.a.carr...@googlemail.com wrote: sorry ignore that - we are on dismax here - look at mm param in the docs you can set this to achieve what you need On 27 January 2011 11:34, lee carroll lee.a.carr...@googlemail.com wrote: the default operation can be set in your config to be or or on the query something like q.op=OR On 27 January 2011 11:26, Isan Fulia isan.fu...@germinait.com wrote: but q=keyword1 keyword2 does AND operation not OR On 27 January 2011 16:22, lee carroll lee.a.carr...@googlemail.com wrote: use dismax q for first three fields and a filter query for the 4th and 5th fields so q=keyword1 keyword 2 qf = field1,feild2,field3 pf = field1,feild2,field3 mm=something sensible for you defType=dismax fq= field4:(keyword3 OR keyword4) AND field5:(keyword5) take a look at the dismax docs for extra params On 27 January 2011 08:52, Isan Fulia isan.fu...@germinait.com wrote: Hi all, The query for standard request handler 
is as follows field1:(keyword1 OR keyword2) OR field2:(keyword1 OR keyword2) OR field3:(keyword1 OR keyword2) AND field4:(keyword3 OR keyword4) AND field5:(keyword5) How the same above query can be written for dismax request handler -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia.
Detect Out of Memory Errors
Hi, is there a way by which I could detect out-of-memory errors in Solr, so that I could implement some functionality such as restarting Tomcat, or be alerted via email, whenever such an error is detected? -- View this message in context: http://lucene.472066.n3.nabble.com/Detect-Out-of-Memory-Errors-tp2362872p2362872.html Sent from the Solr - User mailing list archive at Nabble.com.
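Two common approaches: Sun/Oracle JVMs accept flags like -XX:+HeapDumpOnOutOfMemoryError and -XX:OnOutOfMemoryError="<command>" that can trigger a restart script directly, and otherwise a watchdog can scan the container log for the error. A log-scanning sketch, where the sample log lines are invented for illustration:

```python
# Sketch: detect OutOfMemoryError by scanning log lines, without touching
# Solr itself. A real watchdog would tail catalina.out and then restart
# Tomcat or send an alert email when a match appears.
def find_oom_lines(log_lines):
    """Return the log lines that report a JVM OutOfMemoryError."""
    return [line for line in log_lines if "java.lang.OutOfMemoryError" in line]

sample_log = [
    "INFO: [core0] webapp=/solr path=/select params={q=*:*} hits=10",
    "SEVERE: java.lang.OutOfMemoryError: Java heap space",
    "INFO: [core0] webapp=/solr path=/update status=0",
]
hits = find_oom_lines(sample_log)
if hits:
    # Replace this with a Tomcat restart or an email alert.
    print("OOM detected:", hits[0])
```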
Re: Question About Writing Custom Query Parser Plugin
Any One On Thu, Jan 27, 2011 at 1:27 PM, Ahson Iqbal mianah...@yahoo.com wrote: Hi All I want to integrate lucene Surround Query Parser with solr 1.4.1, and for that I am writing Custom Query Parser Plugin, To accomplish this task I should write a sub class of org.apache.solr.search.QParserPlugin and implement its two methods public void init(NamedList nl) public QParser createParser(String string, SolrParams sp, SolrParams sp1, SolrQueryRequest sqr) now here createParser should return an object of a subclass of org.apache.solr.search.QParser, but I need a parser of type org.apache.lucene.queryParser.surround.parser.QueryParser which is not a subclass of org.apache.solr.search.QParser Now my question is should I write a sub class of org.apache.solr.search.QParser and internally create an object of org.apache.lucene.queryParser.surround.parser.QueryParser and call its parse method? if so how the mapping org.apache.lucene.queryParser.surround.query.SrndQuery (that is returned org.apache.lucene.queryParser.surround.parser.QueryParser ) would be done with org.apache.lucene.search.Query (that should be returned from parse method of a query parser of type org.apache.solr.search.QParser) Thanx Ahsan
Re: Question About Writing Custom Query Parser Plugin
Yes, you need to create both a QParserPlugin and a QParser implementation. Look at Solr's own source code for the LuceneQParserPlugin/LuceneQParser and built it like that. Baking the surround query parser into Solr out of the box would be a useful contribution, so if you care to give it a little bit of polish/unit testing and submit a patch, the community would be thankful :) Erik On Jan 27, 2011, at 03:27 , Ahson Iqbal wrote: Hi All I want to integrate lucene Surround Query Parser with solr 1.4.1, and for that I am writing Custom Query Parser Plugin, To accomplish this task I should write a sub class of org.apache.solr.search.QParserPlugin and implement its two methods public void init(NamedList nl) public QParser createParser(String string, SolrParams sp, SolrParams sp1, SolrQueryRequest sqr) now here createParser should return an object of a subclass of org.apache.solr.search.QParser, but I need a parser of type org.apache.lucene.queryParser.surround.parser.QueryParser which is not a subclass of org.apache.solr.search.QParser Now my question is should I write a sub class of org.apache.solr.search.QParser and internally create an object of org.apache.lucene.queryParser.surround.parser.QueryParser and call its parse method? if so how the mapping org.apache.lucene.queryParser.surround.query.SrndQuery (that is returned org.apache.lucene.queryParser.surround.parser.QueryParser ) would be done with org.apache.lucene.search.Query (that should be returned from parse method of a query parser of type org.apache.solr.search.QParser) Thanx Ahsan
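Once the QParserPlugin subclass is written, the wiring is a one-line registration in solrconfig.xml; the parser can then be selected with defType=surround or a {!surround} prefix. The class name below is hypothetical. As for the SrndQuery-to-Query mapping question: the surround SrndQuery class exposes a makeLuceneQueryField(...) method that produces an org.apache.lucene.search.Query, so the QParser's parse method can delegate to it (check the surround javadocs for the exact signature).

```xml
<!-- solrconfig.xml sketch: register the custom plugin under a name that can
     be used as defType=surround or {!surround}. Class name is made up. -->
<queryParser name="surround" class="com.example.solr.SurroundQParserPlugin"/>
```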
Re: A Maven archetype that helps packaging Solr as a standalone application embedded in Apache Tomcat
Le 27 janv. 2011 à 12:42, Simone Tripodi a écrit : thanks a lot for your feedbacks, much more than appreciated! :) One more anomaly I find: the license is in the output of the pom.xml. I think this should not be the case. *my* license should be there, not the license of the archetype. Or? paul
Re: Tika config in ExtractingRequestHandler
I believe that as long as Tika is included in a folder that is referenced by solrconfig.xml you should be good. Solr will automatically throw mime types to Tika for parsing. Can anyone else add to this? Thanks, Adam On Thu, Jan 27, 2011 at 5:06 AM, Erlend Garåsen e.f.gara...@usit.uio.no wrote: The wiki page for the ExtractingRequestHandler says that I can add the following configuration: <str name="tika.config">/my/path/to/tika.config</str> I have tried to google for an example of such a Tika config file, but haven't found anything. Erlend -- Erlend Garåsen Center for Information Technology Services University of Oslo P.O. Box 1086 Blindern, N-0317 OSLO, Norway Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
Re: A Maven archetype that helps packaging Solr as a standalone application embedded in Apache Tomcat
Hi Paul, sorry I'm late but I've been in the middle of a conf call :( On which IRC server the #solr channel is? I'll reach you ASAP. Thanks a lot! Simo http://people.apache.org/~simonetripodi/ http://www.99soft.org/ On Thu, Jan 27, 2011 at 4:00 PM, Paul Libbrecht p...@hoplahup.net wrote: Le 27 janv. 2011 à 12:42, Simone Tripodi a écrit : thanks a lot for your feedbacks, much more than appreciated! :) One more anomaly I find: the license is in the output of the pom.xml. I think this should not be the case. *my* license should be there, not the license of the archetype. Or? paul
Re: A Maven archetype that helps packaging Solr as a standalone application embedded in Apache Tomcat
Simo, it's freenode.net On Thu, Jan 27, 2011 at 4:16 PM, Simone Tripodi simonetrip...@apache.orgwrote: Hi Paul, sorry I'm late but I've been in the middle of a conf call :( On which IRC server the #solr channel is? I'll reach you ASAP. Thanks a lot! Simo http://people.apache.org/~simonetripodi/ http://www.99soft.org/ On Thu, Jan 27, 2011 at 4:00 PM, Paul Libbrecht p...@hoplahup.net wrote: Le 27 janv. 2011 à 12:42, Simone Tripodi a écrit : thanks a lot for your feedbacks, much more than appreciated! :) One more anomaly I find: the license is in the output of the pom.xml. I think this should not be the case. *my* license should be there, not the license of the archetype. Or? paul
RE: DismaxParser Query
Yes, I think nested queries are the only way to do that, and yes, nested queries like Daniel's example work (I've done it myself). I haven't really tried to get into understanding/demonstrating _exactly_ how the relevance ends up working on the overall master query in such a situation, but it sort of works. (Just note that Daniel's example isn't quite right, I think you need double quotes for the nested _query_, just check the wiki page/blog post on nested queries). Does eDismax handle parens for order of operation too? If so, eDismax is probably the best/easiest solution, especially if you're trying to parse an incoming query from some OTHER format and translate it to something that can be sent to Solr, which is what I often do. I haven't messed with eDismax myself yet. Does anyone know if there's any easy (easy!) way to get eDismax in a Solr 1.4? Any easy way to compile an eDismax query parser on it's own that works with Solr 1.4, and then just drop it into your local lib/ for use with an existing Solr 1.4? Jonathan From: Daniel Pötzinger [daniel.poetzin...@aoemedia.de] Sent: Thursday, January 27, 2011 9:26 AM To: solr-user@lucene.apache.org Subject: AW: DismaxParser Query It may also be an option to mix the query parsers? Something like this (not tested): q={!lucene}field1:test OR field2:test2 _query_:{!dismax qf=fields}+my dismax -bad So you have the benefits of lucene and dismax parser -Ursprüngliche Nachricht- Von: Erick Erickson [mailto:erickerick...@gmail.com] Gesendet: Donnerstag, 27. Januar 2011 15:15 An: solr-user@lucene.apache.org Betreff: Re: DismaxParser Query What version of Solr are you using, and could you consider either 3x or applying a patch to 1.4.1? Because eDismax (extended dismax) handles the full Lucene query language and probably works here. 
See the Solr JIRA 1553 at https://issues.apache.org/jira/browse/SOLR-1553 Best Erick On Thu, Jan 27, 2011 at 8:32 AM, Isan Fulia isan.fu...@germinait.comwrote: It worked by making mm=0 (it acted as OR operator) but how to handle this field1:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) OR field2:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) OR field3:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) On 27 January 2011 17:06, lee carroll lee.a.carr...@googlemail.com wrote: sorry ignore that - we are on dismax here - look at mm param in the docs you can set this to achieve what you need On 27 January 2011 11:34, lee carroll lee.a.carr...@googlemail.com wrote: the default operation can be set in your config to be or or on the query something like q.op=OR On 27 January 2011 11:26, Isan Fulia isan.fu...@germinait.com wrote: but q=keyword1 keyword2 does AND operation not OR On 27 January 2011 16:22, lee carroll lee.a.carr...@googlemail.com wrote: use dismax q for first three fields and a filter query for the 4th and 5th fields so q=keyword1 keyword 2 qf = field1,feild2,field3 pf = field1,feild2,field3 mm=something sensible for you defType=dismax fq= field4:(keyword3 OR keyword4) AND field5:(keyword5) take a look at the dismax docs for extra params On 27 January 2011 08:52, Isan Fulia isan.fu...@germinait.com wrote: Hi all, The query for standard request handler is as follows field1:(keyword1 OR keyword2) OR field2:(keyword1 OR keyword2) OR field3:(keyword1 OR keyword2) AND field4:(keyword3 OR keyword4) AND field5:(keyword5) How the same above query can be written for dismax request handler -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia.
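Jonathan's caveat about double quotes around the nested _query_ can be made concrete with a small helper. This is a sketch assuming the nested-query syntax from the wiki and blog posts, using the hypothetical field names from Daniel's untested example:

```python
# Sketch: compose a main lucene-parser query with a nested dismax query.
# The inner {!dismax} query is passed as a double-quoted string value of
# the magic _query_ field; embedded quotes are escaped.
def nested_dismax(lucene_part, dismax_query, qf):
    inner = '{!dismax qf=%s}%s' % (qf, dismax_query)
    return '%s _query_:"%s"' % (lucene_part, inner.replace('"', '\\"'))

q = nested_dismax('field1:test OR field2:test2', '+my dismax -bad', 'fields')
print(q)
# field1:test OR field2:test2 _query_:"{!dismax qf=fields}+my dismax -bad"
```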
Re: Tika config in ExtractingRequestHandler
If this configuration file is the same as the tika-mimetypes.xml file inside Nutch' conf file, I have an example. I was trying to implement language detection for Solr and thought I had to invoke some Tika functionality by this configuration file in order to do so, but found out that I could rewrite some of the ExtractingRequestHandler classes instead. Erlend On 27.01.11 16.12, Adam Estrada wrote: I believe that as along as Tika is included in a folder that is referenced by solrconfig.xml you should be good. Solr will automatically throw mime types to Tika for parsing. Can anyone else add to this? Thanks, Adam On Thu, Jan 27, 2011 at 5:06 AM, Erlend Garåsene.f.gara...@usit.uio.no wrote: The wiki page for the ExtractingRequestHandler says that I can add the following configuration: str name=tika.config/my/path/to/tika.config/str I have tried to google for an example of such a Tika config file, but haven't found anything. Erlend -- Erlend Garåsen Center for Information Technology Services University of Oslo P.O. Box 1086 Blindern, N-0317 OSLO, Norway Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050 -- Erlend Garåsen Center for Information Technology Services University of Oslo P.O. Box 1086 Blindern, N-0317 OSLO, Norway Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
Re: Import Handler for tokenizing facet string into multi-valued solr.StrField..
Tokenization is fine with facets, that caution is about, say, faceting on the tokenized body of a document where you have potentially a huge number of unique tokens. But if there is a controlled number of distinct values, you shouldn't have to do anything except index to a tokenized field. I'd remove stemming, WordDelimiterFactory, etc though, in fact I'd probably just go with WhiteSpaceTokenizer and, maybe, LowerCaseFilter. But if you have a huge number of unique values, it doesn't matter whether they are tokenized or strings, it'll still be a problem. One note: when faceting for the first time on a newly-started Solr instance, the caches are filled and the *first* query will be slower, so measure subsequent queries. Best Erick On Thu, Jan 27, 2011 at 9:09 AM, Dennis Schafroth den...@indexdata.comwrote: Hi, Pretty novice into SOLR coding, but looking for hints about how (if not already done) to implement a PatternTokenizer, that would index this into multivalie fields of solr.StrField for facetting. Ex. Water -- Irrigation ; Water -- Sewage should be tokenized into Water Irrigation Sewage in multi-valued non-tokenized fields due to performance. I could do it from the outside, but I would this as a opportunity to learn about SOLR. It works as I want with the PatternTokenizerFactory when I am using solr.TextField, but not when I am using the non-tokenized solr.StrField. But according to reading, facets performance is better on non-tokenized fields. We need better performance on our faceted searches on these multi-value fields. (25 million documents, three multi-valued facets) I would also need to have a filter that filter out identical values as the feeds have redundant data as shown above. Can anyone point point me in the right direction.. cheers, :-Dennis
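A minimal schema sketch of this advice, assuming the "--" and ";" separators from the example at hand; the type and field names (facetText, subject_facet) are made up. A tokenized solr.TextField is used because solr.StrField performs no analysis at all:

```xml
<!-- schema.xml sketch: a multiValued facet field whose analyzer splits
     "Water -- Irrigation ; Water -- Sewage" into the tokens
     Water / Irrigation / Water / Sewage. -->
<fieldType name="facetText" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*(--|;)\s*"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="subject_facet" type="facetText" indexed="true" stored="true"
       multiValued="true"/>
```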
Re: Import Handler for tokenizing facet string into multi-valued solr.StrField..
Beyond what Erick said, I'll add that it is often better to do this from the outside and send in multiple actual end-user displayable facet values. When you send in a field like Water -- Irrigation ; Water -- Sewage, that is what will get stored (if you have it set to stored), but what you might rather want is each individual value stored, which can only be done by the indexer sending in multiple values, not through just tokenization. Erik On Jan 27, 2011, at 09:09 , Dennis Schafroth wrote: Hi, Pretty novice into SOLR coding, but looking for hints about how (if not already done) to implement a PatternTokenizer, that would index this into multivalie fields of solr.StrField for facetting. Ex. Water -- Irrigation ; Water -- Sewage should be tokenized into Water Irrigation Sewage in multi-valued non-tokenized fields due to performance. I could do it from the outside, but I would this as a opportunity to learn about SOLR. It works as I want with the PatternTokenizerFactory when I am using solr.TextField, but not when I am using the non-tokenized solr.StrField. But according to reading, facets performance is better on non-tokenized fields. We need better performance on our faceted searches on these multi-value fields. (25 million documents, three multi-valued facets) I would also need to have a filter that filter out identical values as the feeds have redundant data as shown above. Can anyone point point me in the right direction.. cheers, :-Dennis
Re: Import Handler for tokenizing facet string into multi-valued solr.StrField..
Thanks for the hints! Sorry about stealing the thread query range in multivalued date field Mistakenly responded to it. cheers, :-Dennis On 27/01/2011, at 16.48, Erik Hatcher wrote: Beyond what Erick said, I'll add that it is often better to do this from the outside and send in multiple actual end-user displayable facet values. When you send in a field like Water -- Irrigation ; Water -- Sewage, that is what will get stored (if you have it set to stored), but what you might rather want is each individual value stored, which can only be done by the indexer sending in multiple values, not through just tokenization. Erik On Jan 27, 2011, at 09:09 , Dennis Schafroth wrote: Hi, Pretty novice into SOLR coding, but looking for hints about how (if not already done) to implement a PatternTokenizer, that would index this into multivalie fields of solr.StrField for facetting. Ex. Water -- Irrigation ; Water -- Sewage should be tokenized into Water Irrigation Sewage in multi-valued non-tokenized fields due to performance. I could do it from the outside, but I would this as a opportunity to learn about SOLR. It works as I want with the PatternTokenizerFactory when I am using solr.TextField, but not when I am using the non-tokenized solr.StrField. But according to reading, facets performance is better on non-tokenized fields. We need better performance on our faceted searches on these multi-value fields. (25 million documents, three multi-valued facets) I would also need to have a filter that filter out identical values as the feeds have redundant data as shown above. Can anyone point point me in the right direction.. cheers, :-Dennis
EmbeddedSolr issues
Hi, Am getting the following messages while using EmbeddedSolr to retrieve the Term Vectors. I also happened to go through https://issues.apache.org/jira/browse/SOLR-914 . Should I ignore these messages and proceed or should I make any changes? [#|2011-01-27T11:56:34.593-0500|INFO|glassfish3.0.1|javax.enterprise.system.std.com.sun.enterprise.v3.services.impl|_ThreadId=33|_ThreadName=21687399 [Finalizer] Error org.apache.solr.core.CoreContainer - CoreContainer was not shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! [#|2011-01-27T11:56:34.609-0500|INFO|glassfish3.0.1|javax.enterprise.system.std.com.sun.enterprise.v3.services.impl|_ThreadId=33|_ThreadName=21687415 [Finalizer] Error org.apache.solr.core.SolrCore - Too many close [count:-1] on org.apache.solr.core.SolrCore@1638e30. Please report this exception to solr-user@lucene.apache.org [#|2011-01-27T11:56:34.611-0500|INFO|glassfish3.0.1|javax.enterprise.system.std.com.sun.enterprise.v3.services.impl|_ThreadId=33|_ThreadName=21687417 [Finalizer] Error org.apache.solr.core.SolrCore - REFCOUNT ERROR: unreferenced org.apache.solr.core.SolrCore@1638e30 (UserIndexCore) has a reference count of -1 [#|2011-01-27T11:56:34.613-0500|INFO|glassfish3.0.1|javax.enterprise.system.std.com.sun.enterprise.v3.services.impl|_ThreadId=33|_ThreadName=21687419 [Finalizer] Error org.apache.solr.common.util.ConcurrentLRUCache - ConcurrentLRUCache was not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! [#|2011-01-27T11:56:34.613-0500|INFO|glassfish3.0.1|javax.enterprise.system.std.com.sun.enterprise.v3.services.impl|_ThreadId=33|_ThreadName=21687420 [Finalizer] Error org.apache.solr.common.util.ConcurrentLRUCache - ConcurrentLRUCache was not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! 
This is the code I am using:

    public static SolrCore USER_IDX_CORE;

    public static Map<String, String> getTermsVector(String userId)
            throws ParserConfigurationException, IOException, SAXException {
        Map<String, String> freqMap = new HashMap<String, String>();
        try {
            SolrConfig USER_IDX_SOLR_CONFIG = new SolrConfig(USER_IDX_CONFIG_FILE);
            IndexSchema USER_IDX_SCHEMA = new IndexSchema(USER_IDX_SOLR_CONFIG, USER_IDX_SCHEMA_FILE, null);
            CoreContainer USER_IDX_CONTAINER = new CoreContainer(new SolrResourceLoader(SolrIndexer.USER_IDX_SOLR_HOME));
            CoreDescriptor USER_IDX_CORE_DESCRIPTOR = new CoreDescriptor(USER_IDX_CONTAINER, USER_IDX_CORE_NAME,
                    USER_IDX_SOLR_CONFIG.getResourceLoader().getInstanceDir());
            USER_IDX_CORE_DESCRIPTOR.setConfigName(USER_IDX_SOLR_CONFIG.getResourceName());
            USER_IDX_CORE_DESCRIPTOR.setSchemaName(USER_IDX_SCHEMA.getResourceName());
            USER_IDX_CORE = new SolrCore(null, USER_IDX_DATA_DIR, USER_IDX_SOLR_CONFIG, USER_IDX_SCHEMA, USER_IDX_CORE_DESCRIPTOR);
            USER_IDX_CONTAINER.register(USER_IDX_CORE_NAME, USER_IDX_CORE, false);
            SearchComponent tvComp = USER_IDX_CORE.getSearchComponent("tvComponent");
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.add(CommonParams.Q, FIELD_USER_ID + ":" + userId);
            params.add(CommonParams.QT, "tvrh");
            params.add(TermVectorParams.TF, "true");
            params.add(TermVectorComponent.COMPONENT_NAME, "true");
            SolrRequestHandler handler = USER_IDX_CORE.getRequestHandler("tvrh");
            SolrQueryResponse rsp = new SolrQueryResponse();
            rsp.add("responseHeader", new SimpleOrderedMap());
            handler.handleRequest(new LocalSolrQueryRequest(USER_IDX_CORE, params), rsp);
            NamedList terms = (NamedList) ((NamedList) ((NamedList) rsp.getValues()
                    .get(TermVectorComponent.TERM_VECTORS)).getVal(0)).get(FIELD_USER_ALL);
            if (terms != null) {
                for (int i = 0; i < terms.size(); i++) {
                    NamedList freq = (NamedList) terms.getVal(i);
                    freqMap.put(terms.getName(i), freq.getVal(0).toString());
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                USER_IDX_CORE.close();
                USER_IDX_CONTAINER.shutdown();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        return freqMap;
    }

Also, USER_IDX_CONTAINER.shutdown(); throws a NullPointerException, indicating the reference doesn't exist by the time the code execution reaches it. If I don't use this snippet

    try {
        USER_IDX_CORE.close();
        USER_IDX_CONTAINER.shutdown();
    } catch (Exception e) {
        e.printStackTrace();
    }

I get a similar POSSIBLE RESOURCE LEAK!!! message that says the SolrCore wasn't closed. Am calling this code via a message queue, and no concurrent
Is relevance score related to position of the term?
Let me describe the question using an example: if I search Lee on the name field as an exact term match, the returned results can be: Lee Jamie Jamie Lee Will Solr grant a higher score to Lee Jamie vs. Jamie Lee based on the position of the term in the name field of each document? From what I know, the score is related to: 1. term frequency 2. idf (inverse document frequency) 3. length norm 4. query norm It does not seem to take the position of the matched term into account. Is that right? Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-relevance-score-related-to-position-of-the-term-tp2363369p2363369.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is relevance score related to position of the term?
Hi Cyang, usually Solr isn't looking at the position of a term. However, there are solutions out there for considering the term's position when calculating a doc's score. Furthermore: If two docs got the same score, I think they are ordered the way they were found in the index. Does this answer your questions? Regards -- View this message in context: http://lucene.472066.n3.nabble.com/Is-relevance-score-related-to-position-of-the-term-tp2363369p2363385.html Sent from the Solr - User mailing list archive at Nabble.com.
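The factor list from the question can be made concrete with a toy score. This is a simplified imitation of the tf * idf * lengthNorm product, not Lucene's exact DefaultSimilarity formula, and the numbers are illustrative only; the point is that term position never enters it:

```python
# Sketch: "Lee Jamie" and "Jamie Lee" score identically for the term "lee"
# because none of the classic factors (tf, idf, lengthNorm) depends on where
# in the field the term occurs.
import math

def toy_score(doc_tokens, term, num_docs, doc_freq):
    tf = math.sqrt(doc_tokens.count(term))
    idf = 1.0 + math.log(num_docs / (1.0 + doc_freq))
    length_norm = 1.0 / math.sqrt(len(doc_tokens))
    return tf * idf * length_norm

a = toy_score(["lee", "jamie"], "lee", num_docs=100, doc_freq=2)
b = toy_score(["jamie", "lee"], "lee", num_docs=100, doc_freq=2)
print(a == b)  # True: term order in the field does not change the score
```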
Re: SolrCloud Questions for MultiCore Setup
Hi, excuse me for pushing this for a second time, but I can't figure it out by looking at the source code... Thanks! Hi Lance, thanks for your explanation. As far as I know, in distributed search I have to tell Solr what other shards it has to query. So, if I want to query a specific core, present in all my shards, I could tell Solr this by using the shards-param plus the specified core on each shard. Using SolrCloud's distrib=true feature (it sets all the known shards automatically?), a collection should consist only of one type of core-schema, correct? How does SolrCloud know that shard_x and shard_y are replicas of each other (I took a look at the possibility to specify alternative shards if one is not available)? If it does not know that they are replicas of each other, I should use the syntax of specifying alternative shards for failover, for performance reasons, because querying 2 identical and available cores seems to be wasted capacity, no? Thank you! -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Questions-for-MultiCore-Setup-tp2309443p2363396.html Sent from the Solr - User mailing list archive at Nabble.com.
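On the failover syntax mentioned above: in plain distributed search, each slot of the shards parameter may list pipe-separated equivalent addresses, and Solr tries an alternative when one is unavailable (see the DistributedSearch wiki). A sketch of composing such a parameter, with hypothetical host and core names:

```python
# Sketch: build a shards parameter where each logical shard lists its
# replica(s) as pipe-separated alternatives.
def shards_param(shard_groups):
    """shard_groups: list of lists; each inner list holds equivalent replicas."""
    return ",".join("|".join(group) for group in shard_groups)

param = shards_param([
    ["hostA:8983/solr/core1", "hostB:8983/solr/core1"],  # shard x + replica
    ["hostC:8983/solr/core1", "hostD:8983/solr/core1"],  # shard y + replica
])
print(param)
```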
disappearing MBeans
I am using JMX to monitor my replication status and am finding that my MBeans are disappearing. I turned on debugging for JMX and found that solr seems to be deleting the mbeans. Is this a bug? Some trace info is below.. here's me reading the mbean successfully: Jan 27, 2011 5:00:02 PM ServerCommunicatorAdmin reqIncoming FINER: Receive a new request. Jan 27, 2011 5:00:02 PM DefaultMBeanServerInterceptor getAttribute FINER: Attribute= indexReplicatedAt, obj= solr/myapp-core:type=/replication,id=org.apache.solr.handler.ReplicationHandler Jan 27, 2011 5:00:02 PM Repository retrieve FINER: name=solr/myapp-core:type=/replication,id=org.apache.solr.handler.ReplicationHandler Jan 27, 2011 5:00:02 PM ServerCommunicatorAdmin reqIncoming FINER: Finish a request. a little while later it removes the mbean from the PM Repository (whatever that is) and then re-adds it: FINER: Send create notification of object solr/myapp-core:id=org.apache.solr.handler.component.SearchHandler,type=atlas Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor sendNotification FINER: JMX.mbean.registered solr/myapp-core:type=atlas,id=org.apache.solr.handler.component.SearchHandler Jan 27, 2011 5:16:14 PM Repository contains FINER: name=solr/myapp-core:type=/replication,id=org.apache.solr.handler.ReplicationHandler Jan 27, 2011 5:16:14 PM Repository retrieve FINER: name=solr/myapp-core:type=/replication,id=org.apache.solr.handler.ReplicationHandler Jan 27, 2011 5:16:14 PM Repository remove FINER: name=solr/myapp-core:type=/replication,id=org.apache.solr.handler.ReplicationHandler Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor unregisterMBean FINER: Send delete notification of object solr/myapp-core:id=org.apache.solr.handler.ReplicationHandler,type=/replication Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor sendNotification FINER: JMX.mbean.unregistered solr/myapp-core:type=/replication,id=org.apache.solr.handler.ReplicationHandler Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor 
registerMBean FINER: ObjectName = solr/myapp-core:type=/replication,id=org.apache.solr.handler.ReplicationHandler Jan 27, 2011 5:16:14 PM Repository addMBean FINER: name=solr/myapp-core:type=/replication,id=org.apache.solr.handler.ReplicationHandler Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor addObject FINER: Send create notification of object solr/myapp-core:id=org.apache.solr.handler.ReplicationHandler,type=/replication Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor sendNotification FINER: JMX.mbean.registered solr/myapp-core:type=/replication,id=org.apache.solr.handler.ReplicationHandler And after a tons of messages but still in the same second it does: Jan 27, 2011 5:16:14 PM Repository contains FINER: name=solr/myapp-core:type=org.apache.solr.handler.ReplicationHandler,id=org.apache.solr.handler.ReplicationHandler Jan 27, 2011 5:16:14 PM Repository retrieve FINER: name=solr/myapp-core:type=org.apache.solr.handler.ReplicationHandler,id=org.apache.solr.handler.ReplicationHandler Jan 27, 2011 5:16:14 PM Repository removeFINER: name=solr/myapp-core:type=org.apache.solr.handler.ReplicationHandler,id=org.apache.solr.handler.ReplicationHandler Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor unregisterMBean FINER: Send delete notification of object solr/myapp-core:id=org.apache.solr.handler.ReplicationHandler,type=org.apache.solr.handler.ReplicationHandler Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor sendNotification FINER: JMX.mbean.unregistered solr/myapp-core:type=org.apache.solr.handler.ReplicationHandler,id=org.apache.solr.handler.ReplicationHandler Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor registerMBean FINER: ObjectName = solr/myapp-core:type=org.apache.solr.handler.ReplicationHandler,id=org.apache.solr.handler.ReplicationHandlerJan 27, 2011 5:16:14 PM Repository addMBean FINER: name=solr/myapp-core:type=org.apache.solr.handler.ReplicationHandler,id=org.apache.solr.handler.ReplicationHandler Jan 27, 2011 5:16:14 PM 
DefaultMBeanServerInterceptor addObjectFINER: Send create notification of object solr/myapp-core:id=org.apache.solr.handler.ReplicationHandler,type=org.apache.solr.handler.ReplicationHandler Jan 27, 2011 5:16:14 PM DefaultMBeanServerInterceptor sendNotificationFINER: JMX.mbean.registered solr/myapp-core:type=org.apache.solr.handler.ReplicationHandler,id=org.apache.solr.handler.ReplicationHandler And then I don't know what this is about but it removes the bean again: Jan 27, 2011 5:16:15 PM Repository contains FINER: name=solr/myapp-core:type=org.apache.solr.handler.ReplicationHandler,id=org.apache.solr.handler.ReplicationHandler Jan 27, 2011 5:16:15 PM Repository retrieve FINER: name=solr/myapp-core:type=org.apache.solr.handler.ReplicationHandler,id=org.apache.solr.handler.ReplicationHandler Jan 27, 2011 5:16:15 PM Repository remove FINER: name=solr/myapp-core:type=org.apache.solr.handler.ReplicationHandler,id=org.apache.solr.handler.ReplicationHandler Jan 27, 2011
Re: configure httpclient to access solr with user credential on third party host
Thanks, exactly. I asked my domain hosting provider and he provided me with some other port. I am wondering, can I specify credentials without the port? I mean, when I open the browser and type www.mydomainmame/solr I get the Tomcat auth login screen. In the same way, can I configure the HTTP client so that I don't have to specify the port? Thanks darniz -- View this message in context: http://lucene.472066.n3.nabble.com/configure-httpclient-to-access-solr-with-user-credential-on-third-party-host-tp2360364p2364190.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DismaxParser Query
In general, patches are applied to the source tree and it's re-compiled. See: http://wiki.apache.org/solr/HowToContribute#Working_With_Patches This is pretty easy, and I do know that some people have applied the eDismax patch to the 1.4 code line, but I haven't done it myself. Best Erick On Thu, Jan 27, 2011 at 10:27 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Yes, I think nested queries are the only way to do that, and yes, nested queries like Daniel's example work (I've done it myself). I haven't really tried to get into understanding/demonstrating _exactly_ how the relevance ends up working on the overall master query in such a situation, but it sort of works. (Just note that Daniel's example isn't quite right, I think you need double quotes for the nested _query_, just check the wiki page/blog post on nested queries). Does eDismax handle parens for order of operation too? If so, eDismax is probably the best/easiest solution, especially if you're trying to parse an incoming query from some OTHER format and translate it to something that can be sent to Solr, which is what I often do. I haven't messed with eDismax myself yet. Does anyone know if there's any easy (easy!) way to get eDismax in a Solr 1.4? Any easy way to compile an eDismax query parser on its own that works with Solr 1.4, and then just drop it into your local lib/ for use with an existing Solr 1.4? Jonathan From: Daniel Pötzinger [daniel.poetzin...@aoemedia.de] Sent: Thursday, January 27, 2011 9:26 AM To: solr-user@lucene.apache.org Subject: AW: DismaxParser Query It may also be an option to mix the query parsers? Something like this (not tested): q={!lucene}field1:test OR field2:test2 _query_:{!dismax qf=fields}+my dismax -bad So you have the benefits of lucene and dismax parser -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, 27.
January 2011 15:15 To: solr-user@lucene.apache.org Subject: Re: DismaxParser Query What version of Solr are you using, and could you consider either 3x or applying a patch to 1.4.1? Because eDismax (extended dismax) handles the full Lucene query language and probably works here. See the Solr JIRA 1553 at https://issues.apache.org/jira/browse/SOLR-1553 Best Erick On Thu, Jan 27, 2011 at 8:32 AM, Isan Fulia isan.fu...@germinait.com wrote: It worked by making mm=0 (it acted as OR operator) but how to handle this field1:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) OR field2:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) OR field3:((keyword1 AND keyword2) OR (keyword3 AND keyword4)) On 27 January 2011 17:06, lee carroll lee.a.carr...@googlemail.com wrote: sorry ignore that - we are on dismax here - look at mm param in the docs you can set this to achieve what you need On 27 January 2011 11:34, lee carroll lee.a.carr...@googlemail.com wrote: the default operation can be set in your config to be or or on the query something like q.op=OR On 27 January 2011 11:26, Isan Fulia isan.fu...@germinait.com wrote: but q=keyword1 keyword2 does AND operation not OR On 27 January 2011 16:22, lee carroll lee.a.carr...@googlemail.com wrote: use dismax q for first three fields and a filter query for the 4th and 5th fields so q=keyword1 keyword 2 qf = field1,field2,field3 pf = field1,field2,field3 mm=something sensible for you defType=dismax fq= field4:(keyword3 OR keyword4) AND field5:(keyword5) take a look at the dismax docs for extra params On 27 January 2011 08:52, Isan Fulia isan.fu...@germinait.com wrote: Hi all, The query for standard request handler is as follows field1:(keyword1 OR keyword2) OR field2:(keyword1 OR keyword2) OR field3:(keyword1 OR keyword2) AND field4:(keyword3 OR keyword4) AND field5:(keyword5) How the same above query can be written for dismax request handler -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia.
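To make the nested-query approach concrete, here is a small sketch of assembling and URL-encoding Daniel's mixed lucene/dismax query (Python is used only for illustration; the field names and qf value are placeholders from the thread, and the double quotes around the _query_ value follow Jonathan's note):

```python
from urllib.parse import urlencode

def build_mixed_query(lucene_part, dismax_text, qf):
    # Quote the nested _query_ value so the outer lucene parser
    # treats the whole dismax sub-query as a single clause.
    nested = '_query_:"{!dismax qf=%s}%s"' % (qf, dismax_text)
    return urlencode({'q': '%s OR %s' % (lucene_part, nested),
                      'defType': 'lucene'})

params = build_mixed_query('field1:test OR field2:test2', '+my dismax -bad', 'fields')
print(params)
```

The resulting string is what would be appended to /solr/select; the outer parser stays lucene while the nested clause is scored by dismax.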
Re: Is relevance score related to position of the term?
Hi Em, Thanks for the reply. Basically you are saying there is no built-in solution that cares about the position of the term to impact the relevancy score. In my scenario, I will get those two documents with the same score. The order depends on the sequence of indexing. Thanks, Cyang -- View this message in context: http://lucene.472066.n3.nabble.com/Is-relevance-score-related-to-position-of-the-term-tp2363369p2364427.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is relevance score related to position of the term?
Just a little clarification: when I say position of the term, I mean the position of the term within the field. For example, Jamie Lee -- Lee is in the second position of the name field. Lee Jamie -- Lee is in the first position of the name field in this case. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-relevance-score-related-to-position-of-the-term-tp2363369p2364431.html Sent from the Solr - User mailing list archive at Nabble.com.
Searching for negative numbers very slow
If I do qt=dismax fq=uid:1 (or any other positive number) then queries are as quick as normal - in the 20ms range. However, any of fq=uid:\-1 or fq=uid:[* TO -1] or fq=uid:[-1 to -1] or fq=-uid:[0 TO *] then queries are incredibly slow - in the 9 *second* range. Anything I can do to mitigate this? Negative numbers have significant meaning in our system so it wouldn't be trivial to shift all uids up by the number of negative ids. Thanks, Simon
Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene
On Tue, Jan 25, 2011 at 01:28:16PM +0100, Markus Jelsma said: Are you sure you need CMS incremental mode? It's only advised when running on a machine with one or two processors. If you have more, you should consider disabling the incremental flags. I'll test again, but we added those to get better performance - not much, but there did seem to be an improvement. The problem seems to not be in average use, but that occasionally there's a huge spike in load (there doesn't seem to be a particular killer query) and Solr just never recovers. Thanks, Simon
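For reference, the incremental-mode flags under discussion can be toggled like this (a sketch; the flag names are standard HotSpot options, while the variable name and the exact combination are illustrative):

```shell
# Incremental CMS, only advisable on machines with one or two CPUs:
#   -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing
# On a multi-core box, drop the incremental flags and keep plain CMS:
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"
echo "$JAVA_OPTS"
```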
Re: DIH clean=false
: Then for clean=false, my understanding is that it won't blow off existing
: index. For data that exist in index and db table (by the same uniqueKey)
: it will update the index data regardless if there is actual field update.
: For existing index data but not existing in table (by comparing uniqueKey),

If clean=false, the documents from your DB are indexed -- if you have a uniqueKey field, then docs with the same uniqueKey as an existing doc will overwrite the existing doc, but nothing will be deleted (so documents you removed from your DB will still live on in your index). clean=true is just another way of saying delete all docs from the index before doing this import. -Hoss
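The clean=false vs clean=true behavior described above can be modeled in a few lines (a toy in-memory model for illustration only, not Solr code; the uniqueKey name 'id' is an assumption):

```python
def dih_import(index, db_rows, unique_key='id', clean=False):
    """Mimic the DIH semantics: clean=True empties the index first;
    clean=False overwrites docs by uniqueKey but never deletes."""
    if clean:
        index.clear()
    for row in db_rows:
        index[row[unique_key]] = row  # same uniqueKey -> overwrite
    return index

index = {1: {'id': 1, 'v': 'old'}, 2: {'id': 2, 'v': 'kept'}}
dih_import(index, [{'id': 1, 'v': 'new'}], clean=False)
print(index)  # doc 1 overwritten, doc 2 still present
```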
Solr for noSQL
Hi, Do we have a data import handler to quickly read in data from a noSQL database, specifically MongoDB, which I am thinking of using? Or a more general question: how does Solr work with noSQL databases? Thanks. Jianbin
Re: Searching for negative numbers very slow
On Thu, Jan 27, 2011 at 11:32:26PM +, me said: If I do qt=dismax fq=uid:1 (or any other positive number) then queries are as quick as normal - in the 20ms range. For what it's worth uid is a TrieIntField with precisionStep=0, omitNorms=true, positionIncrementGap=0
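When timing these variants it is easy to mangle the escaping in the URL, so for reference here is how the escaped forms from the thread encode (standard URL encoding, nothing Solr-specific):

```python
from urllib.parse import quote_plus

# The backslash form escapes the leading minus so the term is not
# parsed as a prohibited clause; the range forms need no escaping.
for fq in ['uid:\\-1', 'uid:[* TO -1]', '-uid:[0 TO *]']:
    print('fq=' + quote_plus(fq))
```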
Re: configure httpclient to access solr with user credential on third party host
This should help:

HttpClient client = new HttpClient();
client.getParams().setAuthenticationPreemptive(true);
AuthScope scope = new AuthScope(AuthScope.ANY_HOST, AuthScope.ANY_PORT);
client.getState().setCredentials(scope, new UsernamePasswordCredentials(user, password));

Regards, Jayendra On Thu, Jan 27, 2011 at 4:47 PM, Darniz rnizamud...@edmunds.com wrote: thanks exaclty i asked my domain hosting provider and he provided me with some other port i am wondering can i specify credentials without the port i mean when i open the browser and i type www.mydomainmame/solr i get the tomcat auth login screen. in the same way can i configure the http client so that i dont have to specify the port Thanks darniz -- View this message in context: http://lucene.472066.n3.nabble.com/configure-httpclient-to-access-solr-with-user-credential-on-third-party-host-tp2360364p2364190.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tika config in ExtractingRequestHandler
The tika.config file is obsolete. I don't know what replaces it. On 1/27/11, Erlend Garåsen e.f.gara...@usit.uio.no wrote: If this configuration file is the same as the tika-mimetypes.xml file inside Nutch's conf folder, I have an example. I was trying to implement language detection for Solr and thought I had to invoke some Tika functionality by this configuration file in order to do so, but found out that I could rewrite some of the ExtractingRequestHandler classes instead. Erlend On 27.01.11 16.12, Adam Estrada wrote: I believe that as long as Tika is included in a folder that is referenced by solrconfig.xml you should be good. Solr will automatically throw mime types to Tika for parsing. Can anyone else add to this? Thanks, Adam On Thu, Jan 27, 2011 at 5:06 AM, Erlend Garåsen e.f.gara...@usit.uio.no wrote: The wiki page for the ExtractingRequestHandler says that I can add the following configuration: <str name="tika.config">/my/path/to/tika.config</str> I have tried to google for an example of such a Tika config file, but haven't found anything. Erlend -- Erlend Garåsen Center for Information Technology Services University of Oslo P.O. Box 1086 Blindern, N-0317 OSLO, Norway Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050 -- Erlend Garåsen Center for Information Technology Services University of Oslo P.O. Box 1086 Blindern, N-0317 OSLO, Norway Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050 -- Lance Norskog goks...@gmail.com
Re: Solr for noSQL
There are no special connectors available to read from key-value stores like memcache/cassandra/mongodb. You would have to get a Java client library for the DB and code your own DataImportHandler datasource. I cannot recommend this; you should make your own program to read the data and upload it to Solr with one of the Solr client libraries. Lance On 1/27/11, Jianbin Dai j...@huawei.com wrote: Hi, Do we have data import handler to fast read in data from noSQL database, specifically, MongoDB I am thinking to use? Or a more general question, how does Solr work with noSQL database? Thanks. Jianbin -- Lance Norskog goks...@gmail.com
Re: SolrCloud Questions for MultiCore Setup
Hello- I have not used SolrCloud. On 1/27/11, Em mailformailingli...@yahoo.de wrote: Hi, excuse me for pushing this for a second time, but I can't figure it out by looking at the source code... Thanks! Hi Lance, thanks for your explanation. As far as I know, in distributed search I have to tell Solr what other shards it has to query. So, if I want to query a specific core present in all my shards, I could tell Solr this by using the shards-param plus the specified core on each shard. Using SolrCloud's distrib=true feature (it sets all the known shards automatically?), a collection should consist only of one type of core-schema, correct? How does SolrCloud know that shard_x and shard_y are replicas of each other (I took a look at the possibility to specify alternative shards if one is not available)? If it does not know that they are replicas of each other, I should use the syntax of specifying alternative shards for failover due to performance reasons, because querying 2 identical and available cores seems to be wasted capacity, no? Thank you! -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Questions-for-MultiCore-Setup-tp2309443p2363396.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
Re: DismaxParser Query
Hi all, I am currently using Solr 1.4.1. Do I need to apply the patch for the extended dismax parser? On 28 January 2011 03:42, Erick Erickson erickerick...@gmail.com wrote: In general, patches are applied to the source tree and it's re-compiled. See: http://wiki.apache.org/solr/HowToContribute#Working_With_Patches This is pretty easy, and I do know that some people have applied the eDismax patch to the 1.4 code line, but I haven't done it myself. Best Erick -- Thanks Regards, Isan Fulia.
Re: Solr for noSQL
Why not make one's own DIH handler, Lance? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Lance Norskog goks...@gmail.com To: solr-user@lucene.apache.org Sent: Thu, January 27, 2011 9:33:25 PM Subject: Re: Solr for noSQL There no special connectors available to read from the key-value stores like memcache/cassandra/mongodb. You would have to get a Java client library for the DB and code your own dataimporthandler datasource. I cannot recommend this; you should make your own program to read data and upload to Solr with one of the Solr client libraries. Lance On 1/27/11, Jianbin Dai j...@huawei.com wrote: Hi, Do we have data import handler to fast read in data from noSQL database, specifically, MongoDB I am thinking to use? Or a more general question, how does Solr work with noSQL database? Thanks. Jianbin -- Lance Norskog goks...@gmail.com
Re: Solr for noSQL
Do we have any performance measurements? Would it be much slower compared to other DIH datasources? There no special connectors available to read from the key-value stores like memcache/cassandra/mongodb. You would have to get a Java client library for the DB and code your own dataimporthandler datasource. I cannot recommend this; you should make your own program to read data and upload to Solr with one of the Solr client libraries. Lance On 1/27/11, Jianbin Dai j...@huawei.com wrote: Hi, Do we have data import handler to fast read in data from noSQL database, specifically, MongoDB I am thinking to use? Or a more general question, how does Solr work with noSQL database? Thanks. Jianbin -- Lance Norskog goks...@gmail.com
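Lance's suggestion - a small standalone program using a MongoDB client plus a Solr client library - might be sketched as below. The pymongo/requests calls in the trailing comment and the field mapping are assumptions for illustration; only the document-mapping helper is shown runnable:

```python
def mongo_to_solr_doc(mongo_doc):
    """Map a MongoDB document to a flat Solr document: rename _id to
    the uniqueKey field 'id' and skip nested dicts, which a plain
    Solr schema cannot index directly."""
    solr_doc = {}
    for key, value in mongo_doc.items():
        if isinstance(value, dict):
            continue  # skip nested structures in this sketch
        solr_doc['id' if key == '_id' else key] = value
    return solr_doc

# The surrounding program would then be roughly (not runnable here):
#   for doc in pymongo.MongoClient().mydb.mycoll.find():
#       requests.post('http://localhost:8983/solr/update/json',
#                     json=[mongo_to_solr_doc(doc)])
print(mongo_to_solr_doc({'_id': 5, 'name': 'x', 'nested': {'a': 1}}))
```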
NOT operator not working
I have a field in an XML file: <DeviceType>Accessory Data / Memory</DeviceType> The Solr schema field is declared as: <field name="deviceType" type="text" indexed="true" stored="true" /> I am trying to eliminate results by using NOT. For example, I want all devices for a term except where DeviceType is Accessory*. So here is what I'm trying: /solr/select?indent=on&version=2.2&q=(sharp+AND+-deviceType:Access*)&qt=dismax&wt=standard But for some reason it's giving me all results for sharp, irrespective of what the devicetype is. It works fine with fq=-deviceType:Accessory, but due to some other application constraint we want to use q=(sharp+AND+-deviceType:Access*). Any thoughts on what I'm doing wrong? -- View this message in context: http://lucene.472066.n3.nabble.com/NOT-operator-not-working-tp2365831p2365831.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: NOT operator not working
--- On Fri, 1/28/11, abhayd ajdabhol...@hotmail.com wrote: From: abhayd ajdabhol...@hotmail.com Subject: NOT operator not working To: solr-user@lucene.apache.org Date: Friday, January 28, 2011, 8:45 AM i have a field in xml file DeviceTypeAccessory Data / Memory/DeviceType solr schema field declared as field name=deviceType type=text indexed=true stored=true / I am trying to eliminate results by using NOT. For example I want all devices for a term except where DeviceType is not Accessory* SO here is what i m trying /solr/select?indent=onversion=2.2q=(sharp+AND+-deviceType:Access*)qt=dismaxwt=standard But for some reason its giving me all results for sharp irrespective of what devicetype is It works fine with fq=-deviceType:Accessory but due to some other application constraint we want to use q=(sharp+AND+-deviceType:Access*) Wildcard queries are not analyzed. For example, if you have a lowercase filter at index time, you should lowercase your query manually. Instead of fq=-deviceType:Access* you should use fq=-deviceType:access*
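Since wildcard terms skip analysis, the lowercasing has to happen client-side before the query is sent; a sketch (assuming, as in the reply, a lowercase filter at index time; the helper name and regex are illustrative, not Solr API):

```python
import re

def lowercase_wildcard_terms(query):
    """Lowercase any field:term clause whose term contains a wildcard,
    mirroring the index-time lowercase filter that wildcards bypass."""
    def fix(m):
        field, term = m.group(1), m.group(2)
        if '*' in term or '?' in term:
            return '%s:%s' % (field, term.lower())
        return m.group(0)
    return re.sub(r'(-?\w+):(\S+)', fix, query)

print(lowercase_wildcard_terms('-deviceType:Access*'))  # -deviceType:access*
```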
Re: Solr for noSQL
On Fri, Jan 28, 2011 at 6:00 AM, Jianbin Dai j...@huawei.com wrote: [...] Do we have data import handler to fast read in data from noSQL database, specifically, MongoDB I am thinking to use? [...] Have you tried the links that a Google search turns up? Some of them look like pretty good prospects. Regards, Gora
Re: Is relevance score related to position of the term?
Hi, no, you misunderstood me. I only said that Solr does not care about the positions *usually*. Lucene has SpanNearQuery, which considers the position of the query's terms relative to each other. Furthermore, there is a SpanFirstQuery which boosts occurrences of a term at the beginning of a particular field. Unfortunately I am unaware whether they are already utilized as a Solr feature or not. Perhaps you will need to write your own QueryParserPlugin to make use of them for your use case. However, parsers like DisMax do not care whether the found term is at the beginning of the field or not. BUT you can specify a slop between terms of phrase queries for boosting. Have a look at the wiki's DisMax page. Regards cyang2010 wrote: Just a little clarification, when i say position of the term, i mean the position of the term within the field. For example, Jamie Lee -- Lee is the second position of the name field. Lee Jamie -- Lee is the first position of the name field in this case. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-relevance-score-related-to-position-of-the-term-tp2363369p2365863.html Sent from the Solr - User mailing list archive at Nabble.com.