Re: Short DismaxRequestHandler Question
Okay, let me be more specific: I have a custom StopWordFilter and a WordMarkingFilter. The WordMarkingFilter is a simple implementation that determines which type a word is. The StopWordFilter (my implementation) removes specific types of words *and* all markers from all words. This leads to the deletion of some parts of sentences. In my dismax query I specified some fields with such filters and some without.

a) what docs should *not* match the query you listed
In this case: docs where only Solr OR development occurs should not match. It does not matter whether both words occur in different fields.

b) what queries should *not* match the doc you listed
Actually, "Solr Development Lucidworks" should not match, for example (assuming that lucidworks does not occur in a field like content). In this case the user searches for development work with Solr in relation to LucidWorks. Solr does not know about the relation; however, with the 100% mm definition I can tell Solr something like this in an easier way.

c) what types of URLs you've already tried
Those I have shown here. No more.

Let me be sure that I have understood your part about how the DisMaxRequestHandler works. Say I have 4 fields: name, colour, category, manufacturer, and an example doc like this:

title: iPhone
colour: black
category: smartphone
manufacturer: apple

And I have a dismax query like this:

q=apple iPhone
qf=title^5 manufacturer
mm=100%

Then the whole thing will match (assuming that iPhone and/or apple were not stopwords)? If yes, then the problem is my filter definition. There were some threads discussing such problems with the standard StopWordFilter.

Another example:

title: Solr in a production environment
cat: tutorial

At index time, the title is reduced to: Solr production environment. A query like "using Solr in a production environment" will be reduced to: Solr production environment. This will work, as I have understood, because both the indexed terms and the query are the same.
However, if I have a content field that indexes the content of the text without my markerFilter, this won't work, because the parsed query strings are different? I don't understand the problem.

Example:

title: Solr in a production environment
cat: tutorial
content: here is some text about using Solr in production.

This fieldType consists of a lowerCaseFilter and a standard StopWordFilter that deletes all words like 'the, and, in' etc. Please note that environment does not occur in the content field. So a parsed query string would look like: using Solr in a production environment -> using Solr production environment (stopwords are removed). This won't match, because the word environment does not occur in the content field? And according to that, the whole doc does not match?

If you are confused by my examples and questions - I was trying to understand the explanations that were described here: http://lucene.472066.n3.nabble.com/DisMax-request-handler-doesn-t-work-with-stopwords-td478128.html#a478128

Thank you for your help.

- Mitch
--
View this message in context: http://lucene.472066.n3.nabble.com/Short-DismaxRequestHandler-Question-tp775913p783063.html
Sent from the Solr - User mailing list archive at Nabble.com.
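The behaviour Mitch is asking about can be illustrated with a toy model (plain Python, not Solr code; the stopword list and per-field analyzer flags are invented for the illustration): with dismax, each query word that survives at least one field's analyzer becomes a clause, and mm=100% requires every clause to match in some field.

```python
# Toy model of dismax mm=100% with per-field analyzers (NOT Solr source code).
STOPWORDS = {"in", "a", "the", "and"}

def analyze(text, strip_stopwords):
    tokens = text.lower().split()
    if strip_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens

def mm100_matches(query, doc, strips):
    # strips maps field name -> whether that field's analyzer removes stopwords.
    indexed = {f: analyze(text, strips[f]) for f, text in doc.items()}
    clause_results = []
    for word in query.lower().split():
        # A word only forms a clause if it survives at least one field's analyzer.
        surviving = [f for f in doc if analyze(word, strips[f])]
        if surviving:
            clause_results.append(any(word in indexed[f] for f in surviving))
    return bool(clause_results) and all(clause_results)

doc = {"title": "Solr in a production environment",
       "content": "text about using Solr in production"}
query = "using Solr in a production environment"

# Both fields strip stopwords: "in"/"a" form no clauses; every remaining term
# matches in at least one field, so the doc matches.
print(mm100_matches(query, doc, {"title": True, "content": True}))   # True

# content keeps stopwords: "a" survives that analyzer, becomes a required
# clause, and occurs in no field -> the whole doc fails mm=100%.
print(mm100_matches(query, doc, {"title": True, "content": False}))  # False
```

This matches the explanation in the linked thread: the mismatch appears only when fields analyze stopwords differently, not when the query and index analysis agree.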
Embedded Solr search query
Hello Solr community,

When a user searches on our web page, we need to run 3 related but different queries. For SEO reasons we cannot use Ajax, so at the moment we run 3 queries sequentially inside a PHP script. Although Solr is superfast, the extra network overhead can make the 3 queries 400 ms slower than they need to be.

Thus my question is: is there a way to send 1 query string to Solr with 2 or more embedded search queries, where Solr will split and execute the queries and return the results of the multiple searches in 1 go?

In other words, instead of:
- send searchQuery1, get result1
- send searchQuery2, get result2
...
you run:
- send searchQuery1+searchQuery2
- get result1+result2

Thanks and Regards
Eric
RE: Embedded Solr search query
Why not write a custom request handler which can parse, split, execute and combine results to your queries?

From: Eric Grobler [via Lucene] [mailto:ml-node+783150-1027691461-124...@n3.nabble.com]
Sent: Friday, May 07, 2010 1:01 AM
To: caman
Subject: Embedded Solr search query

--
View this message in context: http://lucene.472066.n3.nabble.com/Embedded-Solr-search-query-tp783150p783156.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Embedded Solr search query
Or send the queries in parallel from the PHP script (use cURL).

Svein

2010/5/7 caman aboxfortheotherst...@gmail.com:
> Why not write a custom request handler which can parse, split, execute and
> combine results to your queries?

--
View this message in context: http://lucene.472066.n3.nabble.com/Embedded-Solr-search-query-tp783150p783156.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Embedded Solr search query
Hi caman,

I was hoping someone had done it already :-) I am also new to Solr/Lucene, can you perhaps point me to a request handler example page?

Thanks and Regards
Eric

On Fri, May 7, 2010 at 9:05 AM, caman aboxfortheotherst...@gmail.com wrote:
> Why not write a custom request handler which can parse, split, execute and
> combine results to your queries?

--
View this message in context: http://lucene.472066.n3.nabble.com/Embedded-Solr-search-query-tp783150p783156.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Embedded Solr search query
Hi Svein,

Yes, we thought of sending parallel queries, but you still have the extra network overhead.

Regards
Eric

On Fri, May 7, 2010 at 9:11 AM, Svein Parnas sv...@trank.no wrote:
> Or send the queries in parallel from the PHP script (use cURL).

--
View this message in context: http://lucene.472066.n3.nabble.com/Embedded-Solr-search-query-tp783150p783156.html
Sent from the Solr - User mailing list archive at Nabble.com.
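For what it's worth, the saving from parallel dispatch is that total wall-clock time approaches the slowest single query instead of the sum of all three. A sketch of the idea (Python threads standing in for PHP's curl_multi; run_query is a stand-in that sleeps instead of calling Solr):

```python
import threading
import time

def run_query(name, results, delay=0.2):
    # Stand-in for one HTTP round trip to Solr: sleeps, then records a result.
    time.sleep(delay)
    results[name] = "result for " + name

def run_in_parallel(queries):
    results = {}
    threads = [threading.Thread(target=run_query, args=(q, results))
               for q in queries]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.time() - start

results, elapsed = run_in_parallel(["searchQuery1", "searchQuery2", "searchQuery3"])
# Sequential execution would take ~0.6 s here; parallel takes ~0.2 s.
print(sorted(results), round(elapsed, 1))
```

The per-query network overhead is still paid, as Eric notes, but it is paid concurrently rather than serially.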
Re: Short DismaxRequestHandler Question
Btw: This thread helps a lot to understand the difference between qf and pf :-) http://lucene.472066.n3.nabble.com/Dismax-query-phrases-td489994.html#a489995 -- View this message in context: http://lucene.472066.n3.nabble.com/Short-DismaxRequestHandler-Question-tp775913p783379.html Sent from the Solr - User mailing list archive at Nabble.com.
Long Lucene queries
Hi all,

In my web app, I have to fire a query that's too long due to the various boosts I have to give. The size changes according to the query, and many times I get a blank page as I probably cross Lucene's character limit. Is it possible to post it otherwise to Solr? Shall I be using POST instead of a GET here? Any other better suggestion?

Regards,
Pooja
Re: Long Lucene queries
On May 7, 2010, at 6:56 AM, Pooja Verlani wrote: In my web-app, i have to fire a query thats too long due to the various boosts I have to give. The size changes according to the query and many a times I get a blank page as I probably cross lucene's character limit. Is it possible to post it otherwise, to solr. Shall I be using POST instead of a GET here? Any other better suggestion? A few options: * Use POST (except you won't see the params in the log files) * Tomcat: http://wiki.apache.org/solr/SolrTomcat#Enabling_Longer_Query_Requests * Jetty: http://wiki.apache.org/solr/SolrJetty#Long_HTTP_GET_Query_URLs Or, possibly a lot of your query params can be put into solrconfig.xml, and you send over just what changed. You can do some tricks with param substitution to streamline this stuff in some cases. Some examples of what you're sending over would help us see where some improvements could be made. Erik
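As an illustration of Erik's last point, boost parameters that never change could live in a handler's defaults in solrconfig.xml, so that only q travels over the wire per request (a sketch; the handler name and field boosts below are invented, not taken from Pooja's setup):

```xml
<requestHandler name="/boostedsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- hypothetical boosts that would otherwise bloat every GET URL -->
    <str name="qf">title^5 description^2 body</str>
    <str name="bq">category:featured^3</str>
  </lst>
</requestHandler>
```

Individual requests can still override any of these defaults when needed.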
schema.xml question
Hello everyone, my question is: is it possible in schema.xml to set a group of fields to use as a default field, to query in OR or in AND?

Example:

<group name="group_name">
  <field name="a" type="..." />
  <field name="b" type="..." />
  <field name="c" type="..." />
</group>

<defaultSearchField>group_name</defaultSearchField>

Thanks in advance
RE: schema.xml question
You could write your own requestHandler in solrconfig.xml, it'll allow you to predefine parameters for your configured search components.

-Original message-
From: Antonello Mangone antonello.mang...@gmail.com
Sent: Fri 07-05-2010 15:17
To: solr-user@lucene.apache.org
Subject: schema.xml question
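Concretely, the grouped-default-field behaviour could be approximated with a dismax handler whose qf lists the group's fields; the mm parameter then steers between AND-like and OR-like matching across query terms. A sketch under those assumptions (handler name invented, field names taken from Antonello's example):

```xml
<requestHandler name="/groupsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">a b c</str>
    <!-- mm=100% behaves roughly like AND across terms; mm=1 roughly like OR -->
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```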
Re: schema.xml question
For the moment I don't know how to do it, but I'll follow your suggestion :) Thank you very much ...

ps. I'm just a novice

2010/5/7 Markus Jelsma markus.jel...@buyways.nl
> You could write your own requestHandler in solrconfig.xml, it'll allow you
> to predefine parameters for your configured search components.
RE: How to load Core Properties after Core creation?
What properties are you adding? Do you have persistent=true set?

Ankit

-Original Message-
From: Ying Huang [mailto:yhu...@capitaliq.com]
Sent: Thursday, May 06, 2010 6:33 PM
To: solr-user@lucene.apache.org
Subject: How to load Core Properties after Core creation?

Hi All, Does anyone know if there is any way to create a new Core with specified properties, or to alter and reload Core Properties for a Core, without restarting the service? I tried to do this in three steps: 1) Create a new core; 2) Edit solr.xml directly to add properties into the core; 3) Call the RELOAD handler to reload the new core and the specified properties. However, the reloading doesn't seem to work and the added properties don't apply to the newly created core. We're using a nightly build of Solr 1.4 with Lucene 2.9.1, and I'm using Core Properties in solr.xml for CoreAdmin. These properties can be used in solrconfig.xml (it's discussed here: http://wiki.apache.org/solr/CoreAdmin#property). Is there any workaround for it? Thanks, Ying
Help indexing PDF files
Hi, I am new to Solr. I would like to index some PDF files. How can I do this using the example schema from the 1.4.0 version? Regards, Leo
RE: Re: schema.xml question
A requestHandler works as a URL that can have predefined parameters. By default you will be querying the /select/ requestHandler. It, for instance, predefines the default number of rows to return (10) and returns all fields of a document (*).

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!--
    <int name="rows">10</int>
    <str name="fl">*</str>
    <str name="version">2.1</str>
    -->
  </lst>
</requestHandler>

But you can also define more complex requestHandlers. The default configuration adds the dismax requestHandler (/dismax/), but it's actually the same as the default requestHandler if you would define all those configured parameters in your URL. So by defining the parameters in solrconfig.xml, you won't need to pass them in your query. You can of course override predefined parameters, with the exception of parameters defined inside an invariants block. Check the documentation [1] on this subject, but I would suggest you study the shipped solrconfig.xml [2] configuration file; it offers a better explanation of the subject.

[1]: http://wiki.apache.org/solr/SolrConfigXml
[2]: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml

Cheers,

-Original message-
From: Antonello Mangone antonello.mang...@gmail.com
Sent: Fri 07-05-2010 15:26
To: solr-user@lucene.apache.org
Subject: Re: schema.xml question
RE: Re: schema.xml question
I forgot, there is actually a proper wiki page on this subject: http://wiki.apache.org/solr/SolrRequestHandler

-Original message-
From: Antonello Mangone antonello.mang...@gmail.com
Sent: Fri 07-05-2010 15:26
To: solr-user@lucene.apache.org
Subject: Re: schema.xml question
RE: Help indexing PDF files
Hi, The wiki page [1] on this subject will get you started. [1]: http://wiki.apache.org/solr/ExtractingRequestHandler Cheers -Original message- From: Leonardo Azize Martins laz...@gmail.com Sent: Fri 07-05-2010 15:37 To: solr-user@lucene.apache.org; Subject: Help indexing PDF files Hi, I am new in Solr. I would like to index some PDF files. How can I do using example schema from 1.4.0 version? Regards, Leo
Re: Help indexing PDF files
I am using this page, but in my downloaded version there is no site directory. Thanks 2010/5/7 Markus Jelsma markus.jel...@buyways.nl Hi, The wiki page [1] on this subject will get you started. [1]: http://wiki.apache.org/solr/ExtractingRequestHandler Cheers -Original message- From: Leonardo Azize Martins laz...@gmail.com Sent: Fri 07-05-2010 15:37 To: solr-user@lucene.apache.org; Subject: Help indexing PDF files Hi, I am new in Solr. I would like to index some PDF files. How can I do using example schema from 1.4.0 version? Regards, Leo
RE: Re: Help indexing PDF files
You don't need it, you can use any PDF file. -Original message- From: Leonardo Azize Martins laz...@gmail.com Sent: Fri 07-05-2010 15:45 To: solr-user@lucene.apache.org; Subject: Re: Help indexing PDF files I am using this page, but in my downloaded version there is no site directory. Thanks 2010/5/7 Markus Jelsma markus.jel...@buyways.nl Hi, The wiki page [1] on this subject will get you started. [1]: http://wiki.apache.org/solr/ExtractingRequestHandler Cheers -Original message- From: Leonardo Azize Martins laz...@gmail.com Sent: Fri 07-05-2010 15:37 To: solr-user@lucene.apache.org; Subject: Help indexing PDF files Hi, I am new in Solr. I would like to index some PDF files. How can I do using example schema from 1.4.0 version? Regards, Leo
Re: increase(change) relevancy
Hi Ramzesua,

take a look at the FunctionQuery example that influences relevancy via the popularity field from the example directory:
http://wiki.apache.org/solr/FunctionQuery#Using_FunctionQuery

Kind regards
- Mitch
--
View this message in context: http://lucene.472066.n3.nabble.com/increase-change-relevancy-tp783497p783750.html
Sent from the Solr - User mailing list archive at Nabble.com.
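For instance, with the example schema's popularity field, a dismax boost function can be pulled in via the bf parameter (an illustrative URL, not a tested one; host and field choices are assumptions):

```
http://localhost:8983/solr/select?defType=dismax&q=ipod&qf=text&bf=log(popularity)
```

Documents with higher popularity then score higher for the same textual match.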
Re: Example of using stream.file to post a binary file to solr
: Sorry. That is what I meant. But, I put it wrongly. I have not been : able to find examples of using solrj, for this. did you look at the link i included? : To POST a raw stream using SolrJ you need to use the : ContentStreamUpdateRequest... : : http://wiki.apache.org/solr/ExtractingRequestHandler#Sending_documents_to_Solr -Hoss
Re: Example of using stream.file to post a binary file to solr
Yes, I did. But, I don't find a solrj example there. The example in the doc uses curl. - Sent from iPhone On 07-May-2010, at 8:12 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Sorry. That is what I meant. But, I put it wrongly. I have not been : able to find examples of using solrj, for this. did you look at the link i included? : To POST a raw stream using SolrJ you need to use the : ContentStreamUpdateRequest... : : http://wiki.apache.org/solr/ExtractingRequestHandler#Sending_documents_to_Solr -Hoss
RE: How to load Core Properties after Core creation?
Thanks for your reply, Ankit. I'm adding properties like masterEnabled/slaveEnabled, pollInterval, autoCommitTime, etc., so that I can easily configure these properties for each Core and use them in solrconfig.xml. I'm also using persistent = true, and that's exactly the reason that whenever I unload a Core, the Core is wiped out of solr.xml. And when I re-create the Core after that, I need to re-add and reload the properties. Btw, I'm doing this because I want to switch Live and Backup Cores and manipulate Core configurations at the same time.

-Ying

-Original Message-
From: Ankit Bhatnagar [mailto:abhatna...@vantage.com]
Sent: Friday, May 07, 2010 9:27 AM
To: 'solr-user@lucene.apache.org'
Subject: RE: How to load Core Properties after Core creation?
Re: Example of using stream.file to post a binary file to solr
Sandhya, Chris's link (with anchor name) directly goes to solrj example On Fri, May 7, 2010 at 8:15 PM, Sandhya Agarwal sagar...@opentext.comwrote: Yes, I did. But, I don't find a solrj example there. The example in the doc uses curl. - Sent from iPhone On 07-May-2010, at 8:12 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Sorry. That is what I meant. But, I put it wrongly. I have not been : able to find examples of using solrj, for this. did you look at the link i included? : To POST a raw stream using SolrJ you need to use the : ContentStreamUpdateRequest... : : http://wiki.apache.org/solr/ExtractingRequestHandler#Sending_documents_to_Solr -Hoss
RE: Help indexing PDF files
Take a look at Tika library

From: Leonardo Azize Martins [via Lucene] [mailto:ml-node+783677-325080270-124...@n3.nabble.com]
Sent: Friday, May 07, 2010 6:37 AM
To: caman
Subject: Help indexing PDF files

--
View this message in context: http://lucene.472066.n3.nabble.com/Help-indexing-PDF-files-tp783677p784092.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Embedded Solr search query
I would just look at the SOLR source code and see how the standard search handler and the dismax search handler are implemented. Look under the package 'org.apache.solr.handler':
http://hudson.zones.apache.org/hudson/job/Solr-trunk/clover/org/apache/solr/handler/pkg-summary.html

From: Eric Grobler [via Lucene] [mailto:ml-node+783212-2036924225-124...@n3.nabble.com]
Sent: Friday, May 07, 2010 1:33 AM
To: caman
Subject: Re: Embedded Solr search query

--
View this message in context: http://lucene.472066.n3.nabble.com/Embedded-Solr-search-query-tp783150p784098.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Help indexing PDF files
I have Solr on machine A. On machine B I run the command below:

curl "http://10.33.19.201:8983/solr/update/extract?extractOnly=true" --data-binary @VPSX_V1_R10.pdf

and I get the response: java.lang.IllegalStateException: Form too large

What am I doing wrong? Is this the right or best way to send PDF files to be indexed?

Regards,
Leo

2010/5/7 caman aboxfortheotherst...@gmail.com
> Take a look at Tika library
CommonsHttpSolrServer vs EmbeddedSolrServer
Can someone please explain to me the use cases when one would use one over the other? All I got from the wiki was: (in reference to Embedded) If you need to use solr in an embedded application, this is the recommended approach. It allows you to work with the same interface whether or not you have access to HTTP. I had a use case (detailed here: http://lucene.472066.n3.nabble.com/Custom-DIH-variables-td777696.html#a777696) where I tried creating a new server via the current core, but I kept getting a SEVERE: java.util.concurrent.RejectedExecutionException... SEVERE: Too many close [count:-3] on org.apache.solr.core.SolrCore. Maybe my implementation was off? Is there any detailed documentation on SolrJ usage, more than the wiki? Any books? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/CommonsHttpSolrServer-vs-EmbeddedSolrServer-tp784201p784201.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sanity check on numeric types and which of them to use
3) The only reason to use a sint field is for backward compatibility and/or to use sortMissingFirst/SortMissingLast, correct? I'm using sint so I can facet and sort facets numerically. -- View this message in context: http://lucene.472066.n3.nabble.com/Sanity-check-on-numeric-types-and-which-of-them-to-use-tp473893p784295.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Help indexing PDF files
Hi, Sorry, I am a newbie. Using these two commands it works:

curl "http://10.33.19.201:8983/solr/update/extract?stream.file=C:\\temp\\VPSX_V1_R10.pdf&stream.contentType=application/pdf&literal.id=M4968\\C$\\temp\\VPSX_V1_R10.pdf&commit=true"

curl 'http://10.33.19.201:8983/solr/update/extract?literal.id=doc1&commit=true' -F te...@vpsx_v1_r10.pdf

Thanks for all the help.

Going ahead, what is the best choice to index a Windows share? Using stream.file or not? Indexing all files every time, or checking whether a file has changed and, if so, indexing it?

Regards,
Leo

2010/5/7 Leonardo Azize Martins laz...@gmail.com
> I had Solr in machine A. In machine B I run the command below [...] and I get
> the response: java.lang.IllegalStateException: Form too large
Re: Can I use per field analyzers and dynamic fields?
: : The source of my problems is the fact that I do not know in advance the : field names. Users are allowed to decide their own field names; they can, : at runtime, add new fields, and different Lucene documents might have : different field names.

I would suggest you abstract away the field names your users pick and the underlying field names you use when dealing with Solr -- so create the list of fieldTypes you want to support (with all of the individual analyzer configurations that are valid) and then create a dynamicField corresponding to each one. Then if your user tells you they want an author field associated with the type text_en, you can map that in your application to author_text_en at both indexing and query time.

This will also let you map the same logical field names (from your user's perspective) to different internal field names (from Solr's perspective) based on usage -- searching the author field might be against author_text_en, but sorting on author might use author_string.

(Some notes were drafted up a while back on making this kind of field name aliasing a feature of Solr, but nothing ever came of it... http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams )

-Hoss
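A minimal sketch of the mapping Hoss describes might look like this in schema.xml (the suffixes and type names are illustrative, not from the original post):

```xml
<!-- one dynamicField per supported fieldType; the suffix is a naming
     convention the application controls, not anything built into Solr -->
<dynamicField name="*_text_en" type="text_en" indexed="true" stored="true"/>
<dynamicField name="*_string"  type="string"  indexed="true" stored="true"/>
```

When a user defines an "author" field of type text_en, the application would read and write author_text_en; a parallel author_string field (populated via copyField or at index time) could back sorting.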
Re: Re: schema.xml question
Thank you very much for your suggestions, I'll study them immediately ... 2010/5/7 Markus Jelsma markus.jel...@buyways.nl

I forgot, there is actually a proper wiki page on this subject: http://wiki.apache.org/solr/SolrRequestHandler

-Original message- From: Antonello Mangone antonello.mang...@gmail.com Sent: Fri 07-05-2010 15:26 To: solr-user@lucene.apache.org; Subject: Re: schema.xml question

For the moment I don't know how to do it, but I'll follow your suggestion :) Thank you very much ... ps. I'm just a novice

2010/5/7 Markus Jelsma markus.jel...@buyways.nl

You could write your own requestHandler in solrconfig.xml; it'll allow you to predefine parameters for your configured search components.

-Original message- From: Antonello Mangone antonello.mang...@gmail.com Sent: Fri 07-05-2010 15:17 To: solr-user@lucene.apache.org; Subject: schema.xml question

Hello everyone, my question is: is it possible in schema.xml to set a group of fields to use as a default field to query in OR or in AND??? Example:

<group name="group_name">
  <field name="a" type="..." />
  <field name="b" type="..." />
  <field name="c" type="..." />
</group>

<defaultSearchField>group_name</defaultSearchField>

Thanks in advance
Re: schema.xml question
: <group name="group_name">
:   <field name="a" type="..." />
:   <field name="b" type="..." />
:   <field name="c" type="..." />
: </group>
:
: <defaultSearchField>group_name</defaultSearchField>

at first glance, it seems like what you want is to use copyField...

<field name="a" ... />
<field name="b" ... />
<field name="c" ... />
<field name="group_name" ... />
...
<copyField source="a" dest="group_name" />
<copyField source="b" dest="group_name" />
<copyField source="c" dest="group_name" />

<defaultSearchField>group_name</defaultSearchField>

-Hoss
Re: schema.xml question
It seems like a copyField, but it is a group that I want ... and your version is not a group; I want the possibility to search in a group of fields using AND or OR.

2010/5/7 Chris Hostetter hossman_luc...@fucit.org

: <group name="group_name">
:   <field name="a" type="..." />
:   <field name="b" type="..." />
:   <field name="c" type="..." />
: </group>
:
: <defaultSearchField>group_name</defaultSearchField>

at first glance, it seems like what you want is to use copyField...

<field name="a" ... />
<field name="b" ... />
<field name="c" ... />
<field name="group_name" ... />
...
<copyField source="a" dest="group_name" />
<copyField source="b" dest="group_name" />
<copyField source="c" dest="group_name" />

<defaultSearchField>group_name</defaultSearchField>

-Hoss
Re: Short DismaxRequestHandler Question
: The StopWordFilter (my implementation) removes specific types of words *and* : all markers from all words. : : This leads to a deletion of some parts of sentences.

Ah, yes I think you're running into the same confusion people have with dismax and stopwords -- there was a blog post about this recently that explained it much better than I've ever been able to... http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/

As long as each of those Solr fields is configured for stopwords (and the same stopwords), everything Just Works the way you'd expect. But if one of those fields does not have stopwords configured, then (depending on your mm settings), you can easily end up getting zero hits for any (non-phrase) query clause that is a stopword. This kind of makes sense when you think about it -- since at least one field didn't have stopwords, there was a clause included for that stopword you entered.

(the blog post makes an incorrect assumption after that -- but the paragraph above is dead on)

: Let me be sure that I have understood your part about how the : DisMaxRequestHandler works. : If I got 4 fields: : name, colour, category, manufacturer : : And an example doc like this: : title: iPhone : colour: black : category: smartphone : manufacturer: apple : : And I got a dismax query like this: : q=apple iPhone qf=title^5 manufacturer mm=100% : Then the whole thing will match (assuming that iPhone and/or apple were not : stopwords)?

correct

: Another example: : title: Solr in a production environment : cat: tutorial : : At index time, title is reduced to: Solr production environment. : A query like "using Solr in a production environment" : will be reduced to Solr production environment.

...not necessarily. If you only have one field in your qf, and that field defines "using", "in", and "a" as stopwords, then that may be what your query turns into.
: However, if I got a content field that indexes the content of the text : without my markerFilter, this won't work, because the parsed query-strings : are different???

I don't understand the problem (FWIW: "parsed query-strings" is an ambiguous statement -- it could be referring to the Query object you get when parsing query strings, or it could refer to the toString value of the Query object you get after parsing).

The query string is not parsed differently for each of your qf fields; it is parsed exactly once, and each chunk of the string (ie: a word or quoted phrase) is passed to the analyzer for each field -- if any one of those fields produces a valid stream of tokens for that input (ie: it's not a stopword) then that constitutes one clause -- even if only one field says it's a valid clause, it's still a valid clause, and it's factored in to the min-should-match (mm) amount.

Mike Klass explained this really well in a previous thread about stop words and dismax, where he showed the detailed query structure: http://old.nabble.com/Re%3A-DisMax-request-handler-doesn%27t-work-with-stopwords--p11016770.html

...hopefully that structure will help make the behavior you are seeing clear. I suggest you add debugQuery=true to your queries that are failing, and look closely at the parsedQuery toString -- pay attention to the structure, note how many clauses exist for the main boolean query -- note the clauses of that query, and where you have clauses consisting exclusively of stopwords (in fields where stopwords are not removed). If it's still not making sense, please post that exact debug output.

-Hoss
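To make the structure concrete, here is a hand-written sketch (illustrative, not copied from real debug output) of roughly what the dismax parsedquery looks like for q=apple iPhone with qf=title^5 manufacturer and mm=100%, assuming lowercasing analyzers on both fields:

```text
+( ( DisjunctionMaxQuery((title:apple^5.0  | manufacturer:apple))
     DisjunctionMaxQuery((title:iphone^5.0 | manufacturer:iphone))
   )~2 ) ()
```

Each word in q becomes one top-level clause, which is a DisjunctionMaxQuery across all qf fields; mm=100% shows up as ~2 (both clauses must match). If one qf field removed a word as a stopword but another did not, the clause still exists (with only the non-stopword field inside it) and still counts toward mm.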
Re: Highlighting Performance On Large Documents
Do you have these options turned on when you index the text field: termVectors/termPositions/termOffsets? Highlighting needs the information created by these analysis options. If they are not turned on, Solr has to load the document text and run the analyzer again with these options on, use that data to create the highlighting, then throw away the reanalyzed data. Without these options, you are basically re-indexing the document every time you highlight it. http://www.lucidimagination.com/search/out?u=http%3A%2F%2Fwiki.apache.org%2Fsolr%2FFieldOptionsByUseCase

On Wed, May 5, 2010 at 5:01 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: (10/05/05 22:08), Serdar Sahin wrote:

Hi, There are currently similar topics active on the mailing list, but I did not want to steal the topic. I have indexed 100,000 documents; they are Microsoft Office/PDF etc. documents that I convert to TXT files before indexing. Files are between 1 and 500 pages. When I search something, filter it to retrieve documents that have more than 100 pages, and activate highlighting, it takes 0.8-3 seconds, depending on the query (10 results per page). If I retrieve documents that have 1-5 pages, it drops to 0.1 seconds. If I disable highlighting, it drops to 0.1-0.2 seconds, even on the large documents, which is more than enough. This problem mostly happens when there are no caches, on the first query. I use this configuration for highlighting:

$query->addHighlightField('description')->addHighlightField('plainText');
$query->setHighlightSimplePre('<strong>');
$query->setHighlightSimplePost('</strong>');
$query->setHighlightHighlightMultiTerm(TRUE);
$query->setHighlightMaxAnalyzedChars(1);
$query->setHighlightSnippets(2);

Do you have any suggestions to improve response time while highlighting is active? I have read a couple of articles you have previously provided, but they did not help.
And for the second question, I retrieve these fields:

$query->addField('title')->addField('cat')->addField('thumbs_up')
      ->addField('thumbs_down')->addField('lang')->addField('id')
      ->addField('username')->addField('view_count')->addField('pages')
      ->addField('no_img')->addField('date');

If I can't solve the highlighting problem on large documents, I can simply disable it and retrieve the first x characters from the plainText (full text) field, but is it possible to retrieve the first x characters without using the highlighting feature? When I use this:

$query->setHighlight(TRUE);
$query->setHighlightAlternateField('plainText');
$query->setHighlightMaxAnalyzedChars(0);
$query->setHighlightMaxAlternateFieldLength(256);

It still takes 2 seconds if I retrieve 10 rows that have 200-300 pages. The highlighting still works, so it might be the source of the problem; I want to completely disable it and retrieve only the first 256 characters of the plainText field. Is it possible? It may remove some overhead and give better performance. I personally prefer the highlighting solution, but I would also like to hear the solution for this problem. For the same query, if I disable highlighting and don't retrieve (but still search) the plainText field, it drops to 0.0094 seconds. So I think if I can get the first 256 characters without using highlighting, I will get better performance. Any suggestions regarding these two problems will be highly appreciated. Thanks, Serdar Sahin

Hi Serdar, There are a few things I can think of that you can try. 1. Provide another field for highlighting and use copyField to copy plainText to the highlighting field. When using copyField, specify the maxChars attribute to limit the length of the copy of plainText. This should work on Solr 1.4. 2. If you can use the branch_3x version of Solr, try FastVectorHighlighter. Koji -- http://www.rondhuit.com/en/ -- Lance Norskog goks...@gmail.com
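In schema.xml form, Lance's term-vector options and Koji's copyField/maxChars suggestion would look roughly like this (field and type names here are illustrative, not from the original posts):

```xml
<!-- termVectors/termPositions/termOffsets let highlighting reuse index data
     instead of re-analyzing the stored text on every request -->
<field name="plainText" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

<!-- a short stored-only copy: maxChars keeps just the first 256 characters,
     so a teaser can be returned without invoking the highlighter at all -->
<field name="teaser" type="string" indexed="false" stored="true"/>
<copyField source="plainText" dest="teaser" maxChars="256"/>
```

With something like this in place, the client could request fl=title,teaser,... and skip hl=true entirely for the teaser case.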
Re: caching repeated OR'd terms
I would suggest benchmarking this before doing any more complex design. A field with only 10k unique integer or string values will search very very quickly. On Thu, May 6, 2010 at 7:54 AM, Nagelberg, Kallin knagelb...@globeandmail.com wrote: Hey everyone, I'm having some difficulty figuring out the best way to optimize for a certain query situation. My documents have a many-valued field that stores lists of IDs. All in all there are probably about 10,000 distinct IDs throughout my index. I need to be able to query and find all documents that contain a given set of IDs. Ie, I want to find all documents that contain IDs 3, 202, 3030 or 505. Currently I'm implementing this like so: q= (myfield:3) OR (myfield:202) OR (myfield:3030) OR (myfield:505). It's possible that there could be upwards of hundreds of terms, although 90% of the time it will be under 10. Ideally I would like to do this with a filter query, but I have read that it is impossible to cache OR'd terms in a fq, though this feature may come soon. The problem is that the combinations of OR'd terms will almost always be unique, so the query cache will have a very low hit rate. It would be great if the individual terms could be cached individually, but I'm not sure how to accomplish that. Any suggestions would be welcome! -Kallin Nagelberg -- Lance Norskog goks...@gmail.com
Re: Custom DIH variables
Using a core via the Embedded front end and the HTTP front end at the same time seems dangerous. SOLR-1499 does an HTTP call for the same info. https://issues.apache.org/jira/browse/SOLR-1499

On Thu, May 6, 2010 at 8:18 PM, Blargy zman...@hotmail.com wrote:

So I came up with the following class:

public class LatestTimestampEvaluator extends Evaluator {

  private static final Logger logger =
      Logger.getLogger(LatestTimestampEvaluator.class.getName());

  @Override
  public String evaluate(String expression, Context context) {
    List params = EvaluatorBag.parseParams(expression, context.getVariableResolver());
    String field = params.get(0).toString();

    SolrCore core = context.getSolrCore();
    CoreContainer container = new CoreContainer();
    container.register(core, false);
    EmbeddedSolrServer server = new EmbeddedSolrServer(container, core.getName());

    SolrQuery query = new SolrQuery("*:*");
    query.addSortField(field, SolrQuery.ORDER.desc);
    query.setRows(1);

    try {
      QueryResponse response = server.query(query);
      SolrDocument document = response.getResults().get(0);
      Date date = (Date) document.getFirstValue(field);
      String timestamp = new Timestamp(date.getTime()).toString();
      logger.info(timestamp);
      return timestamp;
    } catch (Exception exception) {
      logger.severe(exception.getMessage());
      logger.severe(DocumentUtils.stackTraceToString(exception));
      return null;
    } finally {
      core.close();
      container.shutdown();
    }
  }
}

and I am calling it within my dataconfig file like so...

<dataConfig>
  <function name="latest_timestamp"
            class="com.mycompany.solr.handler.dataimport.LatestTimestampEvaluator"/>
  ...
  <entity name="item" ...
          deltaQuery="select id from items where updated_on > '${dataimporter.functions.latest_timestamp('updated_on')}'">
  </entity>
</dataConfig>

I was hoping someone can 1) Comment on the above class. How does it suck? This was my first time working with SolrJ.
2) It seems to work fine when there is only one entity using that function, but when there are more entities using that function (which is my use case) I get a SEVERE: java.util.concurrent.RejectedExecutionException. Can someone explain why this is happening and how I can fix it? I added the full stack trace to a separate thread here: http://lucene.472066.n3.nabble.com/SEVERE-java-util-concurrent-RejectedExecutionException-tp782768p782768.html Thanks for your help! -- Lance Norskog goks...@gmail.com
Re: Custom DIH variables
Thanks for the tip Lance. Just for reference, why is it dangerous to use the HTTP method? I realized that the embedded method is probably not the way to go (obviously since I was getting that SEVERE: java.util.concurrent.RejectedExecutionException)