Re: 'Advertising' a site
OK, no problem. It's about 4-8 months out. I'm just excited by the idea of finally going public.

I'm not a professional DB admin, web designer, search engine analyst, Chief Technical Officer, or backend programmer by education, only self-study and about 1/2 of a Bachelor's AND 1/2 of a Master's in CS. But I've studied and taken on about 1/2 of those roles. It's all for something I WANT out there, and no one seems to have built it. So I will . . . and my team :-) I'd like to get feedback when it's out there and learn from what people point out in their reaction to our implementation. I've already learned a lot here, and so has the main SE guy in our group.

I/we owe a LOOOT to: the PHP community, the Symfony community, the Doctrine community, Dezign for Databases, the Apache community, the Eclipse community, the Postgres community, Ubuntu and its community, a couple of different library writers... let's see, who's left? Oh yeah! You guys here at Solr/Lucene. Thanks, all of you :-)

Dennis Gearon

Signature Warning: It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. From 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life, otherwise we all die.

--- On Mon, 10/18/10, Chris Hostetter hossman_luc...@fucit.org wrote:
Subject: Re: 'Advertising' a site
Date: Monday, October 18, 2010, 10:23 PM

> : There is a PoweredBy page on the Wiki that's good for that.
>
> Even better is a post to the list telling folks about your use case, index size, hardware, etc. A lot of new users find that information really helpful for comparison.
>
> -Hoss
Re: count(*) equivalent in Solr/Lucene
I/my team will have to look at that and decode it, LOL! I get some of it. The database version returns 1 row, with the answer. What does this return, and how fast is it on BIG indexes?

PS, that should have been: . . . date_column2 :end_date;

Dennis Gearon

--- On Mon, 10/18/10, Chris Hostetter hossman_luc...@fucit.org wrote:
Subject: Re: count(*) equivalent in Solr/Lucene
Date: Monday, October 18, 2010, 10:26 PM

> : SELECT
> :   COUNT(*)
> : WHERE
> :   date_column1 :start_date AND
> :   date_column2 :end_date;
>
> q=*:*&fq=column1:[start TO *]&fq=column2:[end TO *]&rows=0
>
> ...every result includes a total count.
>
> -Hoss
RE: DIH delta-import question
According to the DIH wiki, delta-import is only supported by SQL (http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command-1).

Ephraim Ofir

-----Original Message-----
From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
Sent: Friday, October 15, 2010 8:20 AM
To: solr-user@lucene.apache.org
Subject: DIH delta-import question

Dear list,

I'm trying to delta-import with datasource FileDataSource and processor FileListEntityProcessor. I want to load only files which are newer than the last_index_time in dataimport.properties. It looks like newerThan=${dataimport.last_index_time} has no effect. Can it be that newerThan is configured under FileListEntityProcessor but applied to the next following entity processor, and not to FileListEntityProcessor itself? In my case that is the XPathEntityProcessor, which doesn't support newerThan. Version is Solr 4.0 from trunk.

Regards, Bernd
RE: DIH - configure password in 1 place and store it in encrypted form?
You could include a common file with the JdbcDataSource (http://wiki.apache.org/solr/SolrConfigXml#XInclude), or add the password as a property in solr.xml in the container scope (http://wiki.apache.org/solr/CoreAdmin#Configuration) so it will be available to all cores. Personally, I use a single configuration for all cores with soft-linked config files, so I only have to change the config in one place.

Ephraim Ofir

-----Original Message-----
From: Gora Mohanty [mailto:g...@mimirtech.com]
Sent: Sunday, October 17, 2010 7:05 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH - configure password in 1 place and store it in encrypted form?

On Sun, Oct 17, 2010 at 7:02 PM, Arunkumar Ayyavu arunkumar.ayy...@gmail.com wrote:
> Hi! I have multiple cores reading from the same database, and I've provided the user credentials in all data-config.xml files. Is there a way to tell JdbcDataSource in data-config.xml to read the username and password from a file? This would help me avoid changing the username/password in multiple data-config.xml files. And is it possible to store the password encrypted and let DIH call a decrypter to read it?
[...]
As far as I am aware, it is not possible to do either of the two options above. However, one could extend the JdbcDataSource class to add such functionality.

Regards, Gora
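A sketch of the container-scope property approach Ephraim mentions. The property name and the assumption that property substitution reaches the DIH data-config are illustrative; check against your Solr version:

```xml
<!-- solr.xml: define the password once, visible to all cores (sketch) -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <property name="db.password" value="s3cret"/>
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>

<!-- data-config.xml in each core: reference the shared property -->
<dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
            url="jdbc:postgresql://localhost/mydb"
            user="dbuser" password="${db.password}"/>
```

This centralizes the credential but does not encrypt it; encryption would still require extending JdbcDataSource, as Gora notes.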
Re: Removing Common Web Page Header and Footer from All Content Fetched by Nutch
Unfortunately, Nutch still uses Tika 0.7 in 1.2 and trunk. Nutch needs to be upgraded to Tika 0.8 (when it's released, or just the current trunk). Also, the Boilerpipe API needs to be exposed through Nutch configuration: which extractor is used, which parameters need to be set, etc. Upgrading to Tika's trunk might be relatively easy, but exposing Boilerpipe surely isn't.

On Tuesday, October 19, 2010 06:47:43 am Otis Gospodnetic wrote:
> Hi Israel,
> You can use this: http://search-lucene.com/?q=boilerpipe&fc_project=Tika
> Not sure if it's built into Nutch, though...
> Otis
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
From: Israel Ekpo israele...@gmail.com
To: solr-user@lucene.apache.org; u...@nutch.apache.org
Sent: Mon, October 18, 2010 9:01:50 PM
Subject: Removing Common Web Page Header and Footer from All Content Fetched by Nutch

Hi All,

I am indexing a web application with approximately 9500 distinct URLs and their contents using Nutch and Solr. I use Nutch to fetch the URLs and links and crawl the entire web application to extract the content of all pages. Then I run the solrindex command to send the content to Solr.

The problem I have now is that the first 1000 or so characters of some pages and the last 400 characters of the pages are showing up in the search results. These are the contents of the common header and footer used in the site, respectively. The only workaround I have now is to index everything and then go through each document one at a time, removing the first 1000 characters if the Levenshtein distance between the first 1000 characters of the page and the common header is less than a certain value. The same applies to the footer content common to all pages.
Is there a way to ignore certain "stop phrases", so to speak, in the Nutch configuration, based on Levenshtein distance or Jaro-Winkler distance, so that the parts of the fetched data that match these stop phrases will not be parsed? Any useful pointers would be highly appreciated.

Thanks in advance.

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350
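The post-processing workaround Israel describes can be sketched as follows. The distance implementation is a plain textbook Levenshtein; the threshold and the sample header are illustrative assumptions, not values from the thread:

```python
# Sketch of the workaround described above: strip a known common header from
# fetched page content when the page prefix is within a small edit distance
# of it. Threshold and sample strings are illustrative.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def strip_common_header(text: str, header: str, max_distance: int) -> str:
    """Drop the leading len(header) characters if they resemble the header."""
    prefix = text[:len(header)]
    if levenshtein(prefix, header) <= max_distance:
        return text[len(header):]
    return text

page = "ACME Corp | Home | About | Contact -- actual article body here"
header = "ACME Corp | Home | About | Contact -- "
print(strip_common_header(page, header, max_distance=5))
# prints: actual article body here
```

The same function applied to the reversed tail of the document handles the footer case.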
Re: Removing Common Web Page Header and Footer from All Content Fetched by Nutch
Thanks Otis and Markus for your input. I will check it out today.

On Tue, Oct 19, 2010 at 4:45 AM, Markus Jelsma markus.jel...@openindex.io wrote:
> Unfortunately, Nutch still uses Tika 0.7 in 1.2 and trunk. Nutch needs to be upgraded to Tika 0.8 (when it's released or just the current trunk).

--
°O° "Good Enough" is not good enough. "To give anything less than your best is to sacrifice the gift." "Quality First. Measure Twice. Cut Once." http://www.israelekpo.com/
Uppercase and lowercase queries
I want to query on city name. This works when I query, for example: Boston. But when I query boston it didn't show any results. In the database "Boston" is stored. So I thought: I should change the filter on this field to make everything lowercase.

The field definition for city is:

<field name="city" type="string" indexed="true" stored="true"/>

So I changed its fieldType "string" from:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

TO:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

But it still doesn't show any results when I query boston... why?

--
View this message in context: http://lucene.472066.n3.nabble.com/Uppercase-and-lowercase-queries-tp1731349p1731349.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Uppercase and lowercase queries
Use a text field.

On Tue, Oct 19, 2010 at 3:19 AM, PeterKerk vettepa...@hotmail.com wrote:
> But it still doesn't show any results when I query boston... why?
Re: Uppercase and lowercase queries
Because you need to reindex.

On Tuesday, October 19, 2010 12:19:53 pm PeterKerk wrote:
> But it still doesn't show any results when I query boston... why?
Re: Uppercase and lowercase queries
Yes, and reindex. And I suggest not using `string` as the name of the fieldType, as it will confuse people later.

<field name="city" type="text_lowercase" indexed="true" stored="true"/>

<fieldType name="text_lowercase" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

On Tuesday, October 19, 2010 12:25:53 pm Pradeep Singh wrote:
> Use text field.
Re: Uppercase and lowercase queries
I now used a text field... and it works, so thanks! :)
Re: Commits on service after shutdown
You never get full control of commits, as Solr will auto-commit anyway whenever the (configurable) input buffer is full. With the current architecture you cannot really trust adds or commits to be 100% successful, because the server may have been restarted between an add and a commit() without you noticing, etc. So your feeder app should expect failures to happen, including added docs never being committed, and be prepared to re-submit any documents needed after a failure. This can be achieved by querying the index at regular intervals to see if you are in sync. Or you could help implement SOLR-1924 to get a reliable callback mechanism :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 18. okt. 2010, at 21.50, Ezequiel Calderara wrote:

> I understand, but I want to have control of what is committed or not. In our scenario, we want to add documents to the index and maybe trigger the commit an hour later. If, in the middle, we have a server shutdown, or any process sending a shutdown signal to the process, I don't want those documents being committed. Should I file a bug issue or an enhancement issue? Thanks
>
> On Mon, Oct 18, 2010 at 3:54 PM, Israel Ekpo israele...@gmail.com wrote:
>> The documents should be implicitly committed when the Lucene index is closed. When you perform a graceful shutdown, the Lucene index gets closed and the documents get committed implicitly. When the shutdown is abrupt, as in a KILL -9, this does not happen and the updates are lost. You can use the autocommit parameter when sending your updates so that the changes are saved right away, though this could slow down indexing speed considerably. But I do not believe there are parameters to keep those uncommitted documents alive after a kill.
>>
>> On Mon, Oct 18, 2010 at 2:46 PM, Ezequiel Calderara ezech...@gmail.com wrote:
>>> Hi, I'm new to the mailing list. I'm implementing Solr at my current job, and I'm having some problems. I was testing the consistency of the commits. I found, for example, that if we add X documents to the index (without committing) and then restart the service, the documents are committed: they show up in the results. This looks like an error to me. But when we add X documents to the index (without committing) and then kill the process and start it again, the documents don't appear. This is the behaviour I want. Is there any param to avoid the auto-committing of documents after a shutdown? Is there any param to keep those uncommitted documents alive after a kill? Thanks!
>>> --
>>> Ezequiel.
>>> http://www.ironicnet.com
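The auto-commit behaviour Jan refers to is configured in solrconfig.xml. A sketch, with illustrative thresholds:

```xml
<!-- solrconfig.xml (sketch): auto-commit thresholds. Removing or commenting
     out the autoCommit element keeps commits fully manual; with it present,
     Solr commits on its own once either threshold is reached. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after this many pending documents -->
    <maxTime>60000</maxTime>  <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```

Even with autoCommit disabled, this does not change the close-on-graceful-shutdown behaviour Israel describes.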
Re: count(*) equivalent in Solr/Lucene
On Oct 19, 2010, at 2:09 AM, Dennis Gearon wrote:
> What does this return and how fast is it on BIG indexes?

rows=0 returns 0 rows, but the total count will be returned. You can use rows=0 with any query to get the total number of matches.

--
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem docs using Solr/Lucene: http://www.lucidimagination.com/search
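Concretely, the count comes back in the numFound attribute of the result element in the standard XML response. A sketch with illustrative field names, dates, and count:

```text
http://localhost:8983/solr/select?q=*:*&fq=date_column1:[2010-01-01T00:00:00Z TO *]&fq=date_column2:[2010-12-31T00:00:00Z TO *]&rows=0

<response>
  ...
  <result name="response" numFound="8231" start="0"/>
</response>
```

No documents are fetched or scored for return, so this stays fast even on large indexes; Solr only counts the matching docs.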
boosting injection
Hi all,

I have a client that is sending this query:

q=title:history AND author:joyce

Is it possible to transform this query at runtime into:

q=title:history^10 AND author:joyce^5

?

Best regards, Andrea
Re: boosting injection
Andrea,

Using the Solr dismax query handler, you could set up queries like this to boost on fields of your choice. Basically, the q parameter would be the query terms (without the field definitions), and a qf (Query Fields) parameter would define your boost(s): http://wiki.apache.org/solr/DisMaxQParserPlugin.

A non-Solr alternative would be to parse the query in whatever application is sending the queries to the Solr instance and make the necessary transformations there.

Regards, Ken

"It looked like something resembling white marble, which was probably what it was: something resembling white marble." -- Douglas Adams, The Hitchhiker's Guide to the Galaxy

On Tue, Oct 19, 2010 at 8:48 AM, Andrea Gazzarini andrea.gazzar...@atcult.it wrote:
> Hi all, I have a client that is sending this query: q=title:history AND author:joyce. Is it possible to transform this query at runtime into: q=title:history^10 AND author:joyce^5?
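A sketch of the dismax request Ken describes, using the boost values from this thread (host and handler path are illustrative). The client sends only the bare terms, and qf applies the per-field boosts:

```text
http://localhost:8983/solr/select?defType=dismax&q=history+joyce&qf=title^10+author^5
```

This only works if the client can be changed to send plain terms instead of a fielded query, which is exactly the constraint discussed below.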
Re: snapshot-4.0 and maven
Hey, thanks Tommy. To be more specific, I'm trying to use SolrJ in a Clojure project. When I try to use SolrJ using what you showed me, I get errors saying Lucene classes can't be found, etc. Is there a way to build everything SolrJ (snapshot-4.0) needs into one jar?

Matt

On Mon, Oct 18, 2010 at 11:01 PM, Tommy Chheng tommy.chh...@gmail.com wrote:
> Once you've built the Solr 4.0 jar, you can use mvn's install command like this:
>
> mvn install:install-file -DgroupId=org.apache -DartifactId=solr -Dpackaging=jar -Dversion=4.0-SNAPSHOT -Dfile=solr-4.0-SNAPSHOT.jar -DgeneratePom=true
>
> @tommychheng

On 10/18/10 7:28 PM, Matt Mitchell wrote:
> I'd like to get solr snapshot-4.0 pushed into my local maven repo. Is this possible to do? If so, could someone give me a tip or two on getting started?
> Thanks, Matt
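Once the jar is installed as Tommy shows, it can be referenced from a pom.xml with the same coordinates used in the install command. A sketch; note this pulls in only the Solr jar itself, so the Lucene dependencies Matt is missing would still need to be declared or installed separately:

```xml
<dependency>
  <groupId>org.apache</groupId>
  <artifactId>solr</artifactId>
  <version>4.0-SNAPSHOT</version>
</dependency>
```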
Re: boosting injection
Hi Ken,

thanks for your response... unfortunately it doesn't solve my problem. I cannot change the client behaviour, so the query must be a query and not only the query terms. In this scenario it would be great, for example, if I could declare the boost in the schema field definition... but I think that's not possible, isn't it?

Regards, Andrea

From: Ken Stanley [mailto:doh...@gmail.com]
Sent: Tue, 19 Oct 2010 15:05:31 +0200
Subject: Re: boosting injection

> Using the Solr dismax query handler, you could set up queries like this to boost on fields of your choice.
Re: boosting injection
Index-time boosting, maybe?
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22

On Tuesday, October 19, 2010 04:23:46 pm Andrea Gazzarini wrote:
> It would be great, for example, if I could declare the boost in the schema field definition... but I think that's not possible, isn't it?
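A sketch of the index-time field boost described on that wiki page, reusing the boost values from this thread; the field contents are placeholders:

```xml
<add>
  <doc>
    <field name="title" boost="10.0">history</field>
    <field name="author" boost="5.0">joyce</field>
  </doc>
</add>
```

Since the boost is baked into the index at add time, changing it later means reindexing the affected documents.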
Re: boosting injection
Y-E-A-H! I think that's it! Markus, what are the disadvantages of this boosting strategy?

Thanks a lot, Andrea

On 19/10/2010 16:25, Markus Jelsma wrote:
> Index-time boosting, maybe?
> http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22
Re: boosting injection
Andrea,

Another approach, aside from Markus's suggestion, would be to create your own handler that could intercept the query and perform whatever transformations you need at query time. However, that would require Java knowledge (about which I make no assumption).

Regards, Ken

On Tue, Oct 19, 2010 at 10:23 AM, Andrea Gazzarini andrea.gazzar...@atcult.it wrote:
> I cannot change the client behaviour, so the query must be a query and not only the query terms.
Documents and cores
Hi all,

I have a newbie design question about documents, especially with SQL databases. I am trying to set up Solr to go against a database that, for example, has items and people. The way I see it (and I don't know if this is right or not, thus the question) is that both are separate documents: an item may contain a list of parts, which the user may want to search, and, as part of the item, view the list of people who have ordered the item. Then there are the actual people, whom the user might want to search to find a name and, consequently, the items they ordered.

To me they are both top-level things, with some overlap of fields. If I'm searching for people, I'm likely not going to be interested in the parts of an item, while if I'm searching for items I may want to search for "42532", which is, in this instance, a SKU, and not get hits on the zip code section of the people.

Does it make sense, then, to separate these two out as separate documents? I believe so, because the documentation I've read suggests that a document should be analogous to a row in a table (in this case, very de-normalized). What is tripping me up is that, as far as I can tell, you can have only one document type per index, and thus one document type per core. So in this example, I have two cores, items and people. Is this correct? Should I embrace the idea of having many cores, or am I supposed to have a single, unified index with all documents (which doesn't seem like something Solr supports)?

The ultimate question comes down to the search interface. I don't necessarily want the user to explicitly state which document type they want to search; I'd like them to simply type "42532", get documents from both cores, and then possibly filter results after the fact, not before. As I've only used the admin site so far (which is core-specific), does the client API allow for unified searching across all cores?
Assuming it does, I'd think my idea of multiple-documents is okay, but I'd love to hear from people who actually know what they're doing. :) Thanks, Ron DISCLAIMER: This electronic message, including any attachments, files or documents, is intended only for the addressee and may contain CONFIDENTIAL, PROPRIETARY or LEGALLY PRIVILEGED information. If you are not the intended recipient, you are hereby notified that any use, disclosure, copying or distribution of this message or any of the information included in or with it is unauthorized and strictly prohibited. If you have received this message in error, please notify the sender immediately by reply e-mail and permanently delete and destroy this message and its attachments, along with any copies thereof. This message does not create any contractual obligation on behalf of the sender or Law Bulletin Publishing Company. Thank you.
Re: boosting injection
Hi Ken, yes, I'm a Java developer, so I think I should be able to do that, but I was wondering if there's a way to solve my issue without coding. The problem is that I need to adjust this query on short notice and, in addition, I cannot justify (at this stage of the project) additional software artifacts. Anyway, thanks for your support. Best Regards, Andrea

On 19/10/2010 16:33, Ken Stanley wrote: Andrea, Another approach, aside from Markus' suggestion, would be to create your own handler that could intercept the query and perform whatever transformations you need at query time. [...]
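Since Andrea wants to avoid writing code, it is worth noting that the dismax approach suggested earlier in the thread lives entirely in solrconfig.xml. A minimal sketch (the handler name and boost values are illustrative; the title/author fields come from the thread, and it assumes the client could be pointed at this handler with bare query terms):

```xml
<!-- solrconfig.xml: a hypothetical dismax handler that applies the
     title^10 / author^5 boosts server-side, so the client does not
     need to embed boosts in the query string. -->
<requestHandler name="/boosted" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- qf lists the fields to search, each with its boost -->
    <str name="qf">title^10 author^5</str>
  </lst>
</requestHandler>
```

A request like q=history joyce against this handler would then weight title matches ten times and author matches five times, with no field names or boosts in the client's query.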
Timeouts in distributed search using Solr + Zookeeper
Hi, we are looking at Solr+Zookeeper as the architecture for enabling federated searches among geographically distributed data centers. I wonder if anybody can comment on the status of timeouts with respect to distributed searches in a Solr-Zookeeper environment. Specifically, following example C) in the SolrCloud wiki: http://wiki.apache.org/solr/SolrCloud it seems the system is resilient to any one Solr server (out of 4) being unavailable, but if both Solr servers serving the same shard go down, then a distributed query results in an error instead of returning partial results. Is there any special configuration that needs to be set for the Solr and/or Zookeeper servers, or any request parameter that needs to be added, to make the distributed query just return results from the only available shard? Or maybe this feature is not yet operational? thanks a lot, Luca
Negative filter using the appends element
I'm using Solr 1.4 with the standard request handler and attempting to apply a negative fq for all requests via the appends element, but it's not being applied. Is this an intended limitation? I looked in JIRA for an existing issue but nothing jumped out.

Works fine:

<lst name="appends">
  <str name="fq">tag:test</str>
</lst>

Does not work:

<lst name="appends">
  <str name="fq">-tag:test</str>
</lst>
query results file for trec_eval
Hello! I am a student and I am trying to run an evaluation on TREC format documents. I have the judgments. I would like to have the output of my queries for use with the trec_eval software. Can someone please point me to how to make Solr output results in this format? Or at least point me to some material that guides me through this. Thanks, Valli
RE: query results file for trec_eval
If I understand your use case correctly, you will have to write your own response writer. Only the response writers below are available:

- XMLResponseWriter: The most general-purpose response format; outputs results in XML, as demonstrated by the blogging application in Part 1 (http://www.ibm.com/developerworks/java/library/j-solr1/).
- XSLTResponseWriter: Applies a specified XSLT transformation to the output of the XMLResponseWriter. The tr parameter in the request specifies the name of the XSLT transformation to use; the transformation must exist in the Solr home's conf/xslt directory. See Resources (http://www.ibm.com/developerworks/java/library/j-solr2/#resources) to learn more about the XSLT response writer.
- JSONResponseWriter: Outputs results in JavaScript Object Notation (JSON), a simple, human-readable data-interchange format that is also easy for machines to parse.
- RubyResponseWriter: Extends the JSON format so that the results can safely be evaluated in Ruby. If you are interested in using Ruby with Solr, follow the links to acts_as_solr and Flare in Resources (http://www.ibm.com/developerworks/java/library/j-solr2/#resources).
- PythonResponseWriter: Extends the JSON output format for safe use in the Python eval method.

QueryResponseWriters are added to Solr in the solrconfig.xml file using the queryResponseWriter tag and affiliated attributes. The response type is specified in the request using the wt parameter; the default is "standard", which is set in solrconfig.xml to be the XMLResponseWriter. Finally, instances of QueryResponseWriter must provide thread-safe implementations of the write() and getContentType() methods used to create responses.
-Ankit

From: Valli Indraganti [via Lucene] Sent: Tuesday, October 19, 2010 11:30 AM To: Ankit Bhatnagar Subject: query results file for trec_eval [...]

-- View this message in context: http://lucene.472066.n3.nabble.com/query-results-file-for-trec-eval-tp1732965p1732999.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: query results file for trec_eval
I don't know anything about the TREC format document, but I think if you want text output, you can do it by using http://wiki.apache.org/solr/XsltResponseWriter to transform the XML to text.

On Tue, Oct 19, 2010 at 12:29 PM, Valli Indraganti valli.indraga...@gmail.com wrote: Hello! I am a student and I am trying to run evaluation for TREC format document. [...]

-- __ Ezequiel. Http://www.ironicnet.com
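To sketch that suggestion, a stylesheet along the following lines (a hypothetical conf/xslt/trec.xsl; the hard-coded query id "1" and run tag "solr-run" are placeholders, and it assumes the request includes fl=id,score so each doc carries a stored id and the score) could be selected with wt=xslt&tr=trec.xsl to emit one TREC-style result line per hit:

```xml
<?xml version="1.0"?>
<!-- trec.xsl (sketch): transforms Solr's XML response into
     "qid Q0 docno rank score runid" lines for trec_eval. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="response/result/doc">
      <xsl:text>1 Q0 </xsl:text>
      <xsl:value-of select="str[@name='id']"/>
      <xsl:text> </xsl:text>
      <!-- position within the result set serves as the rank -->
      <xsl:value-of select="position()"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="float[@name='score']"/>
      <xsl:text> solr-run&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The query id would still need to be supplied per query (for example by generating one stylesheet per topic, or post-processing), which is part of why the Lucene benchmark package mentioned later in this thread may be the simpler route.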
does solr support posting gzipped content?
Hi folks, I was wondering if there is any native support for posting gzipped files to Solr? I'm testing a project where we inject our log files into Solr for indexing; these log files are gzipped, and I figure it would take less network bandwidth to inject the gzipped files directly. Is there a way to do this, other than implementing my own ServletFilter or some such? thanx

-- View this message in context: http://lucene.472066.n3.nabble.com/does-solr-support-posting-gzipped-content-tp1733178p1733178.html
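No native gzip handling is confirmed in the thread. If it turns out the servlet container will not decompress the body for you, one client-side fallback is to gunzip before posting, which the JDK covers without extra libraries. A sketch (the Gunzip class name is made up for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class Gunzip {
    // Fully decompress a gzipped stream into a byte array so the plain
    // content can be posted to Solr. This is a client-side workaround
    // sketch; Solr is not assumed to accept gzipped request bodies.
    static byte[] gunzip(InputStream gzipped) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(gzipped);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

The decompressed bytes can then be streamed to the update handler as usual; the bandwidth saving is lost, but no server-side changes are needed.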
Facet Use Case
Hi Guys, Let me describe the use case for our search application:

a- The user enters the search application and the latest 20 documents are displayed.
b- A tag cloud component is populated with the facets available from (a).
c- The user types something in the text box.
d- The documents are tagged in some way (there is a tags field).
e- Get the tags of the first document returned by (c) and build a facet result with documents containing the same tags.

Make sense? Is it possible to do this with a single Solr request? Thanks in advance. -- edgar
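Steps (a) and (b) map naturally onto a single faceted request; a sketch (the timestamp and tags field names are assumptions, not from the message):

```
http://localhost:8983/solr/select
    ?q=*:*
    &rows=20
    &sort=timestamp desc
    &facet=true
    &facet.field=tags
    &facet.mincount=1
```

Step (e), faceting relative to the first hit's tags, is the part that does not obviously fit in the same request, since those tag values are not known until the results come back.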
Dismax phrase boosts on multi-value fields
Hi - I have a multi-value field; say, for example, it consists of 'my black cat', 'my white dog', 'my blue rabbit'. The field is whitespace-parsed when put into the index. I have a phrase query boost configured on this field, which I understand kicks in when my search term is found entirely in this field. So, if the search term is 'my blue rabbit', then I understand that my phrase boost will be applied, as this is found entirely in this field. My question/presumption is that, as this is a multi-valued field, only 1 value of the multi-value needs to match for the phrase query boost (given my very imaginative set of test data :-) above, you can see that this obviously matches 1 value and not them all). Thanks for your help. If you wish to view the St. James's Place email disclaimer, please use the link below http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer
Re: Dismax phrase boosts on multi-value fields
You are correct. The query needs to match as a phrase; it doesn't need to match everything. Note that if a value is "long sentence with my blue rabbit in it", then the query "my blue rabbit" will also match as a phrase, for phrase boosting or query purposes. Jonathan

Jason Brown wrote: Hi - I have a multi-value field; say, for example, it consists of 'my black cat', 'my white dog', 'my blue rabbit'. [...]
Re: I need to indexing the first character of a field in another field
Hi guys, I read all the suggestions and did some tests, and finally the indexing process is working. I did the extraction of the initial character of three fields. Here are the functions:

function extraiInicial(valor) {
    if (valor != null && valor != '') {
        valor = valor.substring(0, 1).toUpperCase();
    } else {
        valor = '';
    }
    return valor;
}

function extraiIniciaisAutorEditoraSebo(linha) {
    linha.put("inicialautor", extraiInicial(linha.get("autor")));
    linha.put("inicialeditora", extraiInicial(linha.get("editora")));
    linha.put("inicialsebo", extraiInicial(linha.get("sebo")));
    return linha;
}

Thank you for your help, Renato F. Wesenauer

2010/10/18 Chris Hostetter hossman_luc...@fucit.org: This exact topic was just discussed a few days ago... http://search.lucidimagination.com/search/document/7b6e2cc37bbb95c8/faceting_and_first_letter_of_fields#3059a28929451cb4 My comments on when/where it makes sense to put this logic... http://search.lucidimagination.com/search/document/7b6e2cc37bbb95c8/faceting_and_first_letter_of_fields#7b6e2cc37bbb95c8

: Date: Mon, 18 Oct 2010 19:31:28 -0200
: From: Renato Wesenauer renato.wesena...@gmail.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: I need to indexing the first character of a field in another field
:
: Hello guys,
:
: I need to index the first character of the field "autor" in another field,
: "inicialautor". Example:
:    autor = Mark Webber
:    inicialautor = M
:
: I did a JavaScript function in the dataimport, but the field inicialautor
: indexes empty.
:
: The function:
:
: function InicialAutor(linha) {
:     var aut = linha.get("autor");
:     if (aut != null) {
:         if (aut.length > 0) {
:             var ch = aut.charAt(0);
:             linha.put("inicialautor", ch);
:         } else {
:             linha.put("inicialautor", '');
:         }
:     } else {
:         linha.put("inicialautor", '');
:     }
:     return linha;
: }
:
: What's wrong?
:
: Thanks,
:
: Renato Wesenauer

-Hoss
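For reference, transformer functions like these are typically wired up in DataImportHandler's data-config.xml via a script block; a sketch (the entity name and SQL are illustrative; only the function name comes from the message above):

```xml
<dataConfig>
  <!-- JavaScript transformers live in a <script> block and are referenced
       by name from the entity's transformer attribute. -->
  <script><![CDATA[
    function extraiIniciaisAutorEditoraSebo(linha) {
      /* body as in the message above */
      return linha;
    }
  ]]></script>
  <document>
    <entity name="livro"
            query="select autor, editora, sebo from livros"
            transformer="script:extraiIniciaisAutorEditoraSebo">
      <!-- inicialautor, inicialeditora, inicialsebo are added by the script
           and must also be declared in schema.xml -->
    </entity>
  </document>
</dataConfig>
```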
Re: Documents and cores
: Subject: Documents and cores : References: 4cbd939c.3020...@atcult.it : aanlktimp4n+fnrqwtqkagpx7bb8lkfwwp9moo3ojx...@mail.gmail.com : In-Reply-To: aanlktimp4n+fnrqwtqkagpx7bb8lkfwwp9moo3ojx...@mail.gmail.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking -Hoss
Re: Upgrade to Solr 1.4, very slow at start up when loading all cores
: We will take this approach in our production environment but meanwhile I am : curious if this issue will be addressed: it seems the new/first searchers do : not really buy any performance benefits because it uses so much memory, : especially at core loading time.

There's nothing inherently wrong with using newSearcher/firstSearcher -- for many people they do in fact provide a perf improvement for real users (at the cost of some initial time spent warming before those users ever get access to the searcher).

As I understand it from this thread, your issue is not actually the firstSearcher/newSearcher -- your issue (per Yonik's comments) is that with per-segment sorting in 1.4, the FieldCache for some of your fields requires a lot more RAM in 1.4 than it would have for Solr 1.3 -- which caused GC thrashing during initialization. Even without using firstSearcher/newSearcher, all that RAM is still going to be used if/when you sort on those fields -- all removing the firstSearcher/newSearcher queries on those fields has done for you is delay when the time spent initializing those FieldCaches happens and when that RAM first starts getting used.

It's possible you never actually sort on those fields, in which case removing those warming queries completely is definitely the way to go -- but if you do sort on them, then the warming queries can still legitimately be helpful (in that they pay the cost up front, before a real user issues queries).

As Yonik mentioned, the real fix for the amount of memory being used is to switch to TrieDateFields, which use much more efficient FieldCaches for sorting -- with that change you can probably start using the warming queries again.

(Depending on how you tested, you may not have noticed much advantage to having them, because you'll really only see the advantages on the initial queries that do sorting -- those should show huge outlier times without the warming queries, but once those poor unlucky users have paid the price for initializing the FieldCache, everyone else's sorts should be fast.)

-Hoss
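As a concrete sketch of the warming Hoss describes (the date_field name is a placeholder for a TrieDateField declared in schema.xml; the point is that the sort populates the FieldCache before real users hit the searcher):

```xml
<!-- solrconfig.xml: warm the sort FieldCache when a new searcher opens
     and when the first searcher opens after startup. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">date_field desc</str>
      <str name="rows">0</str>
    </lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">date_field desc</str>
      <str name="rows">0</str>
    </lst>
  </arr>
</listener>
```

rows=0 keeps the warming query cheap to transport; the sort alone is enough to force the FieldCache to be built.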
Re: SolrJ new javabin format
: The CHANGES.txt file in branch_3x says that the javabin format has changed in : Solr 3.1, so you need to update SolrJ as well as Solr. Is the SolrJ included : in 3.1 compatible with both 3.1 and 1.4.1? If not, that's going to make a : graceful upgrade of my replicated distributed installation a little harder.

The formats are not currently compatible. The first priority was to get the format fixed so it was using true UTF-8 (instead of Java's bastardized modified UTF-8) in a way that would generate a clear error if people attempted to use an older SolrJ to talk to a newer Solr server (or vice versa). The consensus was that fixing that problem was worth the added complexity during upgrading -- people who want to use SolrJ 1.4 to talk to a Solr 3.x server can always use the XML format instead of the binary format.

If you'd like to help improve the codec so that 3.x can recognize when a 1.4 client connects and switch to the older format, patches along those lines would certainly be welcome.

-Hoss
Documents and Cores, take 2
Hi all- I have a newbie design question about documents, especially with SQL databases. I am trying to set up Solr to go against a database that, for example, has items and people. The way I see it (and I don't know if this is right or not, thus the question) is that both are separate documents: an item may contain a list of parts, which the user may want to search, and, as part of the item, view the list of people who have ordered the item. Then there are the actual people, whom the user might want to search to find a name and, consequently, what items they ordered. To me they are both top-level things, with some overlap of fields. If I'm searching for people, I'm likely not going to be interested in the parts of an item, while if I'm searching for items I may want to search for 42532, which is, in this instance, a SKU, and not get hits on the zip code section of the people.

Does it make sense, then, to separate these two out as separate documents? I believe so, because the documentation I've read suggests that a document should be analogous to a row in a table (in this case, very de-normalized). What is tripping me up is that, as far as I can tell, you can have only one document type per index, and thus one document type per core. So in this example, I have two cores, items and people. Is this correct? Should I embrace the idea of having many cores, or am I supposed to have a single, unified index with all documents (which Solr doesn't seem to support)?

The ultimate question comes down to the search interface. I don't necessarily want the user to explicitly state which document type they want to search; I'd like them to simply type 42532 and get documents from both cores, and then possibly allow for filtering results after the fact, not before. As I've only used the admin site so far (which is core-specific), does the client API allow for unified searching across all cores?
Assuming it does, I'd think my idea of multiple documents is okay, but I'd love to hear from people who actually know what they're doing. :) Thanks, Ron BTW: Sorry about the problem with the previous message; I didn't know about thread hijacking.
Re: How can i get collect stemmed query?
Oh, you are constructing the string 'fly +body:away' in your StemFilter? Just to make sure, does q=+body:(fly away) return your document? And does analysis.jsp (at query time) display 'fly +body:away' for the string 'flyaway'? I don't know why you are doing this, but your stem filter should return only terms, not field names attached to them. Maybe you can find this useful, so that you can do what you want without writing custom code: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

--- On Tue, 10/19/10, Jerad ag...@naver.com wrote:

From: Jerad ag...@naver.com Subject: Re: How can i get collect stemmed query? To: solr-user@lucene.apache.org Date: Tuesday, October 19, 2010, 5:10 AM

Thanks for your reply :)

1. I tested q=*:*&fl=body, and 1 doc was returned as a result, as I expected.

2. I edited my schema.xml as you instructed:

<analyzer type="query" class="com.testsolr.ir.customAnalyzer.MyCustomQueryAnalyzer">
  <!-- No filter description. -->
</analyzer>

but no result was returned.

3. I wonder about this... Typically the tokenizer and filter flow is: 1) The input stream provides a text stream to the tokenizer or filter. 2) The tokenizer or filter gets a token, and the processed token and offset attribute info are returned. 3) The offset attributes hold the token's information.

This is part of a typical filter source as I understand it:

public class CustomStemFilter extends TokenFilter {

    private MyCustomStemmer stemmer;
    private TermAttribute termAttr;
    private OffsetAttribute offsetAttr;
    private TypeAttribute typeAttr;
    private int offSet;
    private Hashtable<String, String> reserved = new Hashtable<String, String>();

    public CustomStemFilter(TokenStream tokenStream, boolean isQuery, MyCustomStemmer stemmer) {
        super(tokenStream);
        this.stemmer = stemmer;
        termAttr = (TermAttribute) addAttribute(TermAttribute.class);
        offsetAttr = (OffsetAttribute) addAttribute(OffsetAttribute.class);
        typeAttr = (TypeAttribute) addAttribute(TypeAttribute.class);
        addAttribute(PositionIncrementAttribute.class);
        // Some of my custom logic here.
        // do something.
    }

    public boolean incrementToken() throws IOException {
        clearAttributes();
        if (!input.incrementToken())
            return false;
        StringBuffer queryBuffer = new StringBuffer();
        // stemming logic here.
        // The generated query string is appended to queryBuffer.
        termAttr.setTermBuffer(queryBuffer.toString(), 0, queryBuffer.length());
        offsetAttr.setOffset(0, queryBuffer.length());
        offSet += queryBuffer.length();
        typeAttr.setType("word");
        return true;
    }
}

※ MyCustomStemmer analyzes the input string "flyaway" into the query string "fly +body:away" and returns it.

At index time, content to be searched is normally analyzed and indexed as below:

a) Content to be indexed: "fly away"
b) The token "fly", with length of "fly" = 3 (set via the offset attribute methods), is returned by the filter or analyzer.
c) The next token, "away", with length of "away" = 4, is returned.

I think that's the general index flow. But I customized MyCustomFilter so that the filter generates a query string, not a token. In the process, the offset value is changed to the query's length, not a single token's length. I wonder whether the value set by the offsetAttr.setOffset() method influences search results when using Solr? (I tested this in the query input box on the main page at http://localhost:8983/solr/admin/)

-- View this message in context: http://lucene.472066.n3.nabble.com/How-can-i-get-collect-search-result-from-custom-filtered-query-tp1723055p1729717.html
Re: Documents and Cores, take 2
Ron, In the past I've worked with SOLR on a product that required the ability to search - separately - for companies, people, business lists, and a combination of the previous three. In designing this in SOLR, I found that using a combination of explicit field definitions and dynamic fields (http://wiki.apache.org/solr/SchemaXml#Dynamic_fields) gave me the best possible solution for the problem.

In essence, I created explicit fields that would be shared among all document types: a unique id, a document type, an indexed date, a modified date, and maybe a couple of other fields that share traits with all document types (i.e., name, a market specific to our business, etc.). The unique id was built as a string, prefixed with the document type and ending with the unique id from the database. The dynamic fields can be configured to be as flexible as you need, and in my experience I would strongly recommend documenting each type of dynamic field for each of your document types as a reference for your developers (and yourself). :)

This allows us to build queries that can be focused on specific document types, or that combine all of the types into a super search. For example, you could do something to the effect of: (docType:people) AND (df_firstName:John AND df_lastName:Hancock), (docType:companies) AND (df_BusinessName:Acme+Inc), or even ((df_firstName:John AND df_lastName:Hancock) OR (df_BusinessName:Acme+Inc)).

I hope this helps! - Ken

It looked like something resembling white marble, which was probably what it was: something resembling white marble. -- Douglas Adams, The Hitchhiker's Guide to the Galaxy

On Tue, Oct 19, 2010 at 4:57 PM, Olson, Ron rol...@lbpc.com wrote: Hi all- I have a newbie design question about documents, especially with SQL databases. I am trying to set up Solr to go against a database that, for example, has items and people. [...]
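A minimal sketch of the schema pattern Ken describes (the df_* prefix and the field names are assumptions based on his description, not a published schema):

```xml
<!-- schema.xml: explicit fields shared by all document types, plus a
     dynamic-field prefix for the per-type fields. -->
<fields>
  <!-- unique id, built as docType + database id, e.g. "people-42532" -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="docType" type="string" indexed="true" stored="true"/>
  <field name="indexedDate" type="date" indexed="true" stored="true"/>
  <field name="modifiedDate" type="date" indexed="true" stored="true"/>
  <!-- anything prefixed df_ is accepted without an explicit definition -->
  <dynamicField name="df_*" type="text" indexed="true" stored="true"/>
</fields>
```

With this layout, all document types live in one index, so a query like 42532 can hit both people and items in a single request, and docType is available as a filter afterward.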
Re: Negative filter using the appends element
Does not work:

<lst name="appends">
  <str name="fq">-tag:test</str>
</lst>

Can you append echoParams=all to your search URL and verify that fq=-tag:test is included in the response?
Spatial
https://issues.apache.org/jira/browse/LUCENE-2519

If I change my code as per LUCENE-2519 to have this:

public double[] coords(double latitude, double longitude) {
    double rlat = Math.toRadians(latitude);
    double rlong = Math.toRadians(longitude);
    double nlat = rlong * Math.cos(rlat);
    return new double[]{nlat, rlong};
}

return this:

x = (gamma - gamma[0]) cos(phi)
y = phi

would it make it give correct results? Correct projections, tier ids? I am not talking about changing Lucene/Solr code; I can duplicate the classes to create my own version. Just wanted to be sure about the results. Pradeep
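For comparison, the textbook sinusoidal projection quoted above can be written out directly. This is the reference formula only, not the LUCENE-2519 patch; the central meridian (lon0Deg) is a parameter, and the class name is made up:

```java
public class Sinusoidal {
    // Sinusoidal (equal-area) projection: x = (lambda - lambda0) * cos(phi),
    // y = phi, with all angles in radians. Note that y is the *latitude*;
    // in the snippet quoted above, the second coordinate returned is the
    // longitude, which is one visible difference from this formula.
    static double[] project(double latDeg, double lonDeg, double lon0Deg) {
        double phi = Math.toRadians(latDeg);
        double lambda = Math.toRadians(lonDeg);
        double lambda0 = Math.toRadians(lon0Deg);
        return new double[] { (lambda - lambda0) * Math.cos(phi), phi };
    }
}
```

At the equator (phi = 0) the projection reduces to x = lambda - lambda0, y = 0, which is an easy sanity check against any alternative implementation.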
xi:include
Hi, I am trying to use xi:include in my solrconfig.xml. For example:

<xi:include href="http://localhost/config/config.aspx" />

This works fine, as long as config.aspx exists and returns valid XML. Sometimes, though, config.aspx can fail and return invalid XML. Then I get a problem, as Solr's parsing of solrconfig.xml fails. If I use xi:fallback, e.g.:

<xi:include href="http://localhost/config/config.aspx">
  <xi:fallback>
    <str name="qf">text^0.4 n^1.2 c^1.5 d^0.4 b^3</str>
  </xi:fallback>
</xi:include>

this helps if config.aspx does not exist - then the fallback is used. But if config.aspx returns invalid XML, then the fallback does not appear to be used, and I get exceptions when I start Solr up. How can I get Solr to fall back if the included XML fails? Thanks, Peter
Re: query results file for trec_eval
I would like to have the output of my queries for use with the trec_eval software. Can someone please point me to how to make Solr spit out output in this format? [...]

Lucene has a package (org.apache.lucene.benchmark.quality.trec) for this: http://search-lucene.com/jd/lucene/org/apache/lucene/benchmark/quality/package-summary.html
Re: Solr PHP PECL Extension going to Stable Release - Wishing for Any New Features?
Hi All,

Just wanted to post an update on where we stand with all the requests for new features.

List of features requested in the Solr PECL extension:

1. Ability to send custom requests to custom URLs other than select, update, terms, etc.
2. Ability to add files (PDF, office documents, etc.).
3. Windows version of the latest releases.
4. Ensuring that SolrQuery::getFields(), SolrQuery::getFacets() et al. return an array consistently.
5. Lowering the libxml version requirement to 2.6.16.

If there is anything that you think I left out, please let me know. This is a summary.

On Wed, Oct 13, 2010 at 3:48 AM, Stefan Matheis matheis.ste...@googlemail.com wrote: On Tue, Oct 12, 2010 at 6:29 PM, Israel Ekpo israele...@gmail.com wrote: I think this feature will take care of this. What do you think? sounds good!

-- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Spatial
On Oct 19, 2010, at 6:23 PM, Pradeep Singh wrote:

  https://issues.apache.org/jira/browse/LUCENE-2519 If I change my code as per 2519 to have this - public double[] coords(double latitude, double longitude) { double rlat = Math.toRadians(latitude); double rlong = Math.toRadians(longitude); double nlat = rlong * Math.cos(rlat); return new double[]{nlat, rlong}; } return this - x = (gamma - gamma[0]) cos(phi) y = phi would it make it give correct results? Correct projections, tier ids?

I'm not sure; I have a lot of doubt around that code. After making that correction, I spent several days trying to get the tests to pass and ultimately gave up. Does that mean it is wrong? I don't know. I just don't have enough confidence to recommend it, given that I couldn't verify the tests I was asking it to do through other tools. Personally, I would recommend seeing if one of the non-tier-based approaches suffices for your situation, and using that.

-Grant
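For reference, the projection under discussion is the sinusoidal (equal-area) projection, x = (lambda - lambda0) * cos(phi), y = phi. A standalone sketch of just that formula, independent of the Lucene spatial classes and assuming a central meridian lambda0 of 0 (as in the snippet above):

```java
public class Sinusoidal {
    // Sinusoidal projection with central meridian lon0 = 0:
    //   x = (lon - lon0) * cos(lat),  y = lat   (all angles in radians)
    static double[] project(double latDeg, double lonDeg) {
        double phi = Math.toRadians(latDeg);
        double lambda = Math.toRadians(lonDeg);
        return new double[]{lambda * Math.cos(phi), phi};
    }

    public static void main(String[] args) {
        double[] p = project(60.0, 90.0);
        System.out.printf("x=%.4f y=%.4f%n", p[0], p[1]);
    }
}
```

Whether this matches what the tier/cartesian classes expect internally is exactly the open question in this thread, so treat it as a reference for the math only.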
Re: boosting injection
The main disadvantage of index-time boosting is that you must reindex your corpus entirely if you want to alter the boost factors, and there's no very good way to anticipate what boost factors will give you the results you want.

I wonder if you could cheat and do some basic string processing on the query to add your boosts? That'd be tricky unless you have very predictable strings.

Best, Erick

On Tue, Oct 19, 2010 at 10:33 AM, Andrea Gazzarini andrea.gazzar...@atcult.it wrote:

Y-E-A-H! I think it's so! Markus, what are the disadvantages of this boosting strategy?

Thanks a lot, Andrea

Il 19/10/2010 16:25, Markus Jelsma ha scritto:

Index-time boosting maybe? http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22

On Tuesday, October 19, 2010 04:23:46 pm Andrea Gazzarini wrote:

Hi Ken, thanks for your response... unfortunately it doesn't solve my problem. I cannot change the client behaviour, so the query must be a query and not only the query terms. In this scenario it would be great, for example, if I could declare the boost in the schema field definition... but I think that's not possible, isn't it?

Regards, Andrea

From: Ken Stanley [mailto:doh...@gmail.com] To: solr-user@lucene.apache.org Sent: Tue, 19 Oct 2010 15:05:31 +0200 Subject: Re: boosting injection

Andrea, using the Solr dismax query handler you could set up queries to boost on fields of your choice. Basically, the q parameter would be the query terms (without the field definitions), and a qf (Query Fields) parameter would define your boost(s): http://wiki.apache.org/solr/DisMaxQParserPlugin. A non-Solr alternative would be to parse the query in whatever application is sending the queries to the Solr instance and make the necessary transformations.

Regards, Ken

"It looked like something resembling white marble, which was probably what it was: something resembling white marble."
-- Douglas Adams, The Hitchhiker's Guide to the Galaxy

On Tue, Oct 19, 2010 at 8:48 AM, Andrea Gazzarini andrea.gazzar...@atcult.it wrote:

Hi all, I have a client that is sending this query: q=title:history AND author:joyce. Is it possible to transform this query at runtime into: q=title:history^10 AND author:joyce^5?

Best regards, Andrea
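Erick's "cheat" of string-processing the query before it reaches Solr might look roughly like this. This is a sketch only, not something from the thread: it assumes simple field:term clauses like the ones in Andrea's example, and the boost table is hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoostInjector {
    // Hypothetical per-field boost table; tune to taste.
    static final Map<String, Integer> BOOSTS = new HashMap<String, Integer>();
    static {
        BOOSTS.put("title", 10);
        BOOSTS.put("author", 5);
    }

    // Append ^boost to every simple field:term clause we have a boost for.
    static String inject(String query) {
        Matcher m = Pattern.compile("(\\w+):(\\S+)").matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer boost = BOOSTS.get(m.group(1));
            String rep = boost == null ? m.group() : m.group() + "^" + boost;
            m.appendReplacement(out, Matcher.quoteReplacement(rep));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

As Erick notes, this kind of rewriting only holds up if the incoming queries are very predictable; phrases, nested parentheses, or already-boosted clauses would need a real query parser rather than a regex.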
Re: Documents and cores
This is something most everybody has to get over when transitioning from the DB world to Solr/Lucene. The schema describes the #possible# fields in a document. There is absolutely no requirement that #every# document in the index have all of these fields (unless #you# define it so with required="true" on the field). Solr will happily index documents that have fields missing, so feel free...

You should be able to define your people and parts documents as you choose, with perhaps some common fields. You'll have to take some care not to form queries like name:ralph AND sku:12345, assuming that the name field is only in people and sku only in parts.

Do continue down the path of de-normalization. That's another thing most DB folks don't want to do. Each document you index should contain all the data you need. The moment you find yourself asking "how do I do a join", you should stop and consider further de-normalization.

HTH, Erick

On Tue, Oct 19, 2010 at 10:39 AM, Olson, Ron rol...@lbpc.com wrote:

Hi all- I have a newbie design question about documents, especially with SQL databases. I am trying to set up Solr to go against a database that, for example, has items and people. The way I see it - and I don't know if this is right or not, thus the question - is that both are separate documents: an item may contain a list of parts, which the user may want to search, and, as part of the item, a list of the people who have ordered it. Then there are the actual people, whom the user might want to search to find a name and, consequently, what items they ordered. To me they are both top-level things, with some overlap of fields. If I'm searching for people, I'm likely not going to be interested in the parts of an item, while if I'm searching for items, the likelihood is that I may want to search for 42532 - which in this instance is a SKU - and not get hits on the zip codes of the people. Does it make sense, then, to separate these two out as separate documents?
I believe so, because the documentation I've read suggests that a document should be analogous to a row in a table (in this case, very de-normalized). What is tripping me up is that, as far as I can tell, you can have only one document type per index, and thus one document type per core. So in this example I have two cores, items and people. Is this correct? Should I embrace the idea of having many cores, or am I supposed to have a single, unified index with all documents (which Solr doesn't seem to support)?

The ultimate question comes down to the search interface. I don't necessarily want the user to explicitly state which document type they want to search; I'd like them to simply type 42532 and get documents from both cores, and then possibly allow for filtering results after the fact, not before. As I've only used the admin site so far (which is core-specific), does the client API allow for unified searching across all cores? Assuming it does, I'd think my idea of multiple documents is okay, but I'd love to hear from people who actually know what they're doing. :)

Thanks, Ron
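Following Erick's point that schema fields are optional per document, a single unified schema covering both people and parts might look like the sketch below. The field names are hypothetical, not from this thread; the idea is that each document just omits the fields that don't apply, and a doctype field supports filtering (e.g. fq=doctype:person):

```
<!-- Hypothetical unified schema: people and parts share one index. -->
<fields>
  <field name="id"      type="string" indexed="true" stored="true" required="true"/>
  <field name="doctype" type="string" indexed="true" stored="true"/> <!-- "person" or "part" -->
  <!-- people-only fields: simply absent from part documents -->
  <field name="name"    type="text"   indexed="true" stored="true"/>
  <field name="zip"     type="string" indexed="true" stored="true"/>
  <!-- part-only fields: absent from person documents -->
  <field name="sku"     type="string" indexed="true" stored="true"/>
</fields>
```

This sidesteps the cross-core search question entirely, at the cost of Erick's caveat about accidentally querying a people field against part documents.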
Re: Negative filter using the appends element
I suspect, but don't know for sure, that you need to modify it to *:* -tag:test, but I confess I'm not at all sure that it'll work in this context.

Best, Erick

On Tue, Oct 19, 2010 at 11:10 AM, Kevin Cunningham kcunning...@telligent.com wrote:

I'm using Solr 1.4 with the standard request handler and attempting to apply a negative fq for all requests via the appends element, but it's not being applied. Is this an intended limitation? I looked in JIRA for an existing issue but nothing jumped out.

Works fine:

  <lst name="appends">
    <str name="fq">tag:test</str>
  </lst>

Does not work:

  <lst name="appends">
    <str name="fq">-tag:test</str>
  </lst>
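Erick's suggested rewrite, expressed as the appends entry, would be the following sketch (whether appends applies a pure-negative clause correctly in 1.4 is exactly what's in question, hence prefixing the match-all *:* so the filter is no longer purely negative):

```
<lst name="appends">
  <str name="fq">*:* -tag:test</str>
</lst>
```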
Multiple partial word searching with dismax handler
Hi,

I have a problem combining a query with multiple partial-word searching in the dismax handler. To do multiple partial-word searching I use EdgeNGramFilterFactory, and my query must be something like name_ngram:sun name_ngram:hot in q.alt, combined with my search handler (http://localhost:8081/solr/select/?q.alt=name_ngram:sun%20name_ngram:hot&qt=products). I wonder how I can combine this with my search handler. Here is my search handler config:

  <requestHandler name="products" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">20</int>
      <str name="defType">dismax</str>
      <str name="qf">name^200 full_text</str>
      <str name="bf">fap^15</str>
      <str name="fl">uuid</str>
      <str name="version">2.2</str>
      <str name="indent">on</str>
      <str name="tie">0.1</str>
    </lst>
    <lst name="appends">
      <str name="fq">type:Product</str>
    </lst>
    <lst name="invariants">
      <str name="facet">false</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
      <str>elevateProducts</str>
    </arr>
  </requestHandler>

If I query with the URL http://localhost:8081/solr/select/?q=sun hot&qt=products, it doesn't show the correct answer like the previous query does. How can I configure this in my search handler with boost scores?

-- Chhorn Chamnap http://chamnapchhorn.blogspot.com/
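One common way to bring an ngram field into plain dismax queries - a sketch against the handler above, not a confirmed fix from this thread - is to add name_ngram to qf with its own boost, so that q=sun hot also matches partial words without needing fielded q.alt syntax. The boost value here is a guess to be tuned:

```
<str name="qf">name^200 name_ngram^50 full_text</str>
```

With the ngram field weighted below the exact name field, full-word matches still rank above partial-word matches.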
Not able to subscribe to ML
Just a test mail to check if my mails are reaching the ML. I don't know why, but my mails are failing to reach the ML with the following error:

Delivery to the following recipient failed permanently: solr-user@lucene.apache.org

Technical details of permanent failure: Google tried to deliver your message, but it was rejected by the recipient domain. We recommend contacting the other email provider for further information about the cause of this error. The error that the other server returned was: 552 552 spam score (5.7) exceeded threshold (state 18).

- Abdullah