Restricting results based on user authentication
Hi, I am using the DIH feature of Solr for indexing a database. I am using a Solr server that is independent of my web application. I send an HTTP request for searching and then process the returned result. Now we have a requirement that we have to filter the results further based on security-level restrictions. For example, user id abc should not be allowed to see a particular result. How could we achieve that? I followed http://www.nabble.com/Restricted-views-of-an-index-td15088750.html#a15090791 which suggests something like: Add a role or access class to each indexed item, then use that in the queries, probably in a filter specified in a request handler. That keeps the definition of the filter within Solr. For example, you can create a request handler named admin, a field named role, and add a filter of role:admin. I could not follow this solution. Is there any example or resource that explains how to use a custom request handler with filtering? Thanks, Manu -- View this message in context: http://www.nabble.com/Restricting-results-based-on-user-authentication-tp21411449p21411449.html Sent from the Solr - User mailing list archive at Nabble.com.
DataImportHandler: UTF-8 and Mysql
Hello, First of all, thanks to Jacob Singh for his reply to my mail last week; I completely forgot to reply. Multicore is perfect for my needs. I've got Solr running now with my new schema partially implemented and I've started to test importing data with DIH. I've run into a number of issues though and I hope someone here can help:
1. Posting UTF-8 data through the example post script works and I get the proper results back when I query using the admin page. However, for data imported through the DataImportHandler from a MySQL database (the database contains correct data; it's a copy of a production db and selecting through the client gives the correct characters) I get Ã³ instead of ó. I've tried several combinations of arguments to my datasource url (useUnicode=true&characterEncoding=UTF-8) but it does not seem to help. How do I get this to work correctly?
2. On the wiki page for DataImportHandler, the deletedPkQuery has no real description. Am I correct in assuming it should contain a query which returns the ids of items which should be removed from the index?
3. Another question concerning the DataImportHandler wiki page: I'm not sure about the exact way the field tag works. From the first data-config.xml example for the full-import I can infer that the column attribute represents the column from the SQL query and the name attribute represents the name of the field in the schema the column should map to. However, further on in the RegexTransformer section there are column attributes which do not correspond to the SQL query result set, and it's the sourceColName attribute which actually represents that data. That data comes from the RegexTransformer, I understand, but why then is the column attribute used instead of the name attribute? This has confused me somewhat; any clarification would be greatly appreciated. Regards, gwk
Deletion of indexes.
Hi, I am using Solr 1.3 and am facing a problem deleting from the index. I have a MySQL database. Some of the data has been deleted from the database, but the index entries for those records are still present, so I am still getting those records in search results. I don't want this behavior; I want to delete those index entries which are no longer present in the database. Also, I don't know which records have been deleted from the database while remaining in the index. Is there any way to solve this problem? I also think that re-indexing will not solve my problem, because it will re-index only the records which are present in the database and won't touch the index entries which no longer have a reference in the database. Does anyone have a solution for this? Thanks, Tushar
To get all indexed records.
Hi, I am using Solr 1.3. I want to retrieve all records from the index. How should I write a Solr query so that I will get all records? Thanks, Tushar.
Re: To get all indexed records.
Use *:* as a query to get all records. Refer to http://wiki.apache.org/solr/SolrQuerySyntax for more info. On Mon, Jan 12, 2009 at 5:30 PM, Tushar_Gandhi tushar_gan...@neovasolutions.com wrote: Hi, I am using solr 1.3. I want to retrieve all records from index file. How should I write solr query so that I will get all records? Thanks, Tushar. -- Regards, Akshay Ukey.
Re: To get all indexed records.
Hi Tushar,
1. If you are using the Solr admin console to search, then the default query '*:*' in the Query String search box will serve the purpose.
2. If you want to send an HTTP request directly for retrieving records, then you can hit a URL similar to the following: http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on Note - here, 'start' and 'rows' in the URL specify the first record returned and the total number of records returned, respectively.
3. If you are using SolrJ for querying from Java, the following code snippet would be helpful:
  CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
  SolrQuery query = new SolrQuery("*:*");
  QueryResponse results = server.query(query);
  SolrDocumentList list = results.getResults();
Thanks, Manu Tushar_Gandhi wrote: Hi, I am using solr 1.3. I want to retrieve all records from index file. How should I write solr query so that I will get all records? Thanks, Tushar.
Index is not created if my database table is large
Hi, I'm new to the Solr world. I am using a Solr multicore config in my webapp and am able to configure Solr properly, but the problem is when I am building with a full data-import: if my database table has a small number of rows, say 10 to 25, the index is created properly and search queries return proper results, but when I create the index for a large table, the index is not properly created and it does not return any result for a search. What's the problem? Can anybody help me out? My data-config file looks like this:
  <dataSource type="JdbcDataSource" name="ds-1" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/retaildb?characterEncoding=UTF-8" user="kickuser" password="kickapps"/>
  <document name="countries">
    <entity dataSource="ds-1" name="zonesToCountry" pk="countryId" query="select * from countries limit 10" deltaQuery="select * from countries limit 10" docRoot="true">
      <field column="countries_id" name="id"/>
      <field column="countries_name" name="countries_name"/>
      <field column="countries_iso_code_2" name="countries_iso_code_2"/>
      <field column="countries_iso_code_3" name="countries_iso_code_3"/>
      <entity dataSource="ds-1" name="zones" pk="zone_id" query="select * from zones z where z.zone_country_id='${zonesToCountry.countries_id}'">
        <field column="zone_code" name="zone_code"/>
        <field column="zone_name" name="zone_name"/>
      </entity>
    </entity>
  </document>
-- Thanks and Regards Rahul G.Brid
Re: Greater than conditions in Solr.
On Jan 12, 2009, at 7:13 AM, Tushar_Gandhi wrote: Is it possible to write a query like id > 0? Sure... id:[1 TO *] See here for lots more details: http://wiki.apache.org/solr/SolrQuerySyntax . Be sure to follow the link to the Lucene query syntax for fuller details. Erik
Re: Index is not created if my database table is large
Hi, I'm not sure that this is the same issue, but I had a similar problem with importing a large table from MySQL. On the DataImportHandler FAQ (http://wiki.apache.org/solr/DataImportHandlerFaq) the first issue mentions memory problems. Try adding the batchSize="-1" attribute to your datasource; it fixed the problem for me. Regards, gwk
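For reference, a sketch of where that attribute goes in data-config.xml (the driver, url, and credentials below are placeholders, not taken from this thread):

```xml
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb"
            user="dbuser" password="dbpass"
            batchSize="-1"/>
```

With the MySQL JDBC driver, batchSize="-1" is a hint to stream rows one at a time rather than buffering the entire result set in memory, which is what usually blows up on large tables.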
Re: Index is not created if my database table is large
Hi, thanks for the reply, but can you tell me where to set this batchSize? In data-config.xml? On Mon, Jan 12, 2009 at 8:48 AM, gwk g...@eyefi.nl wrote: Hi, I'm not sure that this is the same issue but I had a similar problem with importing a large table from MySQL; on the DataImportHandler FAQ (http://wiki.apache.org/solr/DataImportHandlerFaq) the first issue mentions memory problems. Try adding the batchSize="-1" attribute to your datasource, it fixed the problem for me. Regards, gwk -- Thanks and Regards Rahul G.Brid
Re: Index is not created if my database table is large
Hey, I tried using batchSize="-1" and it doesn't work. I am not getting any memory problem as such. http://127.0.0.1/search/products/dataimport?command=full-import&debug=on&verbose=true runs without error and gives me the response, but when I query using the admin page it does not return any result set. This happens when the database table has a large number of rows. On Mon, Jan 12, 2009 at 9:17 AM, Rahul Brid rahul.b...@balajisoftware.in wrote: Hi, thanks for the reply, but can you tell me where to set this batchSize? In data-config.xml? On Mon, Jan 12, 2009 at 8:48 AM, gwk g...@eyefi.nl wrote: Hi, I'm not sure that this is the same issue but I had a similar problem with importing a large table from MySQL; on the DataImportHandler FAQ (http://wiki.apache.org/solr/DataImportHandlerFaq) the first issue mentions memory problems. Try adding the batchSize="-1" attribute to your datasource, it fixed the problem for me. Regards, gwk -- Thanks and Regards Rahul G.Brid
Re: Database permissions integration and Sub documents
On Jan 11, 2009, at 10:08 PM, Mike Shredder wrote: Hi, I'm new to Solr. I've been able to get Solr up and running, but have some quick questions. 1) How do I filter results based on permissions from an external database system? Should I implement a query filter which will look up permissions in the DB for each doc returned, or should I handle this in a request handler? I have one project that has permissions in a db. What I do is index the permission group ids along with the documents, so that I can use a simple query parameter appended to the users' search strings. The only drawback is that when the permissions change, the documents must be (entirely) reindexed, which can be a pain (like when one change affects half your index), but it's a small price to pay for the speed improvements vs. constantly querying the database. 2) I need to support sub-documents. So I was planning to make my sub-documents Solr docs. But depending on the query type I need to dedupe sub-documents and return only one document for all sub-docs in a result set. Which interface do I need to implement to achieve this? Check out SOLR-236. I'm using it for this purpose (using the ivan-3 patch). Works well for me although faceting can be a bit strange. https://issues.apache.org/jira/browse/SOLR-236 3) If I do deduping, my total result count will be off; what is the right way to return an estimated total doc count? The doc count returned from SOLR-236 would be accurate, just the facet counts are off. -- Steve
Improving Readability of Hit Highlighting
I'm indexing text from an OCR of an old document. Many words get read perfectly, but they're typically embedded in a lot of junk. I would like the hit highlighting to show only the 'good' words, in the order in which they appeared in the original document. Is it possible to use the output of the filter classes as the text used in hit highlighting? Or do you have to do all the text cleanup outside of Solr and present it with two fields to index, one with the original text and one with the cleaned-up text? The objective of the hit highlighting is to give the user a *sense* of the original context, even if it's not provided verbatim from the original document. Thanks in advance. TerryG
Re: Query regarding Spelling Suggestions
Solr 1.3 doesn't use Log4J, it uses Java Utility Logging (JUL). I believe the info level in the logs is sufficient. Let's start by posting what you have. Also, are you able to get the sample spellchecking to work? On Jan 12, 2009, at 2:16 AM, Deshpande, Mukta wrote: Hi, Could you please send me the needful entries in log4j.properties to enable logging, explicitly for SpellCheckComponent. My current log4j.properties looks like:
  log4j.rootLogger=INFO,console
  log4j.appender.console=org.apache.log4j.ConsoleAppender
  log4j.appender.console.target=System.err
  log4j.appender.console.layout=org.apache.log4j.PatternLayout
  log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
  log4j.logger.org.apache.solr=DEBUG
With these settings I can only see the INFO level logs. I tried to change the log level for SpellCheckComponent to FINE using the admin logging page http://localhost:8080/solr/admin/logging but did not see any difference in logging. Thanks, ~Mukta -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Monday, January 12, 2009 3:22 AM To: solr-user@lucene.apache.org Subject: Re: Query regarding Spelling Suggestions Can you send the full log? On Jan 11, 2009, at 1:51 PM, Deshpande, Mukta wrote: I am using the example schema that comes with the Solr installation downloaded from http://www.mirrorgeek.com/apache.org/lucene/solr/. I have added the word field with the textSpell fieldtype in the schema.xml file, as specified in the mail below. My spelling index exists under SOLR_HOME/data/. If I open my index in Luke I can see the entries against the word field. Thanks, ~Mukta From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Fri 1/9/2009 8:29 AM To: solr-user@lucene.apache.org Subject: Re: Query regarding Spelling Suggestions Can you put the full log (as short as possibly demonstrates the problem) somewhere where I can take a look? Likewise, can you share your schema?
Also, does the spelling index exist under SOLR_HOME/data/index? If you open it w/ Luke, does it have entries? Thanks, Grant On Jan 8, 2009, at 11:30 PM, Deshpande, Mukta wrote: Yes. I send the build command as: http://localhost:8080/solr/select/?q=documnet&spellcheck=true&spellcheck.build=true&spellcheck.count=2&spellcheck.q=parfect&spellcheck.dictionary=dict The Tomcat log shows: Jan 9, 2009 9:55:19 AM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select/ params={spellcheck=true&q=documnet&spellcheck.q=parfect&spellcheck.dictionary=dict&spellcheck.count=2&spellcheck.build=true} hits=0 status=0 QTime=141 Even after sending the build command I do not get any suggestions. Can you please check? Thanks, ~Mukta -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Thursday, January 08, 2009 7:42 PM To: solr-user@lucene.apache.org Subject: Re: Query regarding Spelling Suggestions Did you send in the build command? See http://wiki.apache.org/solr/SpellCheckComponent On Jan 8, 2009, at 5:14 AM, Deshpande, Mukta wrote: Hi, I am using the Wordnet dictionary for spelling suggestions. The dictionary is converted to a Solr index with only one field, word, and stored in location solr-home/data/syn_index, using the syns2Index.java program available at http://www.tropo.com/techno/java/lucene/wordnet.html I have added the word field in my schema.xml as <field name="word" type="textSpell" indexed="true" stored="true"/> My application data indexes are in solr-home/data I am trying to use solr.IndexBasedSpellChecker to get spelling suggestions.
My spell check component is configured as:
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">dict</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="field">word</str>
      <str name="characterEncoding">UTF-8</str>
      <str name="spellcheckIndexDir">./syn_index</str>
    </lst>
  </searchComponent>
I have added this component to my standard request handler as:
  <requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>
With the above configuration, I do not get any spelling suggestions. Can somebody help ASAP? Thanks, ~Mukta -- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
Re: Deletion of indexes.
I got around this problem by using a trigger on the table I index that records the values of deleted items in a queue table so when my next Solr update rolls around it sends a remove request for that record's ID. Once the Solr deletion is done, I remove that ID from the queue table. Of course, you have to be on MySQL 5.0 or above to have that available to you. Otherwise, you'll have to manually add something to your deletion queries to record all the IDs you're about to delete to a queue table. Ryan T. Grange, IT Manager DollarDays International, Inc. Tushar_Gandhi wrote: Hi, I am using solr 1.3. I am facing a problem to delete the index. I have mysql database. Some of the data from database is deleted, but the indexing for those records is still present. Due to that I am getting those records in search result. I don't want this type of behavior. I want to delete those indexes which are not present in database. Also, I don't know which records are deleted from database and present in index. Is there any way to solve this problem? Also I think that re indexing will not solve my problem, because it will re index only the records which are present in database and don't bother about the indexes which don't have reference in database. Can anyone have solution for this? Thanks, Tushar
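A rough sketch of the trigger approach described above, assuming a source table named items with an integer id primary key and a queue table named deleted_items (both names are hypothetical, not from the thread):

```sql
-- Queue table that remembers which rows were deleted.
CREATE TABLE deleted_items (id INT PRIMARY KEY);

-- After every delete on the indexed table, record the row's id so the
-- next Solr update pass can send a corresponding delete request, then
-- clear the queue row once Solr confirms the deletion.
DELIMITER //
CREATE TRIGGER record_deleted_item
AFTER DELETE ON items
FOR EACH ROW
BEGIN
  INSERT IGNORE INTO deleted_items (id) VALUES (OLD.id);
END//
DELIMITER ;
```

As noted, triggers require MySQL 5.0+; on older versions the application's own delete statements would have to populate deleted_items themselves.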
Re: Restricting results based on user authentication
Hi Manu, I haven't made a custom request handler in a while, but I want to clarify that, if you trust your application code, you don't actually need a custom request handler to do this sort of authentication filtering. At indexing time, you can add a role field to each object that you index, as described in the thread. At query time, you could simply have your application code add an appropriate filter query to each Solr request. So, if you're using the standard XML query interface, instead of sending URLs like http://.../solr/select?q=foo... you can have your application code send URLs like http://.../solr/select?q=foo&fq=role:admin... If I understand the custom request handler approach, then it basically amounts to the same thing as the above; the only difference is that the filter query gets added internally by Solr, rather than at the application level. Sorry if you already understand all this; I'm throwing these comments out just in case. Cheers, Chris On Mon, Jan 12, 2009 at 1:54 AM, Manupriya manupriya.si...@gmail.com wrote: Hi, I am using the DIH feature of Solr for indexing a database. I am using a Solr server that is independent of my web application. I send an HTTP request for searching and then process the returned result. Now we have a requirement that we have to filter the results further based on security-level restrictions. For example, user id abc should not be allowed to see a particular result. How could we achieve that? I followed http://www.nabble.com/Restricted-views-of-an-index-td15088750.html#a15090791 It suggests something like: Add a role or access class to each indexed item, then use that in the queries, probably in a filter specified in a request handler. That keeps the definition of the filter within Solr. For example, you can create a request handler named admin, a field named role, and add a filter of role:admin. I could not follow this solution.
Is there any example or resource that explains how to use a custom request handler with filtering? Thanks, Manu
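To make the application-side approach concrete, here is a minimal sketch of building such a filtered request URL. The field name role and the base URL mirror the thread's example; the class and method names are made up for illustration, and in practice the role would come from your own authentication layer:

```java
import java.net.URLEncoder;

public class RoleFilteredSearch {
    // Appends a role-based filter query (fq) to the user's search,
    // so Solr only returns documents tagged with the caller's role.
    static String buildUrl(String solrBase, String userQuery, String role)
            throws Exception {
        return solrBase + "/select?q=" + URLEncoder.encode(userQuery, "UTF-8")
             + "&fq=" + URLEncoder.encode("role:" + role, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildUrl("http://localhost:8983/solr", "foo", "admin"));
        // prints http://localhost:8983/solr/select?q=foo&fq=role%3Aadmin
    }
}
```

The same fq parameter can be set through SolrQuery.addFilterQuery when using SolrJ instead of raw URLs.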
Custom Transformer to handle Timestamp
Hi all, I am using Solr to index data from my database. In my database there is a timestamp field whose data is in the form of 15-09-08 06:28:38.44200 AM. The column is of type TIMESTAMP in the Oracle db. So in schema.xml I have mentioned: <field name="LOGIN_TIMESTAMP" type="date" indexed="true" stored="true"/> While indexing data in debug mode I get this timestamp value as <arr><str>oracle.sql.TIMESTAMP:oracle.sql.timest...@f536e8</str></arr> And when I search, this value is not displayed while all other fields indexed along with it are displayed. 1) So do I need to write a custom transformer to add these values to the index? 2) And if yes, I am confused how to do it. Is there sample code somewhere? I have tried the sample TrimTransformer and it is working. But can I convert this string to a valid date format? (I am not a Java expert. :-() Expecting your reply. Thanks in advance Con
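There's no sample in the thread, but a DIH custom transformer is essentially a class whose transformRow method rewrites column values before indexing. The date conversion such a transformer would need might look like the sketch below. It is only an illustration: it assumes the timestamp reaches you as the string format shown above, drops the fractional seconds, and assumes UTC while converting into the ISO-8601 form Solr's date field expects.

```java
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class TimestampToSolrDate {
    // Converts an Oracle-style timestamp string such as
    // "15-09-08 06:28:38.44200 AM" into Solr's date format,
    // "2008-09-15T06:28:38Z". Fractional seconds are discarded
    // and UTC is assumed (an illustrative simplification).
    static String toSolrDate(String oracleTs) throws Exception {
        String noFraction = oracleTs.replaceFirst("\\.\\d+", "");
        SimpleDateFormat in =
                new SimpleDateFormat("dd-MM-yy hh:mm:ss a", Locale.US);
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        SimpleDateFormat out =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.US);
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return out.format(in.parse(noFraction));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toSolrDate("15-09-08 06:28:38.44200 AM"));
        // prints 2008-09-15T06:28:38Z
    }
}
```

Alternatively, DIH ships a DateFormatTransformer that can parse a dateTimeFormat pattern directly in data-config.xml, which may avoid writing Java at all if the value arrives as a plain string.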
Single index - multiple SOLR instances
Hello, Is it possible to have the index created by a single Solr instance, but have several Solr instances field the search queries? Or do I HAVE to replicate the index for each Solr instance that I want to answer queries? I need to set up a fail-over instance. Thanks - ashok
Re: Single index - multiple SOLR instances
Ashok, You can put your index on any kind of shared storage - SAN, NAS, NFS (this one is not recommended). That will let you point all your Solr instances to a single copy of your index. Of course, you will want to test performance to ensure the network is not slowing things down too much, if there is network in the picture. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: ashokc ash...@qualcomm.com To: solr-user@lucene.apache.org Sent: Monday, January 12, 2009 3:05:40 PM Subject: Single index - multiple SOLR instances Hello, Is it possible to have the index created by a single SOLR instance, but have several SOLR instances field the search queries. Or do I HAVE to replicate the index for each SOLR instance that I want to answer queries? I need to set up a fail-over instance. Thanks - ashok
Re: Getting only fields that match
Norbert, Other than through the explain query method, I don't think we have any mechanism to figure out which field(s) exactly a query matched. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Norbert Hartl norb...@hartl.name To: solr-user@lucene.apache.org Sent: Sunday, January 11, 2009 6:41:12 PM Subject: Re: Getting only fields that match Hi, On Sun, 2009-01-11 at 17:07 +0530, Shalin Shekhar Mangar wrote: On Sun, Jan 11, 2009 at 4:02 PM, Norbert Hartl wrote: I would like the search result to include only the fields that matched the search. Is this possible? I only saw the field spec where you can have a certain set of fields or all. Are you looking for highlighting (snippets)? http://wiki.apache.org/solr/HighlightingParameters A field can be indexed (searchable) or stored (retrievable) or both. When you make a query to Solr, you yourself specify which fields it needs to search on. If they are stored, you can ask to retrieve those fields only. Not sure if that answers your question. No, it doesn't. I want to have the following:
  Doc1: field one = super test text; field two = something; field three = another thing
  Doc2: field one = even other stuff; field zzz = this is a test
Searching for test I want to retrieve:
  Doc1: field one
  Doc2: field zzz
So I want to retrieve only the fields that match the search (test in this case). I hope this makes it clear. Norbert
Re: Single index - multiple SOLR instances
Thanks, Otis. That is great, as I plan to place the index on NAS and make it writable to a single solr instance (write load is not heavy) and readable by many solr instances to handle fail-over and also share the query load (query load can be high) - ashok Otis Gospodnetic wrote: Ashok, You can put your index on any kind of shared storage - SAN, NAS, NFS (this one is not recommended). That will let you point all your Solr instances to a single copy of your index. Of course, you will want to test performance to ensure the network is not slowing things down too much, if there is network in the picture. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: ashokc To: solr-user@lucene.apache.org Sent: Monday, January 12, 2009 3:05:40 PM Subject: Single index - multiple SOLR instances Hello, Is it possible to have the index created by a single SOLR instance, but have several SOLR instances field the search queries. Or do I HAVE to replicate the index for each SOLR instance that I want to answer queries? I need to set up a fail-over instance. Thanks - ashok
Re: Improving Readability of Hit Highlighting
I'm not sure if I have a good suggestion, but I have a question. :) What is considered junk? Would it be possible to eliminate the junk before it even goes into the index in order to avoid GIGO (Garbage In Garbage Out)? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Terence Gannon butzi0...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, January 12, 2009 11:00:31 AM Subject: Improving Readability of Hit Highlighting I'm indexing text from an OCR of an old document. Many words get read perfectly, but they're typically embedded in a lot of junk. I would like the hit highlighting to show only the 'good' words, in the order in which they appeared in the original document. Is it possible to use the output of the filter classes as the text used in hit highlighting? Or do you have to do all the text cleanup outside of Solr and present it with two fields to index, one with the original text and one with the cleaned-up text? The objective of the hit highlighting is to give the user a *sense* of the original context, even if it's not provided verbatim from the original document. Thanks in advance. TerryG
Re: Improving Readability of Hit Highlighting
To answer your questions specifically, here is an example of the raw OCR output: CONTRACTORINMPRIMENTAYIVE : mom Ale ACCEPT INFORMATIONON TOUR SHEET TO ea to which I would like to see mom ale access tour sheet to in the hit highlight. My schema for this field is pretty much standard, as follows:
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" .../>
  <filter class="solr.WordDelimiterFilterFactory" .../>
  <filter class="solr.LowerCaseFilterFactory" .../>
  <filter class="solr.EnglishPorterFilterFactory" .../>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory" .../>
When I examine the effect of each of these with the Analyzer, it seems like if I could use the output after LowerCaseFilterFactory in the hit highlight, I would come close to achieving what I want. I'm not averse to doing the text cleanup external to Solr before indexing, but only if it's *not* redundant with what the filter factories are going to do anyway. Thanks for your help! TerryG
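If the cleanup does end up happening outside Solr, one option is a crude token filter applied before indexing. The heuristic below (purely alphabetic, bounded length, contains a vowel) is only an illustrative guess at what 'good' might mean for this OCR data, not a tested recipe; the class and method names are made up:

```java
import java.util.ArrayList;
import java.util.List;

public class OcrCleanup {
    // Keeps only tokens that look like plausible English words:
    // purely alphabetic, 2-20 characters, and containing at least
    // one vowel. Everything else is treated as OCR junk and dropped.
    static List<String> goodTokens(String ocrText) {
        List<String> kept = new ArrayList<String>();
        for (String tok : ocrText.toLowerCase().split("\\s+")) {
            if (tok.matches("[a-z]{2,20}") && tok.matches(".*[aeiouy].*")) {
                kept.add(tok);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(goodTokens("CONTRACTORINMPRIMENTAYIVE : mom Ale TOUR SHEET"));
    }
}
```

A dictionary lookup (or a frequency list from a clean corpus) would be a stronger filter than these regexes, at the cost of also dropping rare but legitimate words.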
Re: Single index - multiple SOLR instances
OK. Of course, you'll have to make sure everything on the SAN is redundant (down to controllers, power supplies, etc.) and that the disks can handle that high query load/IO. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: ashokc ash...@qualcomm.com To: solr-user@lucene.apache.org Sent: Monday, January 12, 2009 3:37:41 PM Subject: Re: Single index - multiple SOLR instances Thanks, Otis. That is great, as I plan to place the index on NAS and make it writable to a single solr instance (write load is not heavy) and readable by many solr instances to handle fail-over and also share the query load (query load can be high) - ashok Otis Gospodnetic wrote: Ashok, You can put your index on any kind of shared storage - SAN, NAS, NFS (this one is not recommended). That will let you point all your Solr instances to a single copy of your index. Of course, you will want to test performance to ensure the network is not slowing things down too much, if there is network in the picture. Otis
Highlighting Trouble With Bigram Shingle Index
I'm running into some highlighting issues that appear to arise only when I'm using a bigram shingle (ShingleFilterFactory) analyzer. I started with a bigram-free situation along these lines:
  <field name="body" type="noshingleText" indexed="false" stored="false"/>
  <!-- Stored text for use with highlighting: -->
  <field name="kwic" type="noshingleText" indexed="false" stored="true" compressed="true" multiValued="false"/>
  <copyField source="body" dest="kwic" maxLength="10"/>
  <fieldType name="noshingleText" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
For performance reasons, though, I wanted to turn on bigram shingle indexing on the body field. (For more information see http://www.nabble.com/Using-Shingles-to-Increase-Phrase-Search-Performance-td19015758.html#a19015758) In particular, I wanted to use this field type:
  <fieldType name="shingleText" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ShingleFilterFactory" outputUnigrams="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ShingleFilterFactory" outputUnigrams="false" outputUnigramIfNoNgram="true"/>
    </analyzer>
  </fieldType>
(Regarding the outputUnigramIfNoNgram parameter, see http://issues.apache.org/jira/browse/SOLR-744.) I wasn't sure if I should define my kwic field (the one I use for highlighting) as type shingleText, to match the body field, or type noshingleText. So I tried both. Neither works quite as desired.
[kwic as type shingleText] If I have both body and kwic as type shingleText, then highlighting more or less works, but there are some anomalies. The main thing is that it really likes to pick fragments where the highlighted term (e.g. car) is the last term in the fragment:
  ... la la la la la <em>car</em> ...
  ... foo foo foo foo foo <em>car</em> ...
This should obviously happen some of the time, but this is happening with like 95% of my fragments, which is statistically unexpected. And unfortunate. And it doesn't happen if I turn off shingling. Another issue is that, if there are two instances of a highlighted term within a given fragment, it will often highlight not just those instances, but all the terms in between, like this:
  ... boo boo bar <em>car la la la car</em> bar bar bar ...
This too doesn't seem to happen if I disable bigram indexing. I haven't figured out why this is the case. One potential issue is that the TokenGroup abstraction doesn't necessarily make sense if you have a token stream of alternating unigrams and bigrams like this: the, the cat, cat, cat went, went, went for, for, ... Even if you could have a TokenGroup abstraction that makes sense, the current implementation of TokenGroup.isDistinct looks like this: return token.startOffset() >= endOffset; and it returns false most of the time in this case. (I can give some explanation of why, but maybe I'll save that for later.) I'm not sure if the highlighter can easily be made to accommodate sequences of alternating unigrams and bigrams, or if highlighting should really only be attempted on bigram-free token streams. [kwic with type noshingleText] If I set kwic to be of type noshingleText, then the above symptoms go away. Some things are not quite right, though. The particular symptom now is that if I do a quoted query like big dog then the correct results get returned, but no preview fragments are returned.
The underlying reason this happens is that an inappropriate Query object is being passed to the constructor for QueryScorer. The query that gets passed is a TermQuery on the single shingled term "big dog". That is the Query that should be used for *searching* on my bigram body field, but it's *not* the Query that should be used for *highlighting*; the Query that should be used for highlighting is something like a PhraseQuery, "big dog"~0. What apparently is going on is that the highlighter is using the Query object generated by the *search* component to do highlighting. One possibility is that the highlighter should instead create a separate Query object for each hl.fl parameter; each one would use the analyzer particular to the given *highlighting* field, rather than the one for the default search field. There might be reasons why that would be crazy, though. Sorry this post is a little half-baked, but I'd really
Summing the results in a collapse
I have been using the Collapse extension and have it working pretty well. However, I would like to find out if there is a way to show the collapsed results and then sum up a field across the collapsed-away results. For example, I'd like to display: Result 1 (20 results, totalling $50.00), where 20 is the number of items returned from the collapse and $50.00 is the sum of the fee field over the 20 collapsed results. Any help would be greatly appreciated. Thank you, -John
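I don't know of a collapse option that returns a per-group sum directly, but one workaround is to issue a follow-up query per collapsed group and sum the stored field client-side. A minimal sketch of the summing step, assuming a parsed Solr JSON response and a stored numeric field named fee (both names hypothetical):

```python
def sum_fees(solr_json_response):
    """Sum the 'fee' field over the docs in a parsed Solr JSON response.

    Assumes the standard Solr JSON layout: {"response": {"docs": [...]}}.
    Docs missing the field contribute 0.0.
    """
    docs = solr_json_response["response"]["docs"]
    return sum(doc.get("fee", 0.0) for doc in docs)

# Example: a group that collapsed 2 of its 20 docs into this page.
sample = {"response": {"numFound": 20,
                       "docs": [{"fee": 2.5}, {"fee": 47.5}]}}
print(sum_fees(sample))  # -> 50.0
```

The caveat is that you must fetch all docs in the group (rows high enough), which can be expensive for large groups.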
Multiple result fields in a collapse or subquery
Is there any way to have multiple collapse.field directives in the search string? What I am trying to accomplish is the following:

Result 1 (20 results)
  EU (5 results)
  USD (15 results)
Result 2 (10 results)
  EU (5 results)
  USD (5 results)

I thought that this could be done with faceting, but with faceting you get the sum total for each keyword. So for the above I get:

EU (10 results)
USD (20 results)

which works well for guiding a search into deeper, more meaningful results. However, I would like to have additional data that is tailored to each result row. Any help would be greatly appreciated. Thank you, -John
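One way to approximate per-row facet counts without multiple collapse fields is a follow-up facet request per top result, restricted to that result's group with an fq. A sketch of building such a request URL (the field names group_field and currency are made up for illustration; this is not a built-in multi-collapse feature):

```python
from urllib.parse import urlencode

def per_group_facet_url(base_url, group_field, group_value, facet_field):
    """Build a Solr URL that facets on facet_field within one collapsed
    group, returning counts only (rows=0)."""
    params = {
        "q": "*:*",
        "fq": '%s:"%s"' % (group_field, group_value),  # restrict to the group
        "facet": "true",
        "facet.field": facet_field,
        "rows": "0",       # we only want the facet counts, not docs
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params)

url = per_group_facet_url("http://localhost:8983/solr",
                          "group_field", "Result 1", "currency")
print(url)
```

This costs one extra request per displayed row, so it is only reasonable for a small number of rows per page.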
Re: Improving Readability of Hit Highlighting
Hi, Quick note: please include a copy of the previous email when replying, so people can be reminded of the context. You mentioned junk getting highlighted. In your case, is CONTRACTORINMPRIMENTAYIVE getting highlighted? And that is junk? If so, why not augment your indexing to throw out junk tokens, if you have some rules for what constitutes a junk token? (e.g. token not in dictionary) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: Terence Gannon butzi0...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, January 12, 2009 4:07:57 PM Subject: Re: Improving Readability of Hit Highlighting

To answer your questions specifically, here is an example of the raw OCR output:

CONTRACTORINMPRIMENTAYIVE : mom Ale ACCEPT INFORMATIONON TOUR SHEET TO ea

to which I would like to see:

mom ale access tour sheet to

in the hit highlight. My schema for this field is pretty much standard. When I examine the effect of each of these with the Analyzer, it seems like if I could use the output after LowerCaseFilterFactory in the hit highlight, I would come close to achieving what I want. I'm not averse to doing the text cleanup external to Solr before the indexing, but only if it's *not* redundant with what the filter factories are going to do anyway. Thanks for your help! TerryG
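The "token not in dictionary" rule Otis suggests could be prototyped outside Solr before committing to a custom token filter. A toy sketch (the vocabulary and sample tokens are illustrative, taken from the OCR example above):

```python
def drop_junk(tokens, dictionary):
    """Keep only tokens whose lowercased form appears in a dictionary.
    A stand-in for a dictionary-based junk-token filter at index time."""
    return [t for t in tokens if t.lower() in dictionary]

# Tiny illustrative vocabulary; a real one would be a full word list.
vocab = {"mom", "ale", "accept", "information", "on", "tour", "sheet", "to"}

raw = ["CONTRACTORINMPRIMENTAYIVE", "mom", "Ale", "ACCEPT",
       "INFORMATIONON", "TOUR", "SHEET", "TO", "ea"]
print(drop_junk(raw, vocab))
# -> ['mom', 'Ale', 'ACCEPT', 'TOUR', 'SHEET', 'TO']
```

Note that OCR run-together words like INFORMATIONON are dropped rather than repaired; splitting them back apart would need a separate step.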
Re: Restricting results based on user authentication
Thanks Chris, I agree with your approach. I also don't want to add anything at the application level. I want authentication to be handled internally at the Solr level itself. Can you please explain a little more about how to add a role field to each object at indexing time? Is there any resource/example available explaining this? Thanks, Manu ryguasu wrote: Hi Manu, I haven't made a custom request handler in a while, but I want to clarify that, if you trust your application code, you don't actually need a custom request handler to do this sort of authentication filtering. At indexing time, you can add a role field to each object that you index, as described in the thread. At query time, you could simply have your application code add an appropriate filter query to each Solr request. So, if you're using the standard XML query interface, instead of sending URLs like http://.../solr/select?q=foo... you can have your application code send URLs like http://.../solr/select?q=foo&fq=role:admin... If I understand the custom request handler approach, then it basically amounts to the same thing as the above; the only difference is that the filter query gets added internally by Solr, rather than at the application level. Sorry if you already understand all this; I'm throwing these comments out just in case. Cheers, Chris On Mon, Jan 12, 2009 at 1:54 AM, Manupriya manupriya.si...@gmail.com wrote: Hi, I am using the DIH feature of Solr for indexing a database. I am using a Solr server and it is independent of my web application. I send an HTTP request for searching and then process the returned result. Now we have a requirement that we have to filter the results further based on security level restrictions. For example, user id abc should not be allowed to see a particular result. How could we achieve that?
I followed http://www.nabble.com/Restricted-views-of-an-index-td15088750.html#a15090791 It suggests something like - Add a role or access class to each indexed item, then use that in the queries, probably in a filter specified in a request handler. That keeps the definition of the filter within Solr. For example, you can create a request handler named admin, a field named role, and add a filter of role:admin. I could not follow this solution. Is there any example or resource that explains how to use a custom request handler with filtering? Thanks, Manu
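For the request-handler flavor of this, keeping the filter inside Solr roughly means a solrconfig.xml entry along these lines (a sketch, not tested against any particular Solr release; the handler and field names are just illustrative, matching the role:admin example from the thread):

```xml
<!-- Hypothetical solrconfig.xml snippet: a search handler whose filter
     query is pinned inside Solr rather than supplied by the client. -->
<requestHandler name="admin" class="solr.SearchHandler">
  <!-- "invariants" cannot be overridden by request parameters, so a
       client cannot strip the role filter off its own request. -->
  <lst name="invariants">
    <str name="fq">role:admin</str>
  </lst>
</requestHandler>
```

Clients would then query with qt=admin (or the handler's path), and every search through that handler gets the role:admin filter applied automatically.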
Re: Restricting results based on user authentication
On Mon, Jan 12, 2009 at 9:31 PM, Manupriya manupriya.si...@gmail.com wrote: Thanks Chris, I agree with your approach. I also don't want to add anything at the application level. I want authentication to be handled internally at the Solr level itself. The application layer needs to be involved somehow, right, because I assume the application level is the code that knows what the current user id is. I'm not clear exactly what you want to keep out of the application level. In any case, if you don't like the idea of the application layer adding a filter query, I think I'll defer to people with more expertise on what your options are. Can you please explain a little more about how to add a role field to each object at indexing time? Is there any resource/example available explaining this? You mentioned you're using the DataImportHandler. If your data source is a single SQL table, the easiest approach might be to add a role column to that table and populate it appropriately for each object. (How to do this of course depends on your application.) If your data import code joins multiple tables, you'd need to think about which table would be most appropriate for storing the role data. Or perhaps your select statement could fill in a role based on testing the values of other fields; in SQL Server, for instance, you can write something along these lines:

SELECT OrderID, Date, Company,
  CASE WHEN Company = 'CIA' THEN 'admin' ELSE 'user' END AS Role

(The idea here is to require admin access to view orders from the CIA.) Thanks, Manu ryguasu wrote: Hi Manu, I haven't made a custom request handler in a while, but I want to clarify that, if you trust your application code, you don't actually need a custom request handler to do this sort of authentication filtering. At indexing time, you can add a role field to each object that you index, as described in the thread.
At query time, you could simply have your application code add an appropriate filter query to each Solr request. So, if you're using the standard XML query interface, instead of sending URLs like http://.../solr/select?q=foo... you can have your application code send URLs like http://.../solr/select?q=foo&fq=role:admin... If I understand the custom request handler approach, then it basically amounts to the same thing as the above; the only difference is that the filter query gets added internally by Solr, rather than at the application level. Sorry if you already understand all this; I'm throwing these comments out just in case. Cheers, Chris On Mon, Jan 12, 2009 at 1:54 AM, Manupriya manupriya.si...@gmail.com wrote: Hi, I am using the DIH feature of Solr for indexing a database. I am using a Solr server and it is independent of my web application. I send an HTTP request for searching and then process the returned result. Now we have a requirement that we have to filter the results further based on security level restrictions. For example, user id abc should not be allowed to see a particular result. How could we achieve that? I followed http://www.nabble.com/Restricted-views-of-an-index-td15088750.html#a15090791 It suggests something like - Add a role or access class to each indexed item, then use that in the queries, probably in a filter specified in a request handler. That keeps the definition of the filter within Solr. For example, you can create a request handler named admin, a field named role, and add a filter of role:admin. I could not follow this solution. Is there any example or resource that explains how to use a custom request handler with filtering? Thanks, Manu
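The CASE-based role assignment suggested earlier in the thread can be tried end to end with an embedded SQLite database (table and column names are just for illustration; the syntax below is standard SQL CASE, which SQLite, SQL Server, and MySQL all accept):

```python
import sqlite3

# In-memory database standing in for the real source table that DIH reads.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, Company TEXT)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [(1, "CIA"), (2, "Acme")])

# Derive a Role column in the select, as the data-config query would.
rows = conn.execute(
    "SELECT OrderID, Company, "
    "CASE WHEN Company = 'CIA' THEN 'admin' ELSE 'user' END AS Role "
    "FROM Orders ORDER BY OrderID").fetchall()
print(rows)  # -> [(1, 'CIA', 'admin'), (2, 'Acme', 'user')]
```

A DIH query shaped like this would index Role alongside each document, so the fq=role:admin filter (application-side or in a request handler) has something to match against.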