Re: Documents cannot be searched immediately when indexed using REST API with Solr Cloud
Hi Edvin

Please review your commit/soft-commit configuration. As a wise man put it: soft commits are about visibility, hard commits are about durability. :) If you are doing NRT indexing and searching, you probably need a short soft-commit interval, or an explicit commit in your request handler. Be advised that these strategies and configurations need to be tested and adjusted according to your data size and your search and index-update frequency. You should be able to find the answer yourself here: http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

All the best
Liu Bo

On 19 March 2015 at 17:54, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:

Hi,

I'm using Solr Cloud now, with 2 shards known as shard1 and shard2. When I try to index rich-text documents using the REST API or the default Documents module in the Solr Admin UI, the indexed documents do not appear immediately when I do a search. They only appear after I restart the Solr services (both shard1 and shard2). However, the same issue does not happen when I index the same documents using post.jar: those documents are searchable immediately.

Here's my ExtractingRequestHandler in solrconfig.xml:

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

What could be the reason why this is happening, and what are possible solutions?

Regards,
Edwin
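The commit knobs discussed above (explicit commit, softCommit, commitWithin) can be sketched as a small URL builder. This is a minimal Python illustration of the update-request parameters, not part of the thread; the host, port, and collection name are assumptions.

```python
from urllib.parse import urlencode

def update_url(base, collection, commit=False, soft_commit=False, commit_within_ms=None):
    """Build a Solr /update URL that controls when new documents become visible.

    commit=True       -> explicit hard commit (durability, and visibility)
    soft_commit=True  -> cheap soft commit (visibility only)
    commit_within_ms  -> ask Solr to commit within N ms instead of per request
    """
    params = {}
    if commit:
        params["commit"] = "true"
    if soft_commit:
        params["softCommit"] = "true"
    if commit_within_ms is not None:
        params["commitWithin"] = str(commit_within_ms)
    qs = urlencode(params)
    return f"{base}/{collection}/update" + (f"?{qs}" if qs else "")
```

For NRT search you would typically prefer `commitWithin` or an `autoSoftCommit` interval in solrconfig.xml over hard-committing on every request.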
Re: Where to specify numShards when starting up a cloud setup
Hi zzT

Putting numShards in core.properties also works. I struggled a little bit while figuring out this configuration approach. I knew I was not alone! ;-)

On 2 April 2014 18:06, zzT zis@gmail.com wrote:

It seems that I've figured out a configuration approach to this issue. I'm having the exact same issue, and the only viable solutions found on the net until now are:

1) Pass -DnumShards=x when starting up the Solr server
2) Use the Collections API, as indicated by Shawn.

What I've noticed though - after making the call to /collections to create a collection - is that a new core entry is added inside solr.xml with the attribute numShards. So right now I'm configuring solr.xml with the numShards attribute inside my core nodes. This way I don't have to worry about the annoying stuff you've already mentioned, e.g. waiting for Solr to start up, etc. Of course the same logic applies here: the numShards param is meaningful only the first time. Even if you change it at a later point, the number of shards stays the same.

--
View this message in context: http://lucene.472066.n3.nabble.com/Where-to-specify-numShards-when-startup-up-a-cloud-setup-tp4078473p4128566.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
All the best
Liu Bo
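The Collections API route mentioned above can be sketched the same way; numShards is fixed at creation time, which matches the "only meaningful the first time" note. A minimal Python illustration (host and collection name are assumptions):

```python
from urllib.parse import urlencode

def create_collection_url(base, name, num_shards, replication_factor=1):
    """Build a Collections API CREATE call; numShards is set once, at
    collection-creation time, and cannot be changed by re-sending it later."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base}/admin/collections?{urlencode(params)}"
```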
Re: Multiple Languages in Same Core
Hi Jeremy

There are a lot of multi-language discussions; the two main approaches are:

1. like yours: one language per core
2. all in one core: each language has its own field.

We have multi-language support in a single core; each multilingual field has its own suffix, such as name_en_US. We customized the query handler to hide the query details from the client. The main reason we do this is NRT indexing and search. Take products for example: a product has price and quantity, which are common fields used for filtering and sorting, while name and description are multilingual fields. If we split products into different cores, a common-field update may end up as an update in all of the multilingual cores.

As to scalability, we don't change Solr cores/collections when a new language is added, but we probably need to update our customized index process and run a full re-index. This approach suits our requirements for now, but you may have your own concerns.

We have a similar suggest-filter problem to yours: we want to return suggest results filtered by store. I can't find a way to build the dictionary with a query in my version of Solr (4.6). What I do is run a query on an N-Gram analyzed field with filter queries on the store_id field. The suggestion is actually a query. It may not perform as well as the suggester, but it can do the trick. You can try building an additional N-Gram field for suggestions only and searching on it with fq on your Locale field.

All the best
Liu Bo

On 25 March 2014 09:15, Alexandre Rafalovitch arafa...@gmail.com wrote:

Solr In Action has a significant discussion of the multi-lingual approach. They also have some code samples out there. Might be worth a look.

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Tue, Mar 25, 2014 at 4:43 AM, Jeremy Thomerson jer...@thomersonfamily.com wrote:

I recently deployed Solr to back the site search feature of a site I work on. The site itself is available in hundreds of languages. With the initial release of site search we have enabled the feature for ten of those languages. This is distributed across eight cores: the two Chinese languages plus Korean are combined into one CJK core, and each of the other seven languages is in its own individual core. The reason for splitting these into separate cores was so that we could have the same field names across all cores but different configuration for analyzers, etc., per core.

Now I have some questions on this approach.

1) Scalability: Considering I need to scale this to many dozens more languages, perhaps hundreds more, is there a better way, so that I don't end up needing dozens or hundreds of cores? My initial plan was that the many languages without special support within Solr would simply get lumped into a single default core that has some default analyzers applicable to the majority of languages.

1b) Related to this: is there a practical limit to the number of cores that can be run on one instance of Lucene?

2) Auto Suggest: In phase two I intend to add auto-suggestions as the user types a query. In reviewing how this is implemented and how the suggestion dictionary is built, I have concerns. If I have more than one language in a single core (and I keep the same field name for suggestions on all languages within a core), then it seems that I could get suggestions from another language returned with a suggest query. Is there a way to build a separate dictionary for each language, but keep these languages within the same core?

If it's helpful to know: I have a field in every core for Locale. Values will be the locale of the language of that document, i.e. en, es, zh_hans, etc.

I'd like to be able to: 1) when building a suggestion dictionary, divide it into multiple dictionaries, grouped by locale, and 2) supply a parameter to the suggest query that allows the suggest component to return suggestions only from the appropriate dictionary for that locale.

If the answer to #1 is "keep splitting groups of languages that have different analyzers into their own cores" and the answer to #2 is "that's not supported", then I'd be curious: where would I start to write my own extension that supports #2? I looked last night at the suggest lookup classes, dictionary classes, etc., but I didn't see a clear point where it would be clean to implement something like I'm suggesting above.

Best Regards,
Jeremy Thomerson

--
All the best
Liu Bo
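The field-suffix scheme described in the reply above (a customized query handler mapping logical fields to per-language fields) can be sketched in a few lines. A Python illustration only; the common-field names are taken from the product example in the thread, the rest is assumed:

```python
# Language-independent fields shared by all locales (per the product example:
# price and quantity are common, name and description are per-language).
COMMON_FIELDS = {"price", "quantity"}

def resolve_qf(fields, language):
    """Map logical field names to per-language suffixed fields,
    e.g. name -> name_en_US, leaving common fields untouched."""
    return [f if f in COMMON_FIELDS else f"{f}_{language}" for f in fields]
```

A customized search component would apply this to qf (and hl.fl, etc.) before handing the request to the default query pipeline.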
Re: Grouping results with group.limit returns wrong numFound?
Hi Ahmet

I've thought about using group.ngroups=true, but when you use group.main=true there's no ngroups field in the response. And according to http://wiki.apache.org/solr/FieldCollapsing, the result might not be correct in SolrCloud. I don't like using facets for this, but it seems I have to...

On 1 January 2014 00:35, Ahmet Arslan iori...@yahoo.com wrote:

Hi Tasmaniski,

I don't follow. How come Liu's faceting workaround and ngroups=true produce different results?

On Tuesday, December 31, 2013 6:08 PM, tasmaniski tasmani...@gmail.com wrote:

@kamaci Of course. That is the problem. group.limit is "the number of results (documents) to return for each group". numFound is the total number found, but *not* the sum of the number *returned for each group*.

@Liu Bo That seems to be the only workaround for the problem, but it's too expensive to go through all the groups and calculate the total number found/returned (I use PHP for the client :) ).

@iorixxx Yes, I considered that (group.ngroups=true), but in some groups I have fewer results found than the limit.

--
View this message in context: http://lucene.472066.n3.nabble.com/Grouping-results-with-group-limit-return-wrong-numFound-tp4108174p4108906.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
All the best
Liu Bo
Re: Chaining plugins
Hi

I've done similar things to Paul. What I do is extend the default QueryComponent and override the prepare method; I change the SolrParams according to our logic and then call super.prepare(). Then I replace the default QueryComponent with it in my search/query handler. This way, none of Solr's default behavior is touched.

I think you can do your logic in the prepare method and then let Solr proceed with the search. I've tested it along with other components on both a single Solr node and SolrCloud. It works fine.

Hope it helps

Cheers
Bold

On 31 December 2013 06:03, Chris Hostetter hossman_luc...@fucit.org wrote:

You don't need to write your own handler. See the previous comment about implementing a SearchComponent -- you can check for the params in your prepare() method and do whatever side effects you want, then register your custom component and hook it into the component chain of whatever handler configuration you want (either using the components arr or by specifying it as a first-components)...

https://cwiki.apache.org/confluence/display/solr/RequestHandlers+and+SearchComponents+in+SolrConfig

: I want to save the query into a file when a user is changing a parameter in
: the query, let's say he adds logTofile=1. Then the searchHandler will
: provide the same result as without this parameter, but in the background it
: will do some logic (e.g. save the query to a file).
: But I don't want to touch the Solr source code; all I want is to add code (like
: a plugin). If I understand it right, I want to write my own search handler, do
: some logic, then pass the data to the Solr default search handler.

-Hoss
http://www.lucidworks.com/

--
All the best
Liu Bo
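The prepare()-time pattern described above (do a side effect if a flag param is present, then strip it so the rest of the default pipeline sees an ordinary request) can be modeled outside Solr. A Python sketch of the logic only; in Solr this would live in a Java SearchComponent, and the `logTofile` param name comes from the quoted question:

```python
def prepare(params, log_path):
    """Mimic a custom SearchComponent.prepare(): if the client sent
    logTofile=1, append the query to a file, then remove the flag so the
    downstream (default) search behavior is unchanged."""
    params = dict(params)  # don't mutate the caller's request params
    if params.pop("logTofile", None) == "1":
        with open(log_path, "a") as f:
            f.write(params.get("q", "") + "\n")
    return params
```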
Re: Grouping results with group.limit returns wrong numFound?
Hi

I've met the same problem, and I've googled around but found no direct solution. There is a workaround, though: do a facet on your group field, with parameters like

<str name="facet">true</str>
<str name="facet.field">your_field</str>
<str name="facet.limit">-1</str>
<str name="facet.mincount">1</str>

and then count how many faceted pairs are in the response. This should be the same as the number of documents after grouping.

Cheers
Bold

On 31 December 2013 06:40, Furkan KAMACI furkankam...@gmail.com wrote:

Hi;

group.limit is "the number of results (documents) to return for each group. Defaults to 1." Did you check the page here: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=32604232

Thanks;
Furkan KAMACI

25 Aralık 2013 Çarşamba tarihinde tasmaniski tasmani...@gmail.com adlı kullanıcı şöyle yazdı:

Hi All,

When I perform a search, group the results, and limit the results in each group, I see that *numFound* is the same as if I hadn't used the limit. It looks like Solr first performs the search and calculates numFound, and then groups and limits the results. I don't know if this is a bug or a feature :) But I cannot use pagination and other stuff. Is there any workaround, or did I miss something?

Example: I want to search book titles and limit the search to 3 results per publisher:

q=book_title:solr php&group=true&group.field=publisher&group.limit=3&group.main=true

I have 20 results for the apress publisher but show only 3; that works OK. But in numFound I still have 20 for the apress publisher...

--
View this message in context: http://lucene.472066.n3.nabble.com/Grouping-results-with-group-limit-return-wrong-numFound-tp4108174.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
All the best
Liu Bo
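The counting step of the facet workaround above is easy to get wrong, because Solr returns facet_fields as a flat list alternating value and count. A Python sketch of that step (the publisher names are hypothetical):

```python
def ngroups_from_facet(facet_counts):
    """Solr's facet_fields response is a flat list [v1, n1, v2, n2, ...].
    With facet.mincount=1, the number of value/count pairs equals the
    number of distinct groups the grouped query would return."""
    return len(facet_counts) // 2
```

This recovers the group count that group.ngroups=true would give, without the group.main=true limitation mentioned above.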
Re: PostingsSolrHighlighter
Hi Josip

For the first question, we've done similar things: copying search fields into a text field. But highlighting is normally done on specific fields such as title. Depending on how the search content is displayed on the front end, you can search on text and highlight on the field you want by specifying hl.fl. Ref: http://wiki.apache.org/solr/HighlightingParameters#hl.fl

On 17 December 2013 02:29, Josip Delic j...@lugensa.com wrote:

Hi @all,

I am playing with the PostingsSolrHighlighter. I'm running Solr 4.6.0 and my configuration is from here: https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/highlight/PostingsSolrHighlighter.html

Search query and result (not working): http://pastebin.com/13Uan0ZF
Schema (not complete): http://pastebin.com/JGa38UDT
Search query and result (working): http://pastebin.com/4CP8XKnr

Solr config:

<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/>
</searchComponent>

So this is working just fine, but now I have some questions:

1.) With the old default highlighter component it was possible to search in searchable_text and to retrieve highlighted text. This is essential, because we use copyField to put almost everything into searchable_text (title, subtitle, description, ...)

2.) I can't get the ellipsis working. I tried hl.tag.ellipsis=..., f.text.hl.tag.ellipsis=..., and configuring it in the RequestHandler; nothing seems to work. maxAnalyzedChars is just cutting the sentence?

Kind Regards
Josip Delic

--
All the best
Liu Bo
Re: an array-like string is treated as multivalued when adding a doc to solr
Hi Alexandre

It's quite a rare case, just one out of tens of thousands. I'm planning to make every multilingual field multivalued and just take the first value while formatting the response into our business object. The FirstFieldValueUpdateProcessorFactory seems very helpful, thank you.

All the best
Liu Bo

On 18 December 2013 15:26, Alexandre Rafalovitch arafa...@gmail.com wrote:

If this happens rarely and you want to deal with it on the way into Solr, you could just keep one of the values, using a URP: http://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/update/processor/FirstFieldValueUpdateProcessorFactory.html

Regards,
Alex

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, Dec 18, 2013 at 2:20 PM, Liu Bo diabl...@gmail.com wrote:

Hey Furkan and solr users

This was a misreported problem. It's not a Solr problem but a data issue on our side. Sorry for this. A coupon happened to have two pieces of English description, which is not allowed in our business logic, but it happened, and we added name_en_US to the Solr document twice. I've done a set of tests and some deep debugging into the Solr source code, and found out that an array-like string such as "[Get 20% Off Official Barca Kits, coupon]" won't be treated as a multivalued field.

Sorry again for not digging more before sending out the question email. I trust our business logic and data integrity more than Solr; I will definitely not do this again. ;-)

All the best
Liu Bo

On 11 December 2013 07:21, Furkan KAMACI furkankam...@gmail.com wrote:

Hi Liu;

Yes, it is expected behavior. If you send data within square brackets, Solr will treat it as a multivalued field. You can test it this way: if you use SolrJ and use a List for a field, it will be considered multivalued too, because when you call the toString() method of your List you can see that the elements are printed within square brackets. This is the reason that a List can be used for a multivalued field. If you explain your situation I can offer a way to do it.

Thanks;
Furkan KAMACI

2013/12/6 Liu Bo diabl...@gmail.com

Dear solr users:

I've met this kind of error several times: when adding an array-like string such as "[Get 20% Off Official Barça Kits, coupon]" to a multiValued="false" field, Solr will complain:

org.apache.solr.common.SolrException: ERROR: [doc=7781396456243918692] multiple values encountered for non multiValued field name_en_US: [Get 20% Off Official Barca Kits, coupon]

My schema definition:

<field name="name_en_US" type="text_en" indexed="true" stored="true" multiValued="false"/>

This field is stored because the search result needs this field and its value in the original format, and indexed to give it a boost while searching. What I do is add the name (java.lang.String) to a SolrInputDocument via the addField("name_en_US", product.getName()) method, and then add it to Solr using an AddUpdateCommand. It seems Solr treats this kind of string data as multivalued, even though I add this field to Solr only once.

Is this a bug or intended behavior? Is there any way to tell Solr this is not a multivalued value and not to break it up? Your help and suggestions will be much appreciated.

--
All the best
Liu Bo

--
All the best
Liu Bo

--
All the best
Liu Bo
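The root cause discussed in this thread (calling addField twice makes the field multi-valued) and the FirstFieldValueUpdateProcessorFactory fix suggested above can both be modeled in a few lines. A Python sketch of the behavior; the real APIs are SolrJ's SolrInputDocument and Solr's update processor, and the field name comes from the thread:

```python
def add_field(doc, name, value):
    """Mimic SolrInputDocument.addField: adding the same field twice
    turns it into a multi-value list, which is what happened with the
    duplicated name_en_US description."""
    if name in doc:
        cur = doc[name]
        doc[name] = (cur if isinstance(cur, list) else [cur]) + [value]
    else:
        doc[name] = value
    return doc

def first_value(doc, name):
    """Mimic FirstFieldValueUpdateProcessorFactory: if a field ended up
    with multiple values, keep only the first one."""
    if isinstance(doc.get(name), list):
        doc[name] = doc[name][0]
    return doc
```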
Re: PostingsSolrHighlighter
Hi Josip

That's quite weird. In my experience, highlighting is strict on string fields, which need an exact match; text fields should be fine. I copied your schema definition and did a quick test in a new core; everything else is default from the tutorial, and the search component uses solr.HighlightComponent. Searching on searchable_text does highlight text. I copied your search URL and just changed the host part, so the input parameters are exactly the same; the result is attached. Can you upload your complete solrconfig.xml and schema.xml?

On 18 December 2013 19:02, Josip Delic j...@lugensa.com wrote:

Am 18.12.2013 09:55, schrieb Liu Bo:

hi Josip

hi liu,

For the first question, we've done similar things: copying search fields into a text field. But highlighting is normally done on specific fields such as title. Depending on how the search content is displayed on the front end, you can search on text and highlight on the field you want by specifying hl.fl. Ref: http://wiki.apache.org/solr/HighlightingParameters#hl.fl

That's exactly what I'm doing in that pastebin: http://pastebin.com/13Uan0ZF. I'm searching there for 'q=searchable_text:labore'; this term is present in 'text' and in the copyField 'searchable_text', but it is not highlighted in 'text' (hl.fl=text). The same query works if I set 'q=text:labore', as you can see in http://pastebin.com/4CP8XKnr.

For the second question, I figured out that the PostingsSolrHighlighter ellipsis is not, as I thought, for adding an ellipsis at the start and/or end of the highlighted text. It is instead used to combine multiple snippets together if snippets > 1.

cheers

josip

On 17 December 2013 02:29, Josip Delic j...@lugensa.com wrote:

Hi @all,

I am playing with the PostingsSolrHighlighter. I'm running Solr 4.6.0 and my configuration is from here: https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/highlight/PostingsSolrHighlighter.html

Search query and result (not working): http://pastebin.com/13Uan0ZF
Schema (not complete): http://pastebin.com/JGa38UDT
Search query and result (working): http://pastebin.com/4CP8XKnr

Solr config:

<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/>
</searchComponent>

So this is working just fine, but now I have some questions:

1.) With the old default highlighter component it was possible to search in searchable_text and to retrieve highlighted text. This is essential, because we use copyField to put almost everything into searchable_text (title, subtitle, description, ...)

2.) I can't get the ellipsis working. I tried hl.tag.ellipsis=..., f.text.hl.tag.ellipsis=..., and configuring it in the RequestHandler; nothing seems to work. maxAnalyzedChars is just cutting the sentence?

Kind Regards
Josip Delic

--
All the best
Liu Bo

Attached test result:

http://localhost:8080/solr/try/select?wt=json&fl=text%2Cscore&hl=true&hl.fl=text&q=%28searchable_text%3Alabore%29&rows=10&sort=score+desc&start=0

{
  "responseHeader": {
    "status": 0,
    "QTime": 36,
    "params": {
      "sort": "score desc",
      "fl": "text,score",
      "start": "0",
      "q": "(searchable_text:labore)",
      "hl.fl": "text",
      "wt": "json",
      "hl": "true",
      "rows": "10"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {
        "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet."
      },
      {
        "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet."
      },
      {
        "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata
Re: an array-like string is treated as multivalued when adding a doc to solr
Hey Furkan and solr users

This was a misreported problem. It's not a Solr problem but a data issue on our side. Sorry for this. A coupon happened to have two pieces of English description, which is not allowed in our business logic, but it happened, and we added name_en_US to the Solr document twice. I've done a set of tests and some deep debugging into the Solr source code, and found out that an array-like string such as "[Get 20% Off Official Barca Kits, coupon]" won't be treated as a multivalued field.

Sorry again for not digging more before sending out the question email. I trust our business logic and data integrity more than Solr; I will definitely not do this again. ;-)

All the best
Liu Bo

On 11 December 2013 07:21, Furkan KAMACI furkankam...@gmail.com wrote:

Hi Liu;

Yes, it is expected behavior. If you send data within square brackets, Solr will treat it as a multivalued field. You can test it this way: if you use SolrJ and use a List for a field, it will be considered multivalued too, because when you call the toString() method of your List you can see that the elements are printed within square brackets. This is the reason that a List can be used for a multivalued field. If you explain your situation I can offer a way to do it.

Thanks;
Furkan KAMACI

2013/12/6 Liu Bo diabl...@gmail.com

Dear solr users:

I've met this kind of error several times: when adding an array-like string such as "[Get 20% Off Official Barça Kits, coupon]" to a multiValued="false" field, Solr will complain:

org.apache.solr.common.SolrException: ERROR: [doc=7781396456243918692] multiple values encountered for non multiValued field name_en_US: [Get 20% Off Official Barca Kits, coupon]

My schema definition:

<field name="name_en_US" type="text_en" indexed="true" stored="true" multiValued="false"/>

This field is stored because the search result needs this field and its value in the original format, and indexed to give it a boost while searching. What I do is add the name (java.lang.String) to a SolrInputDocument via the addField("name_en_US", product.getName()) method, and then add it to Solr using an AddUpdateCommand. It seems Solr treats this kind of string data as multivalued, even though I add this field to Solr only once.

Is this a bug or intended behavior? Is there any way to tell Solr this is not a multivalued value and not to break it up? Your help and suggestions will be much appreciated.

--
All the best
Liu Bo

--
All the best
Liu Bo
an array-like string is treated as multivalued when adding a doc to solr
Dear solr users:

I've met this kind of error several times: when adding an array-like string such as "[Get 20% Off Official Barça Kits, coupon]" to a multiValued="false" field, Solr will complain:

org.apache.solr.common.SolrException: ERROR: [doc=7781396456243918692] multiple values encountered for non multiValued field name_en_US: [Get 20% Off Official Barca Kits, coupon]

My schema definition:

<field name="name_en_US" type="text_en" indexed="true" stored="true" multiValued="false"/>

This field is stored because the search result needs this field and its value in the original format, and indexed to give it a boost while searching. What I do is add the name (java.lang.String) to a SolrInputDocument via the addField("name_en_US", product.getName()) method, and then add it to Solr using an AddUpdateCommand. It seems Solr treats this kind of string data as multivalued, even though I add this field to Solr only once.

Is this a bug or intended behavior? Is there any way to tell Solr this is not a multivalued value and not to break it up? Your help and suggestions will be much appreciated.

--
All the best
Liu Bo
Re: deleting a doc inside a custom UpdateRequestProcessor
hi,

You can try this: in your checkIfIsDuplicate(), build a query based on your title and set it on a delete command:

// Build your query accordingly. This depends on how your title is indexed
// (e.g. analyzed or not) -- be careful with it and do some tests.
DeleteUpdateCommand cmd = new DeleteUpdateCommand(req);
cmd.commitWithin = commitWithin;
cmd.setQuery(query);
processDelete(cmd);

Processors are normally chained; you should make sure that your processor comes first, so that it can control what comes next based on your logic.

You can also try to write your own update request handler instead of a customized processor, and do a set of operations in the function:

@Override
public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {}

Get your processor chain in this function and pass a delete command to it, such as:

SolrParams params = req.getParams();
checkParameter(params);
UpdateRequestProcessorChain processorChain =
    req.getCore().getUpdateProcessingChain(params.get(UpdateParams.UPDATE_CHAIN));
UpdateRequestProcessor processor = processorChain.createProcessor(req, rsp);
DeleteUpdateCommand cmd = new DeleteUpdateCommand(req);
cmd.commitWithin = commitWithin;
cmd.setQuery(query);
processor.processDelete(cmd);

This is what I do when customizing an update request handler: I try not to touch the original processor chain, but tell Solr what to do via commands.

On 19 November 2013 10:01, Peyman Faratin pey...@robustlinks.com wrote:

Hi

I am building a custom UpdateRequestProcessor to intercept any doc heading to the index. Basically what I want to do is check whether the current index has a doc with the same title (I am using IDs as the uniques so I can't use that, and besides, the logic of the check is a little more complicated). If the incoming doc has a duplicate and some other conditions hold, then one of two things can happen:

1- we don't index the incoming document
2- we index the incoming doc and delete the duplicate currently in the index

I think (1) can be done by simply not passing the call up the chain (not calling super.processAdd(cmd)). However, I don't know how to implement the second case, deleting the duplicate document, inside a custom UpdateRequestProcessor. This thread is the closest to my goal: http://lucene.472066.n3.nabble.com/SOLR-4-3-0-Migration-How-to-use-DeleteUpdateCommand-td4062454.html however I am not clear how to proceed. Code snippets below. Thank you in advance for your help.

class isDuplicate extends UpdateRequestProcessor {

  public isDuplicate(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    try {
      boolean indexIncomingDoc = checkIfIsDuplicate(cmd);
      if (indexIncomingDoc)
        super.processAdd(cmd);
    } catch (SolrServerException e) {
      e.printStackTrace();
    } catch (ParseException e) {
      e.printStackTrace();
    }
  }

  public boolean checkIfIsDuplicate(AddUpdateCommand cmd) ... {
    SolrInputDocument incomingDoc = cmd.getSolrInputDocument();
    if (incomingDoc == null)
      return false;
    String title = (String) incomingDoc.getFieldValue("title");
    SolrIndexSearcher searcher = cmd.getReq().getSearcher();
    boolean addIncomingDoc = true;
    Integer idOfDuplicate = searcher.getFirstMatch(new Term("title", title));
    if (idOfDuplicate != -1) {
      addIncomingDoc = compareDocs(searcher, incomingDoc, idOfDuplicate, title, addIncomingDoc);
    }
    return addIncomingDoc;
  }

  private boolean compareDocs(...) {
    if ( condition 1 ) {
      -- DELETE DUPLICATE DOC in INDEX --
      addIncomingDoc = true;
    }
    return addIncomingDoc;
  }

--
All the best
Liu Bo
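The two outcomes Peyman describes can be separated cleanly from the Solr plumbing as a small decision function. A Python sketch of the logic only; `prefer_incoming` is a stand-in for the unspecified compareDocs() condition, and the title-to-id map stands in for the index lookup:

```python
def process_add(index, incoming, prefer_incoming=True):
    """Dedup decision: index maps title -> docid of what is already indexed.

    Returns (add_incoming, delete_id):
      no duplicate          -> (True, None)
      case 1 (keep old doc) -> (False, None)
      case 2 (replace old)  -> (True, duplicate_id)
    """
    dup_id = index.get(incoming["title"])
    if dup_id is None:
        return True, None          # no duplicate: just index it
    if prefer_incoming:
        return True, dup_id        # index incoming, delete the old duplicate
    return False, None             # drop incoming, keep what is indexed
```

In the real processor, a (True, dup_id) result would translate to super.processAdd(cmd) plus a DeleteUpdateCommand for dup_id, as in the reply above.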
Re: Multi-core support for indexing multiple servers
As far as I know about Magento, its DB schema is designed for extensible property storage, and the relationships between DB tables are rather complex. A product has its attribute sets and properties, which are stored in different tables. A configurable product may have different attribute values for each of its child simple products.

Handling relationships like this in DIH won't be easy, especially when you want to group the attributes of a configurable product into one document. But if you just need to search on name and description, and not other attributes, you can try writing a DIH config on the catalog_product_flat_x tables; Magento may have several of them.

We used to use Lucene core to provide search on Magento products. What we did was use the SOAP service provided by Magento to get products, and then convert them to Lucene documents. Indexes are updated daily. This hides lots of Magento implementation details, but it's kind of slow.

On 12 November 2013 22:41, Robert Veliz rob...@mavenbridge.com wrote:

I have two sources/servers--one of them is Magento. Since Magento has a more or less out-of-the-box integration with Solr, my thought was to run the Solr server from the Magento instance and then use DIH to get/merge content from the other source/server. Does that seem feasible/appropriate? I spec'd it out and it seems to make sense...

R

On Nov 11, 2013, at 11:25 PM, Liu Bo diabl...@gmail.com wrote:

Like Erick said, merging data from different data sources can be very difficult. SolrJ is much easier to use, but you may need another application to handle the index process if you don't want to extend Solr much. I eventually ended up with a customized request handler which uses SolrWriter from the DIH package to index data, so that I can fully control the index process. Quite like SolrJ, you can write code to convert your data into SolrInputDocuments and then post them to SolrWriter, which handles the rest.

On 8 November 2013 21:46, Erick Erickson erickerick...@gmail.com wrote:

Yep, you can define multiple data sources for use with DIH. Combining data from those multiple sources into a single index can be a bit tricky with DIH; personally I tend to prefer SolrJ, but that's mostly personal preference, especially if I want to get some parallelism going. But whatever works.

Erick

On Thu, Nov 7, 2013 at 11:17 PM, manju16832003 manju16832...@gmail.com wrote:

Eric,

Just a question :-), wouldn't it be easier to use DIH to pull data from multiple data sources? I do use DIH to do that comfortably. I have three data sources:
- MySQL
- URLDataSource that returns XML from a .NET application
- URLDataSource that connects to an API and returns XML

Here is the data source part of my data-config:

<dataSource type="JdbcDataSource" name="solr" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/employeeDB" batchSize="-1"
            user="root" password="root"/>
<dataSource name="CRMServer" type="URLDataSource" encoding="UTF-8"
            connectionTimeout="5000" readTimeout="1"/>
<dataSource name="ImageServer" type="URLDataSource" encoding="UTF-8"
            connectionTimeout="5000" readTimeout="1"/>

Of course, in the application I do the same. To construct my results, I connect to MySQL and those two data sources. Basically we have two points of indexing:
- Using DIH for one-time indexing
- At the application, whenever there is a transaction on the details that we are storing in Solr.

--
View this message in context: http://lucene.472066.n3.nabble.com/Multi-core-support-for-indexing-multiple-servers-tp4099729p4099933.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
All the best
Liu Bo

--
All the best
Liu Bo
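The merge step discussed in this thread (combining per-record data from a database and one or more XML services into a single document before indexing, whether via SolrJ or a custom handler) can be sketched independently of Solr. A Python illustration; the field names and sources are hypothetical:

```python
def merge_sources(db_rows, xml_rows, key="id"):
    """Merge records from two data sources (e.g. MySQL and an XML API)
    into one document per key, ready to post to Solr.
    On a field-name collision, the later source wins."""
    docs = {}
    for row in list(db_rows) + list(xml_rows):
        docs.setdefault(row[key], {}).update(row)
    return list(docs.values())
```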
Re: eDisMax, multiple language support and stopwords
Happy to see someone with a similar solution to ours. We have a similar multi-language search feature, and we index different language content into _fr, _en fields like you've done. But at search time we require a language code parameter specifying the language the client wants to search, which is normally decided by the site being visited, such as:

qf=name description&language=en

and in our search component we pick the right fields, name_en and description_en, to search on.

We used to support searching across all languages at once and removed that later: the site tells the customer which languages are supported, and we don't think many visitors to our web sites know more than two languages and need to search them at the same time.

On 7 November 2013 23:01, Tom Mortimer tom.m.f...@gmail.com wrote:

Ah, thanks Markus. I think I'll just add the Boolean operators to the stopwords list in that case.

Tom

On 7 November 2013 12:01, Markus Jelsma markus.jel...@openindex.io wrote:

This is an ancient problem. The issue here is your mm parameter: it gets confused because for separate fields different numbers of tokens are filtered/emitted, so it is never going to work just like this. The easiest option is not to use the stop filter.

http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html
https://issues.apache.org/jira/browse/SOLR-3085

-----Original message-----
From: Tom Mortimer tom.m.f...@gmail.com
Sent: Thursday 7th November 2013 12:50
To: solr-user@lucene.apache.org
Subject: eDisMax, multiple language support and stopwords

Hi all, thanks for the help and advice I've got here so far! Another question: I want to support stopwords at search time, so that e.g. the query oscar and wilde is equivalent to oscar wilde (this is with lowercaseOperators=false). Fair enough: I have the stopword "and" in the query analyser chain. However, I also need to support French as well as English, so I've got _en and _fr versions of the text fields, with appropriate stemming and stopwords.
I index French content into the _fr fields and English into the _en fields. I'm searching with eDisMax over both versions, e.g.:

<str name="qf">headline_en headline_fr</str>

However, this means I get no results for oscar and wilde. The parsed query is:

(+((DisjunctionMaxQuery((headline_fr:osca | headline_en:oscar)) DisjunctionMaxQuery((headline_fr:and)) DisjunctionMaxQuery((headline_fr:wild | headline_en:wild)))~3))/no_coord

If I add "and" to the French stopwords list, I *do* get results, and the parsed query is:

(+((DisjunctionMaxQuery((headline_fr:osca | headline_en:oscar)) DisjunctionMaxQuery((headline_fr:wild | headline_en:wild)))~2))/no_coord

This implies that the only solution is to have a minimal, shared stopwords list for all the languages I want to support. Is this correct, or is there a way of supporting this kind of searching with per-language stopword lists? Thanks for any ideas!

Tom

--
All the best
Liu Bo
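If it helps, the workaround Tom describes, a minimal stopword list shared by every language so that each field emits the same number of tokens and the mm count lines up, can be expressed in schema.xml roughly like this. This is a sketch: the field type names, filter choices, and stopwords_shared.txt file are illustrative, not from Tom's actual schema.

```xml
<!-- English text: shared operator stopwords, then English stemming -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords_shared.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

<!-- French text: the SAME shared stopword file, then French stemming -->
<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords_shared.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  </analyzer>
</fieldType>
```

Because both analyzers drop exactly the same words (e.g. the Boolean operators in each language), a query like "oscar and wilde" produces the same token count against headline_en and headline_fr, so the ~N minimum-match clause no longer disagrees between fields.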
Re: Multi-core support for indexing multiple servers
Like Erick said, merging data from different data sources can be very difficult. SolrJ is much easier to use, but it may need another application to handle the indexing process if you don't want to extend Solr much. I eventually ended up with a customized request handler that uses SolrWriter from the DIH package to index data, so that I can fully control the indexing process. Much like with SolrJ, you can write code to convert your data into SolrInputDocuments and then post them to SolrWriter; SolrWriter handles the rest.

On 8 November 2013 21:46, Erick Erickson erickerick...@gmail.com wrote:

Yep, you can define multiple data sources for use with DIH. Combining data from those multiple sources into a single index can be a bit tricky with DIH; personally I tend to prefer SolrJ, but that's mostly personal preference, especially if I want to get some parallelism going. But whatever works.

Erick

On Thu, Nov 7, 2013 at 11:17 PM, manju16832003 manju16832...@gmail.com wrote:

Erick, just a question :-), wouldn't it be easy to use DIH to pull data from multiple data sources? I do use DIH to do that comfortably. I have three data sources:
- MySQL
- a URLDataSource that returns XML from a .NET application
- a URLDataSource that connects to an API and returns XML

Here is the data source part of my data-config:

<dataSource type="JdbcDataSource" name="solr" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/employeeDB" batchSize="-1"
            user="root" password="root"/>
<dataSource name="CRMServer" type="URLDataSource" encoding="UTF-8"
            connectionTimeout="5000" readTimeout="1"/>
<dataSource name="ImageServer" type="URLDataSource" encoding="UTF-8"
            connectionTimeout="5000" readTimeout="1"/>

Of course, in the application I do the same: to construct my results, I connect to MySQL and those two data sources. Basically we index at two points:
- using DIH for one-time indexing
- at the application, whenever there is a transaction on the details we store in Solr

--
View this message in context: http://lucene.472066.n3.nabble.com/Multi-core-support-for-indexing-multiple-servers-tp4099729p4099933.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
All the best
Liu Bo
how does solr load plugins?
Hi,

I wrote a plugin to index content, reusing our DAO layer, which is developed using Spring. What I am doing now is putting the plugin jar and all the other jars the DAO layer depends on into the shared lib folder under solr home. In the log, I can see all the jars being loaded through SolrResourceLoader, like:

INFO - 2013-10-16 16:25:30.611; org.apache.solr.core.SolrResourceLoader; Adding 'file:/D:/apache-tomcat-7.0.42/solr/lib/spring-tx-3.1.0.RELEASE.jar' to classloader

Then I initialize the Spring context using:

ApplicationContext context = new FileSystemXmlApplicationContext("/solr/spring/solr-plugin-bean-test.xml");

and Spring complains:

INFO - 2013-10-16 16:33:57.432; org.springframework.context.support.AbstractApplicationContext; Refreshing org.springframework.context.support.FileSystemXmlApplicationContext@e582a85: startup date [Wed Oct 16 16:33:57 CST 2013]; root of context hierarchy
INFO - 2013-10-16 16:33:57.491; org.springframework.beans.factory.xml.XmlBeanDefinitionReader; Loading XML bean definitions from file [D:\apache-tomcat-7.0.42\solr\spring\solr-plugin-bean-test.xml]
ERROR - 2013-10-16 16:33:59.944; com.test.search.solr.spring.AppicationContextWrapper; Configuration problem: Unable to locate Spring NamespaceHandler for XML schema namespace [http://www.springframework.org/schema/context] Offending resource: file [D:\apache-tomcat-7.0.42\solr\spring\solr-plugin-bean-test.xml]

The Spring context requires spring-tx-3.1.xsd, which does exist in spring-tx-3.1.0.RELEASE.jar under the org\springframework\transaction\config package, but the program can't find it even though it loads the Spring classes successfully.

The following won't work either (solr-plugin-bean-test.xml is packaged in plugin.jar as well):

ApplicationContext context = new ClassPathXmlApplicationContext("classpath:spring/solr-plugin-bean-test.xml");
But when I put all the jars under TOMCAT_HOME/webapps/solr/WEB-INF/lib and use

ApplicationContext context = new ClassPathXmlApplicationContext("classpath:spring/solr-plugin-bean-test.xml");

everything works fine: I can initialize the Spring context and load the DAO beans to read data and then write it to the Solr index. But isn't modifying solr.war bad practice?

It seems SolrResourceLoader only loads classes from plugin jars, but these jars are NOT on the classpath. Please correct me if I am wrong. Is there any way to use resources in plugin jars, such as configuration files? BTW, is there any difference between SolrResourceLoader and the Tomcat webapp classloader?

--
All the best
Liu Bo
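A likely explanation for the difference: SolrResourceLoader builds its own URLClassLoader over the lib folders, while Spring's NamespaceHandler lookup scans META-INF/spring.handlers through the thread context classloader, which under Tomcat is the webapp classloader and cannot see the shared lib jars. A common workaround is to temporarily install the loader that holds the plugin jars as the thread context classloader while creating the Spring context. The sketch below only demonstrates the set-and-restore pattern; `pluginLoader` is a stand-in for whatever loader actually sees your jars (in a Solr plugin that would be something like the loader obtained from the core's resource loader, which is an assumption here, not verified Solr API for your version):

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ContextLoaderDemo {

    // Run a task with the given loader installed as the thread context
    // classloader, restoring the previous one afterwards. Spring resolves
    // spring.handlers/spring.schemas through this loader, which is why jars
    // visible only to Solr's resource loader are not found by default.
    static void withContextLoader(ClassLoader loader, Runnable task) {
        ClassLoader previous = Thread.currentThread().getContextClassLoader();
        Thread.currentThread().setContextClassLoader(loader);
        try {
            task.run();
        } finally {
            Thread.currentThread().setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        // stand-in for the classloader that can see the plugin jars
        final ClassLoader pluginLoader =
                new URLClassLoader(new URL[0], ContextLoaderDemo.class.getClassLoader());
        withContextLoader(pluginLoader, () -> {
            // inside here, e.g. new ClassPathXmlApplicationContext(...)
            // would resolve resources through pluginLoader
            if (Thread.currentThread().getContextClassLoader() != pluginLoader) {
                throw new IllegalStateException("context loader not installed");
            }
        });
        System.out.println("ok");
    }
}
```

This avoids unpacking jars into solr.war: the webapp classloader stays untouched, and only the code that bootstraps Spring sees the plugin loader.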
Re: SolrDocumentList - bitwise operation
A join query might be helpful: http://wiki.apache.org/solr/Join

Join can work across indexes (cores), but it probably won't work in SolrCloud. Be aware that only the documents on the "to" side of the join are retrievable; if you want content from both sets of documents, a join query won't work. And in Lucene, join queries don't quite work with multiple join conditions; I haven't tested that in Solr yet.

I had a similar join case to yours; eventually I chose to denormalize our data into one set of documents.

On 13 October 2013 22:34, Michael Tyler michaeltyler1...@gmail.com wrote:

Hello, I have 2 different Solr indexes returning 2 different sets of SolrDocumentList, with doc id as the foreign key relation. After obtaining them, I want to perform an AND operation between them and then return the results to the user. Can you tell me how to do this? I am using Solr 4.3.

SolrDocumentList results1 = responseA.getResults();
SolrDocumentList results2 = responseB.getResults();

results1: d1, d2, d3
results2: d1, d2, d4
Return: d1, d2

Regards, Michael

--
All the best
Liu Bo
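To make the join suggestion concrete, a cross-core request might look like the following (the core names, field names, and filter are made up for illustration; doc_id in coreB points at id in coreA):

```
http://localhost:8983/solr/coreA/select?q={!join from=doc_id to=id fromIndex=coreB}category:books
```

This returns coreA documents whose id matches the doc_id of some coreB document matching category:books. Only coreA fields come back, which is the "to"-side-only limitation mentioned above.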
Re: SolrCore 'collection1' is not available due to init failure
org.apache.solr.core.SolrCore.init(SolrCore.java:821) ... 13 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/usr/share/solr-4.5.0/example/solr/collection1/data/index/write.lock: java.io.FileNotFoundException: /usr/share/solr-4.5.0/example/solr/collection1/data/index/write.lock (Permission denied)
at org.apache.lucene.store.Lock.obtain(Lock.java:84) at

It seems to be a permission problem: the user that starts Tomcat doesn't have permission to access your index folder. Try granting that user read and write permission on your Solr data folder and restart Tomcat to see what happens.

--
All the best
Liu Bo
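For example, with the path from the stack trace above (assuming Tomcat runs as a user named tomcat; adjust the user, group, and path to your installation):

```
chown -R tomcat:tomcat /usr/share/solr-4.5.0/example/solr/collection1/data
```

If the error persists after fixing ownership, also check whether a stale write.lock file left behind by a crashed process needs to be removed while Solr is stopped.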
Re: Multiple schemas in the same SolrCloud ?
You can try it this way:

1. Start the ZooKeeper servers first.
2. Upload your configurations to ZooKeeper and link them to your collections using zkcli, just like Shawn said. Let's say you have conf1 and conf2: you can link them to collection1 and collection2.
3. Remove the bootstrap stuff and start the Solr servers.
4. Once Solr is running, create collection1 and collection2 via the core admin. You don't need a local conf directory, because all your core-specific configuration is in ZooKeeper. Or you could use core discovery and specify the collection name in core.properties, see: http://wiki.apache.org/solr/Core%20Discovery%20%284.4%20and%20beyond%29

On 10 October 2013 23:57, maephisto my_sky...@yahoo.com wrote:

On this topic, once you've uploaded your collection's configuration to ZK, how can you update it? Upload the new one with the same config name?

--
View this message in context: http://lucene.472066.n3.nabble.com/Multiple-schemas-in-the-same-SolrCloud-tp4094279p4094729.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
All the best
Liu Bo
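The upload-and-link part with zkcli might look like this (the zkhost address, script path, and confdir paths are examples; adjust to your layout):

```
# upload each config set to ZooKeeper once
cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd upconfig -confdir /path/to/conf1 -confname conf1
cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd upconfig -confdir /path/to/conf2 -confname conf2

# link each collection to its config set
cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd linkconfig -collection collection1 -confname conf1
cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd linkconfig -collection collection2 -confname conf2
```

Re-running upconfig with the same confname overwrites the stored config set, which also answers maephisto's update question (the collection then needs a reload to pick up the change).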
Re: documents are not commited distributively in solr cloud tomcat with core discovery, range is null for shards in clusterstate.json
I've solved this problem myself. If you use core discovery, you must specify the numShards parameter in core.properties, or else Solr won't allocate a hash range to each shard and documents won't be distributed properly.

Using core discovery to set up SolrCloud on Tomcat is much easier and cleaner than the CoreAdmin approach described in the wiki: http://wiki.apache.org/solr/SolrCloudTomcat. It cost me some time to move from Jetty to Tomcat, but I think our IT team will like it this way. :)

On 6 October 2013 23:53, Liu Bo diabl...@gmail.com wrote:

Hi all,

I've sent out this mail before, but I was only subscribed to lucene-user and not solr-user at that time. Sorry if this is a repeat; your help will be much appreciated.

I've been trying out the SolrCloud tutorial, and I managed to write my own plugin to import data from our set of databases. I use SolrWriter from the DataImporter package, and the docs get distributed commits to the shards. Everything works fine using Jetty from the Solr example, but when I move to Tomcat, SolrCloud seems not to be configured right: documents are only committed to the shard the update request goes to.

The cause is probably that the range is null for the shards in clusterstate.json. The router is implicit instead of compositeId as well. Is anything missing or configured wrong in the following steps? How can I fix it? Your help will be much appreciated.

PS: the SolrCloud Tomcat wiki page isn't up to date for 4.4 with core discovery; I'm working from the SolrCloud, SolrCloudJboss, and CoreAdmin wiki pages.

Here's what I've done, and some useful logs:

1. Start three ZooKeeper servers.
2. Upload the configuration files to ZooKeeper; the collection name is content_collection.
3.
Start three Tomcat instances on three servers with core discovery.

a) core file:

name=content
loadOnStartup=true
transient=false
shard=shard1  (different on each server)
collection=content_collection

b) solr.xml:

<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="hostPort">8080</int>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <str name="zkHost">10.199.46.176:2181,10.199.46.165:2181,10.199.46.158:2181</str>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>

4. In solr.log I can see the three shards are recognized, and SolrCloud shows that content_collection has three shards as well.

5. Write documents to content_collection using my update request. The documents are only committed to the shard the request goes to. In the log I can see that DistributedUpdateProcessorFactory is in the processor chain and a distributed commit is triggered:

INFO - 2013-09-30 16:31:43.205; com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler; updata request processor factories:
INFO - 2013-09-30 16:31:43.206; com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler; org.apache.solr.update.processor.LogUpdateProcessorFactory@4ae7b77
INFO - 2013-09-30 16:31:43.207; com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler; org.apache.solr.update.processor.*DistributedUpdateProcessorFactory*@5b2bc407
INFO - 2013-09-30 16:31:43.207; com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler; org.apache.solr.update.processor.RunUpdateProcessorFactory@1652d654
INFO - 2013-09-30 16:31:43.283; org.apache.solr.core.SolrDeletionPolicy; SolrDeletionPolicy.onInit: commits: num=1
commit{dir=/home/bold/work/tomcat/solr/content/data/index,segFN=segments_1,generation=1}
INFO - 2013-09-30 16:31:43.284; org.apache.solr.core.SolrDeletionPolicy; newest commit generation = 1
INFO - 2013-09-30 16:31:43.440; *org.apache.solr.update.SolrCmdDistributor; Distrib commit to*: [StdNode: http://10.199.46.176:8080/solr/content/, StdNode: http://10.199.46.165:8080/solr/content/] params: commit_end_point=true&commit=true&softCommit=false&waitSearcher=true&expungeDeletes=false

But the documents won't go to the other shards; the other shards only get a request with no documents:

INFO - 2013-09-30 16:31:43.841; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO - 2013-09-30 16:31:43.855; org.apache.solr.core.SolrDeletionPolicy; SolrDeletionPolicy.onInit: commits: num=1
commit{dir=/home/bold/work/tomcat/solr/content/data/index,segFN=segments_1,generation=1}
INFO - 2013-09-30 16:31:43.855; org.apache.solr.core.SolrDeletionPolicy; newest commit
@3c74c144main{StandardDirectoryReader(segments_1:1:nrt)}
INFO - 2013-09-30 16:31:43.870; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO - 2013-09-30 16:31:43.870; org.apache.solr.update.processor.LogUpdateProcessor; [content] webapp=/solr path=/update params={waitSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false} {commit=} 0 42

6) Later I found that range is null for the shards in clusterstate.json, which might be why documents aren't committed distributively; note the router is "implicit":

{"content_collection":{
    "shards":{
      "shard1":{
        "range":null,
        "state":"active",
        "replicas":{"core_node1":{
            "state":"active",
            "core":"content",
            "node_name":"10.199.46.176:8080_solr",
            "base_url":"http://10.199.46.176:8080/solr",
            "leader":"true"}}},
      "shard3":{
        "range":null,
        "state":"active",
        "replicas":{"core_node2":{
            "state":"active",
            "core":"content",
            "node_name":"10.199.46.202:8080_solr",
            "base_url":"http://10.199.46.202:8080/solr",
            "leader":"true"}}},
      "shard2":{
        "range":null,
        "state":"active",
        "replicas":{"core_node3":{
            "state":"active",
            "core":"content",
            "node_name":"10.199.46.165:8080_solr",
            "base_url":"http://10.199.46.165:8080/solr",
            "leader":"true"}}}},
    "router":"implicit"}}

--
All the best
Liu Bo
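The fix described at the top of the thread, adding numShards to core.properties, would make the core file look something like this (the value 3 matches the three-shard setup above; other entries are taken from the configuration already shown):

```
name=content
shard=shard1
collection=content_collection
numShards=3
loadOnStartup=true
transient=false
```

With numShards present the first time the collection is created, each shard gets a hash range and the collection uses the compositeId router instead of implicit, so updates are forwarded to the shard that owns the document's hash rather than staying on the node that received them.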
how can I use DataImportHandler on multiple MySQL databases with the same schema?
Hi all,

Our system has distributed MySQL databases: we create a database for every customer who signs up and place it on one of our MySQL hosts. We currently use Lucene core to perform search on these databases, with Java code that loops through the databases and converts the data to a Lucene index.

Right now we are planning to move to Solr for distribution, and I am investigating it. I tried the DataImportHandler (http://wiki.apache.org/solr/DataImportHandler) from the wiki page, but I can't figure out a way to use multiple data sources with the same schema.

The other question is: we keep the database connection data in one table. Can I create the data source connection info from it and loop through the databases using DataImporter?

If DataImporter doesn't work, is there a way to feed data to Solr using a customized SolrRequestHandler without using SolrJ?

If neither of these two ways works, I think I am going to reuse the DAO of the old project and feed the data to Solr using SolrJ, probably with an embedded Solr server.

Your help will be much appreciated.

http://wiki.apache.org/solr/DataImportHandlerFaq

--
All the best
Liu Bo
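On the first question: DIH does allow several JdbcDataSource declarations in one data-config, each with its own connection URL, though the list is static; generating it from a table of connection info would require producing the config from a template or extending DataImporter. A hand-written sketch for two customer databases with the same schema (hosts, credentials, and table names are invented for illustration):

```xml
<dataConfig>
  <dataSource name="cust1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://host1/customer1_db" user="reader" password="secret"/>
  <dataSource name="cust2" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://host2/customer2_db" user="reader" password="secret"/>
  <document>
    <!-- same schema, one entity per customer database -->
    <entity name="products_cust1" dataSource="cust1"
            query="SELECT CONCAT('cust1-', id) AS id, name, description FROM products"/>
    <entity name="products_cust2" dataSource="cust2"
            query="SELECT CONCAT('cust2-', id) AS id, name, description FROM products"/>
  </document>
</dataConfig>
```

Prefixing the ids keeps documents from different customer databases from colliding in a single index; the obvious drawback is that the entity list must be regenerated whenever a customer is added.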