Re: Updating data
If I understand it right, you want the JSON to contain only the new fields, not the fields that have already been indexed/stored. Check out Solr atomic updates. Below are some links which might help:

http://wiki.apache.org/solr/Atomic_Updates
http://yonik.com/solr/atomic-updates/

Remember, it requires the fields to be stored.

On Tue, Feb 5, 2013 at 12:35 PM, anurag.jain anurag.k...@gmail.com wrote:

I have already indexed 180 records in my Solr index. All files were in JSON format, so the data looked like:

  [ { "id": 1, "first_name": "anurag", "last_name": "jain", ... },
    { "id": 2, "first_name": "abhishek", "last_name": "jain", ... },
    ... ]

Now I have to add a field to the data, like:

  [ { "id": 1, "first_name": "anurag", "last_name": "jain", "new_field": "xvz", ... },
    { "id": 2, "first_name": "abhishek", "last_name": "jain", "new_field": "xvz", ... },
    ... ]

But I want my JSON file to contain only:

  [ { "id": 1, "new_field": "xvz" },
    { "id": 2, "new_field": "xvz" } ]

and have Solr automatically update the documents as if the full file had been posted:

  [ { "id": 1, "first_name": "anurag", "last_name": "jain", "new_field": "xvz", ... },
    { "id": 2, "first_name": "abhishek", "last_name": "jain", "new_field": "xvz", ... },
    ... ]

Any solutions? Please reply.

--
View this message in context: http://lucene.472066.n3.nabble.com/Updating-data-tp4038492.html
Sent from the Solr - User mailing list archive at Nabble.com.
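For the archives, a minimal sketch of what such an atomic-update payload could look like in Solr 4.x JSON syntax. Hedged: the exact endpoint path and syntax should be checked against the Atomic Updates wiki linked above; `new_field` and the ids are taken from the question.

```python
import json

# Atomic update: send only the id plus the fields to change.
# {"set": value} tells Solr to replace that field's value while keeping
# the other stored fields (first_name, last_name, ...) intact --
# provided all of them are stored="true".
docs = [
    {"id": "1", "new_field": {"set": "xvz"}},
    {"id": "2", "new_field": {"set": "xvz"}},
]
payload = json.dumps(docs)
print(payload)
# The payload would be POSTed to /solr/update with
# Content-Type: application/json (endpoint path is an assumption).
```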
Re: Tokenized keywords
The tokenizer changes how you search/index, not how you store. What I understand is that you want to display the tokenized result always, and not just for debugging. debugQuery has performance implications and should not be used for what you are trying to achieve.

Basically, what you need is a way to store the filtered and lowercased tokens in the 'modified' field. What I see as a solution: either ingest the 'original' field with your desired tokens directly instead of using copyField, or write some custom code to store/index only the filtered and lowercased result, e.g. a custom transformer can be explored if you are using the DataImportHandler.

On Mon, Jan 21, 2013 at 1:47 PM, Romita Saha romita.s...@sg.panasonic.com wrote:

Hi,

I have a field defined in schema.xml named 'original'. I first copy this field to 'modified' and apply filters on 'modified':

  <field name="original" type="string" indexed="true" stored="true"/>
  <field name="modified" type="text_general" indexed="true" stored="true"/>
  <copyField source="original" dest="modified"/>

I want my response to display as follows:

  original: Search for all the Laptops
  modified: search laptop

Thanks and regards,
Romita Saha

Panasonic R&D Center Singapore
Blk 1022 Tai Seng Avenue #06-3530
Tai Seng Ind. Est. Singapore 534415
DID: (65) 6550 5383 FAX: (65) 6550 5459
email: romita.s...@sg.panasonic.com

From: Mikhail Khludnev mkhlud...@griddynamics.com
To: solr-user@lucene.apache.org
Date: 01/21/2013 03:48 PM
Subject: Re: Tokenized keywords

Romita,

That's exactly what the debugQuery output shows. If you can't find it there, paste the output here and let's try to find it together. Also pay attention to the explainOther debug parameter and the analysis page in the admin UI.

On 21.01.2013 10:50, Romita Saha romita.s...@sg.panasonic.com wrote:

What I am trying to achieve is as follows. I query "Search for all the Laptops" and my tokenized keywords are "search laptop" (I apply a stopword filter to remove words like "for", "all", "the", and I also use a lowercase filter). I want to display these tokenized keywords using debugQuery.

Thanks and regards,
Romita

From: Dikchant Sahi contacts...@gmail.com
To: solr-user@lucene.apache.org
Date: 01/21/2013 02:26 PM
Subject: Re: Tokenized keywords

Can you please elaborate a bit more on what you are trying to achieve? Tokenizers work on the indexed field and don't affect how the values are displayed; the response value comes from the stored field. If you want to see how your query is being tokenized, you can do it using the analysis interface, or enable debugQuery to see how your query is being formed.

On Mon, Jan 21, 2013 at 11:06 AM, Romita Saha romita.s...@sg.panasonic.com wrote:

Hi,

I use some tokenizers to tokenize the query. I want to see the tokenized query words displayed in the response. Could you kindly help me do that?

Thanks and regards,
Romita
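The "ingest the desired tokens directly" suggestion from the reply above can be sketched client-side. This is only an approximation under stated assumptions: the stopword list below is illustrative (not Solr's actual stopwords.txt), and a real analysis chain may also stem (e.g. "laptops" to "laptop"), which this sketch does not do.

```python
# Approximate a lowercase + stopword analysis chain in the indexing
# client, and populate the 'modified' field explicitly instead of
# relying on copyField. Illustrative stopword set only.
STOPWORDS = {"for", "all", "the", "a", "an", "of", "to"}

def to_modified(original: str) -> str:
    tokens = [t.lower() for t in original.split()]
    return " ".join(t for t in tokens if t not in STOPWORDS)

doc = {"original": "Search for all the Laptops"}
doc["modified"] = to_modified(doc["original"])
print(doc["modified"])  # search laptops
```

Because 'modified' is then a plain stored value, it displays in every response without debugQuery.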
Re: Tokenized keywords
Can you please elaborate a bit more on what you are trying to achieve? Tokenizers work on the indexed field and don't affect how the values are displayed; the response value comes from the stored field. If you want to see how your query is being tokenized, you can do it using the analysis interface, or enable debugQuery to see how your query is being formed.

On Mon, Jan 21, 2013 at 11:06 AM, Romita Saha romita.s...@sg.panasonic.com wrote:

Hi,

I use some tokenizers to tokenize the query. I want to see the tokenized query words displayed in the response. Could you kindly help me do that?

Thanks and regards,
Romita
Re: MultiValue
You just need to make the field multiValued:

  <field name="last_name" type="string" indexed="true" stored="true"/>
  <field name="training_skill" type="string" indexed="true" stored="true" multiValued="true"/>

The type should be set based on your search requirements.

On Thu, Jan 17, 2013 at 11:27 PM, anurag.jain anurag.k...@gmail.com wrote:

My JSON file looks like:

  [ { "last_name": "jain", "training_skill": ["c", "c++", "php,java,.net"] } ]

Can you please suggest how I should declare the training_skill field in the schema? Please reply, urgent.

--
View this message in context: http://lucene.472066.n3.nabble.com/MultiValue-tp4034305.html
Sent from the Solr - User mailing list archive at Nabble.com.
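With multiValued="true" in place, a multi-valued field is simply a JSON array in the update payload; a quick sketch (field names taken from the question; the endpoint convention is an assumption, as elsewhere in this thread):

```python
import json

# Each element of the array becomes one value of the multiValued
# training_skill field when the document is indexed.
doc = {
    "last_name": "jain",
    "training_skill": ["c", "c++", "php", "java", ".net"],
}
payload = json.dumps([doc])
print(payload)
# POST to /solr/update with Content-Type: application/json (path assumed).
```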
Re: MultiValue
You mean to say that the problem is with the JSON which is being ingested: you want to split the values on the comma and index them as multiple values. What problem are you facing in producing JSON in the format Solr expects? If you don't have control over it, you can probably try playing with custom update processors.

On Fri, Jan 18, 2013 at 12:31 AM, anurag.jain anurag.k...@gmail.com wrote:

  [ { "last_name": "jain", "training_skill": ["c", "c++", "php,java,.net"] } ]

Actually I want to tokenize this into c, c++, php, java, .net, so that I can use them as facets. But the problem is in the list: "training_skill": ["c", "c++", "php,java,.net"].

--
View this message in context: http://lucene.472066.n3.nabble.com/MultiValue-tp4034305p4034316.html
Sent from the Solr - User mailing list archive at Nabble.com.
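If the JSON source cannot be fixed, one client-side alternative to a custom update processor is to split the comma-joined entries before sending the document to Solr; a hedged sketch of that preprocessing step:

```python
def normalize_skills(values):
    """Split any comma-joined entries so each skill becomes its own value."""
    out = []
    for v in values:
        out.extend(s.strip() for s in v.split(",") if s.strip())
    return out

raw = ["c", "c++", "php,java,.net"]
print(normalize_skills(raw))  # ['c', 'c++', 'php', 'java', '.net']
```

Each skill then lands as a separate value of the multiValued field, so faceting works as intended.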
Re: Solr 3.6.2 or 4.0
As someone in the forum aptly said: if all previous Solr releases were evolutionary, Solr 4.0 is revolutionary. It has lots of improvements over the previous releases, like NoSQL features, atomic updates, cloud features and more. Solr 4.0 would be the right migration, I believe. Can someone in the forum provide a reason to migrate to 3.6.2 and not 4.0?

On Fri, Jan 4, 2013 at 5:16 PM, vijeshnair vijeshkn...@gmail.com wrote:

We are starting a new e-commerce application this month, for which I am trying to identify the right Solr release. We were using 3.4 in our previous project, but I have read in multiple blogs and forums about the improvements that Solr 4 has in terms of efficient memory management, fewer OOMs, etc. So my question would be: can I start using Solr 4 for my new project? Why is Apache keeping both the 3.6.2 and 4.0 releases in the downloads? Are there any major changes in 4.0 compared to 3.x that I should study before getting into 4.0? Please help, so that I can propose 4.0 to my team.

Thanks,
Vijesh Nair

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-6-2-or-4-0-tp4030527.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr atomic update of multi-valued field
Hi Erick,

The name field is stored. I experience the problem only when I update the multiValued field with multiple values, like:

  <field name="skills" update="set">solr</field>
  <field name="skills" update="set">lucene</field>

It works perfectly when I set a single value for the multiValued field, like:

  <field name="skills" update="set">solr</field>

Thanks,
Dikchant

On Wed, Dec 19, 2012 at 6:25 PM, Erick Erickson erickerick...@gmail.com wrote:

First question: is the name field stored (stored="true")? If it isn't, that would explain your problems with that field. _all_ relevant fields (i.e. everything not a destination of a copyField) need to be stored for atomic updates to work.

Your second problem I'm not sure about. I remember some JIRAs about multivalued fields and atomic updates; you might get some info from the JIRAs here: https://issues.apache.org/jira/browse/SOLR but updating multiValued fields _should_ work...

Best,
Erick

On Tue, Dec 18, 2012 at 2:20 AM, Dikchant Sahi contacts...@gmail.com wrote:

Hi,

Does Solr 4.0 allow updating the values of a multi-valued field? Say I have a list of values for a skills field, like java, j2ee, and I want to change it to solr, lucene.

I was playing with atomic updates, and below is my observation. I have the following document in my index:

  <doc>
    <str name="id">1</str>
    <str name="name">Dikchant</str>
    <str name="profession">software engineer</str>
    <arr name="skills">
      <str>java</str>
      <str>j2ee</str>
    </arr>
  </doc>

To update the skills to solr, lucene, I indexed the document as follows:

  <add>
    <doc>
      <field name="id">1</field>
      <field name="skills" update="set">solr</field>
      <field name="skills" update="set">lucene</field>
    </doc>
  </add>

The document added to the index is:

  <doc>
    <str name="id">1</str>
    <arr name="skills">
      <str>{set=solr}</str>
      <str>{set=lucene}</str>
    </arr>
  </doc>

This is not what I was looking for. I found 2 issues: 1. the value of the name field was lost; 2. the skills field had junk values like {set=solr}.

Then, to achieve my goal, I tried something different. I tried setting a single-valued field with the update="set" parameter to its existing value, and provided the values of the multi-valued field as we do while adding a new document:

  <add>
    <doc>
      <field name="id">1</field>
      <field name="name" update="set">Dikchant</field>
      <field name="skills">solr</field>
      <field name="skills">lucene</field>
    </doc>
  </add>

With this, the index looks as follows:

  <doc>
    <str name="id">1</str>
    <str name="name">Dikchant</str>
    <str name="profession">software engineer</str>
    <arr name="skills">
      <str>solr</str>
      <str>lucene</str>
    </arr>
  </doc>

The values of the multivalued field are changed and the values of the other fields are not deleted. The question that comes to my mind: does Solr 4.0 allow updating a multi-valued field? If yes, is this how it works, or am I doing something wrong?

Regards,
Dikchant
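One thing that may be worth trying against the behavior described above: in Solr 4's JSON update syntax, a multi-valued atomic set can be expressed as a single "set" carrying an array, rather than repeated field elements. Whether this sidesteps the {set=...} artifact depends on the Solr version, so treat this as a sketch to test, not a confirmed fix:

```python
import json

# A single "set" operation whose value is an array replaces all values
# of the multiValued 'skills' field in one step.
update = [{"id": "1", "skills": {"set": ["solr", "lucene"]}}]
payload = json.dumps(update)
print(payload)
# POST to /solr/update with Content-Type: application/json (path assumed).
```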
Re: Update / replication of offline indexes
Thanks Erick and Upayavira! This answers my question.

On Mon, Dec 17, 2012 at 8:05 AM, Erick Erickson erickerick...@gmail.com wrote:

See the very last line here: http://wiki.apache.org/solr/MergingSolrIndexes

The short answer is that merging will lead to duplicate documents, even with uniqueKeys defined. So you're really kind of stuck handling this outside of the merge, either by shipping the list of overwritten docs and deleting them from the base index, or by shipping the JSON/XML format and indexing those. Of the two, I'd think the latter is easiest/least prone to surprises, especially since you could re-run the indexing as many times as necessary. The uniqueKey bits are only guaranteed to overwrite older docs when indexing, not merging.

Best,
Erick

On Thu, Dec 13, 2012 at 3:17 PM, Dikchant Sahi contacts...@gmail.com wrote:

Hi Alex,

You got my point right. What I see is that merge adds duplicate documents. Is there a way to overwrite existing documents in one core with another? Can a merge operation lead to data corruption, say when the core on the client had uncommitted changes? What would be the better solution for my requirement, merging or indexing XML/JSON?

Regards,
Dikchant

On Thu, Dec 13, 2012 at 6:39 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

Not sure I fully understood this, and maybe you already cover that by 'merge', but if you know what you gave the client last time, you can just build a differential as a second core, then on the client mount that second core and merge it into the first one (e.g. with DIH). Just a thought.

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Thu, Dec 13, 2012 at 5:28 PM, Dikchant Sahi contacts...@gmail.com wrote:

Hi Erick,

Sorry for creating the confusion.
By slave, I mean the indexes on client machine will be replica of the master and in not same as the slave in master-slave model. Below is the detail: The system is being developed to support search facility on 1000s of system, a majority of which will be offline. The idea is that we will have a search system which will be sold on subscription basis. For each of the subscriber, we will copy the master index to their local machine, over a drive or CD. Now, if a subscriber comes after 2 months and want the updates, we just want to provide the deltas for 2 month as the volume of data is huge. For this we can think of two approaches: 1. Fetch the documents which are less than 2 months old in JSON format from master Solr. Copy it to the subscriber machine and index those documents. (copy through cd / memory sticks) 2. Create separate indexes for each month on our master machine. Copy the indexes to the client machine and merge. Prior to merge we need to delete records which the new index has, to avoid duplicates. As long as the setup is new, we will copy the complete index and restart Solr. We are not sure of the best approach for copying the deltas. Thanks, Dikchant On Thu, Dec 13, 2012 at 3:52 AM, Erick Erickson erickerick...@gmail.com wrote: This is somewhat confusing. You say that box2 is the slave, yet they're not connected? Then you need to copy the solr home/data index from box 1 to box 2 manually (I'd have box2 solr shut down at the time) and restart Solr. Why can't the boxes be connected? That's a much simpler way of going about it. Best Erick On Tue, Dec 11, 2012 at 1:04 AM, Dikchant Sahi contacts...@gmail.com wrote: Hi Walter, Thanks for the response. Commit will help to reflect changes on Box1. We are able to achieve this. We want the changes to reflect in Box2. We have two indexes. Say Box1: Master DB has been setup. Data Import runs on this. Box2: Slave running. We want all the updates on Box1 to be merged/present in index on Box2. 
Both the boxes are not connected over n/w. How can be achieve this. Please let me know, if am not clear. Thanks again! Regards, Dikchant On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood wun...@wunderwood.org wrote: You do not need to manage online and offline indexes. Commit when you are done with your updates and Solr will take care of it for you. The changes are not live until you commit. wunder On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote: Hi, How can
Solr atomic update of multi-valued field
Hi,

Does Solr 4.0 allow updating the values of a multi-valued field? Say I have a list of values for a skills field, like java, j2ee, and I want to change it to solr, lucene.

I was playing with atomic updates, and below is my observation. I have the following document in my index:

  <doc>
    <str name="id">1</str>
    <str name="name">Dikchant</str>
    <str name="profession">software engineer</str>
    <arr name="skills">
      <str>java</str>
      <str>j2ee</str>
    </arr>
  </doc>

To update the skills to solr, lucene, I indexed the document as follows:

  <add>
    <doc>
      <field name="id">1</field>
      <field name="skills" update="set">solr</field>
      <field name="skills" update="set">lucene</field>
    </doc>
  </add>

The document added to the index is:

  <doc>
    <str name="id">1</str>
    <arr name="skills">
      <str>{set=solr}</str>
      <str>{set=lucene}</str>
    </arr>
  </doc>

This is not what I was looking for. I found 2 issues: 1. the value of the name field was lost; 2. the skills field had junk values like {set=solr}.

Then, to achieve my goal, I tried something different. I tried setting a single-valued field with the update="set" parameter to its existing value, and provided the values of the multi-valued field as we do while adding a new document:

  <add>
    <doc>
      <field name="id">1</field>
      <field name="name" update="set">Dikchant</field>
      <field name="skills">solr</field>
      <field name="skills">lucene</field>
    </doc>
  </add>

With this, the index looks as follows:

  <doc>
    <str name="id">1</str>
    <str name="name">Dikchant</str>
    <str name="profession">software engineer</str>
    <arr name="skills">
      <str>solr</str>
      <str>lucene</str>
    </arr>
  </doc>

The values of the multivalued field are changed and the values of the other fields are not deleted. The question that comes to my mind: does Solr 4.0 allow updating a multi-valued field? If yes, is this how it works, or am I doing something wrong?

Regards,
Dikchant
Re: Update / replication of offline indexes
Hi Alex,

You got my point right. What I see is that merge adds duplicate documents. Is there a way to overwrite existing documents in one core with another? Can a merge operation lead to data corruption, say when the core on the client had uncommitted changes? What would be the better solution for my requirement, merging or indexing XML/JSON?

Regards,
Dikchant

On Thu, Dec 13, 2012 at 6:39 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

Not sure I fully understood this, and maybe you already cover that by 'merge', but if you know what you gave the client last time, you can just build a differential as a second core, then on the client mount that second core and merge it into the first one (e.g. with DIH). Just a thought.

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Thu, Dec 13, 2012 at 5:28 PM, Dikchant Sahi contacts...@gmail.com wrote:

Hi Erick,

Sorry for creating the confusion. By 'slave' I mean that the indexes on the client machine will be a replica of the master; it is not the same as the slave in the master-slave replication model. Below is the detail.

The system is being developed to support a search facility on 1000s of systems, a majority of which will be offline. The idea is that we will have a search system which will be sold on a subscription basis. For each subscriber, we will copy the master index to their local machine, over a drive or CD. Now, if a subscriber comes back after 2 months and wants the updates, we want to provide only the deltas for those 2 months, as the volume of data is huge. For this we can think of two approaches:

1. Fetch the documents which are less than 2 months old in JSON format from the master Solr, copy them to the subscriber machine (via CD / memory stick), and index those documents.

2. Create separate indexes for each month on our master machine.
Copy the indexes to the client machine and merge. Prior to merge we need to delete records which the new index has, to avoid duplicates. As long as the setup is new, we will copy the complete index and restart Solr. We are not sure of the best approach for copying the deltas. Thanks, Dikchant On Thu, Dec 13, 2012 at 3:52 AM, Erick Erickson erickerick...@gmail.com wrote: This is somewhat confusing. You say that box2 is the slave, yet they're not connected? Then you need to copy the solr home/data index from box 1 to box 2 manually (I'd have box2 solr shut down at the time) and restart Solr. Why can't the boxes be connected? That's a much simpler way of going about it. Best Erick On Tue, Dec 11, 2012 at 1:04 AM, Dikchant Sahi contacts...@gmail.com wrote: Hi Walter, Thanks for the response. Commit will help to reflect changes on Box1. We are able to achieve this. We want the changes to reflect in Box2. We have two indexes. Say Box1: Master DB has been setup. Data Import runs on this. Box2: Slave running. We want all the updates on Box1 to be merged/present in index on Box2. Both the boxes are not connected over n/w. How can be achieve this. Please let me know, if am not clear. Thanks again! Regards, Dikchant On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood wun...@wunderwood.org wrote: You do not need to manage online and offline indexes. Commit when you are done with your updates and Solr will take care of it for you. The changes are not live until you commit. wunder On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote: Hi, How can we do delta update of offline indexes? We have the master index on which data import will be done. The index directory will be copied to slave machine in case of full update, through CD as the slave/client machine is offline. So, what should be the approach for getting the delta to the slave. I can think of two approaches. 1.Create separate indexes of the delta on the master machine, copy it to the slave machine and merge. 
Before merging the indexes on the client machine, delete all the updated and deleted documents in client machine else merge will add duplicates. So along with the index, we need to transfer the list of documents which has been updated/deleted. 2. Extract all the documents which has changed since a particular time in XML/JSON and index it in client machine. The size of indexes are huge, so we cannot rollover index everytime. Please help me with your take and challenges you see in the above approaches. Please suggest if you think of any other better
Re: Update / replication of offline indexes
Yes, we have a uniqueId defined, but merge adds two documents with the same id. As per my understanding, this is how Solr behaves; correct me if I am wrong.

On Fri, Dec 14, 2012 at 2:25 AM, Alexandre Rafalovitch arafa...@gmail.com wrote:

Do you have IDs defined? How do you expect Solr to know they are duplicate records? Maybe the issue is there somewhere.

Regards,
Alex

On 13 Dec 2012 15:17, Dikchant Sahi contacts...@gmail.com wrote:

Hi Alex,

You got my point right. What I see is that merge adds duplicate documents. Is there a way to overwrite existing documents in one core with another? Can a merge operation lead to data corruption, say when the core on the client had uncommitted changes? What would be the better solution for my requirement, merging or indexing XML/JSON?

Regards,
Dikchant

On Thu, Dec 13, 2012 at 6:39 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

Not sure I fully understood this, and maybe you already cover that by 'merge', but if you know what you gave the client last time, you can just build a differential as a second core, then on the client mount that second core and merge it into the first one (e.g. with DIH). Just a thought.

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Thu, Dec 13, 2012 at 5:28 PM, Dikchant Sahi contacts...@gmail.com wrote:

Hi Erick,

Sorry for creating the confusion. By 'slave' I mean that the indexes on the client machine will be a replica of the master; it is not the same as the slave in the master-slave replication model. Below is the detail.

The system is being developed to support a search facility on 1000s of systems, a majority of which will be offline. The idea is that we will have a search system which will be sold on a subscription basis. For each subscriber, we will copy the master index to their local machine, over a drive or CD.
Now, if a subscriber comes after 2 months and want the updates, we just want to provide the deltas for 2 month as the volume of data is huge. For this we can think of two approaches: 1. Fetch the documents which are less than 2 months old in JSON format from master Solr. Copy it to the subscriber machine and index those documents. (copy through cd / memory sticks) 2. Create separate indexes for each month on our master machine. Copy the indexes to the client machine and merge. Prior to merge we need to delete records which the new index has, to avoid duplicates. As long as the setup is new, we will copy the complete index and restart Solr. We are not sure of the best approach for copying the deltas. Thanks, Dikchant On Thu, Dec 13, 2012 at 3:52 AM, Erick Erickson erickerick...@gmail.com wrote: This is somewhat confusing. You say that box2 is the slave, yet they're not connected? Then you need to copy the solr home/data index from box 1 to box 2 manually (I'd have box2 solr shut down at the time) and restart Solr. Why can't the boxes be connected? That's a much simpler way of going about it. Best Erick On Tue, Dec 11, 2012 at 1:04 AM, Dikchant Sahi contacts...@gmail.com wrote: Hi Walter, Thanks for the response. Commit will help to reflect changes on Box1. We are able to achieve this. We want the changes to reflect in Box2. We have two indexes. Say Box1: Master DB has been setup. Data Import runs on this. Box2: Slave running. We want all the updates on Box1 to be merged/present in index on Box2. Both the boxes are not connected over n/w. How can be achieve this. Please let me know, if am not clear. Thanks again! Regards, Dikchant On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood wun...@wunderwood.org wrote: You do not need to manage online and offline indexes. Commit when you are done with your updates and Solr will take care of it for you. The changes are not live until you commit. 
wunder On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote: Hi, How can we do delta update of offline indexes? We have the master index on which data import will be done. The index directory will be copied to slave machine in case of full update, through CD as the slave/client machine is offline. So, what should be the approach for getting the delta to the slave. I can think of two approaches
Re: Update / replication of offline indexes
Hi Erick,

Sorry for creating the confusion. By 'slave' I mean that the indexes on the client machine will be a replica of the master; it is not the same as the slave in the master-slave replication model. Below is the detail.

The system is being developed to support a search facility on 1000s of systems, a majority of which will be offline. The idea is that we will have a search system which will be sold on a subscription basis. For each subscriber, we will copy the master index to their local machine, over a drive or CD. Now, if a subscriber comes back after 2 months and wants the updates, we want to provide only the deltas for those 2 months, as the volume of data is huge. For this we can think of two approaches:

1. Fetch the documents which are less than 2 months old in JSON format from the master Solr, copy them to the subscriber machine (via CD / memory stick), and index those documents.

2. Create separate indexes for each month on our master machine, copy the indexes to the client machine, and merge. Prior to the merge we need to delete the records which the new index has, to avoid duplicates.

As long as the setup is new, we will copy the complete index and restart Solr. We are not sure of the best approach for copying the deltas.

Thanks,
Dikchant

On Thu, Dec 13, 2012 at 3:52 AM, Erick Erickson erickerick...@gmail.com wrote:

This is somewhat confusing. You say that box2 is the slave, yet they're not connected? Then you need to copy the solr home/data index from box 1 to box 2 manually (I'd have box2's Solr shut down at the time) and restart Solr. Why can't the boxes be connected? That's a much simpler way of going about it.

Best,
Erick

On Tue, Dec 11, 2012 at 1:04 AM, Dikchant Sahi contacts...@gmail.com wrote:

Hi Walter,

Thanks for the response. Commit will help to reflect changes on Box1; we are able to achieve this. We want the changes to reflect on Box2. We have two indexes. Say:

Box1: Master. The DB has been set up here; data import runs on this.
Box2: Slave running.

We want all the updates on Box1 to be merged/present in the index on Box2. The boxes are not connected over a network. How can we achieve this? Please let me know if I am not clear.

Thanks again!

Regards,
Dikchant

On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood wun...@wunderwood.org wrote:

You do not need to manage online and offline indexes. Commit when you are done with your updates and Solr will take care of it for you. The changes are not live until you commit.

wunder

On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote:

Hi,

How can we do a delta update of offline indexes? We have the master index on which data import will be done. The index directory will be copied to the slave machine in case of a full update, via CD, as the slave/client machine is offline. So, what should be the approach for getting the delta to the slave? I can think of two approaches:

1. Create separate indexes of the delta on the master machine, copy them to the slave machine, and merge. Before merging the indexes on the client machine, delete all the updated and deleted documents on the client machine, else the merge will add duplicates. So along with the index, we need to transfer the list of documents which have been updated/deleted.

2. Extract all the documents which have changed since a particular time in XML/JSON and index them on the client machine.

The indexes are huge, so we cannot roll over the index every time. Please help me with your take and the challenges you see in the above approaches. Please suggest if you think of any other better approach.

Thanks a ton!

Regards,
Dikchant

--
Walter Underwood
wun...@wunderwood.org
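Approach 1 above (export the recent documents from the master as JSON) amounts to a query with a date-range filter. A sketch under stated assumptions: it requires a timestamp field in the schema ('last_modified' here is hypothetical), and the host/port are placeholders:

```python
from urllib.parse import urlencode

# Select everything changed in the last two months, as JSON, for
# transfer to the offline subscriber machine. NOW-2MONTHS is standard
# Solr date math; 'last_modified' is a hypothetical field name.
params = {
    "q": "*:*",
    "fq": "last_modified:[NOW-2MONTHS TO NOW]",
    "wt": "json",
    "rows": 1000,  # page through with 'start' for large result sets
}
url = "http://master:8983/solr/select?" + urlencode(params)
print(url)
```

Re-indexing the exported documents on the client then overwrites by uniqueKey, which a raw index merge does not.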
Re: Update multiple documents
My intention is to allow search on person names in the second index also. If we use personId in the second index, is there a way to achieve that? Yes, we are looking for a join kind of feature. Thanks!

On Wed, Dec 12, 2012 at 8:31 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

But is that the best approach? If you use personIds in your second index then you don't have to do that. Maybe you are after joins in Solr?

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm

On Dec 11, 2012 1:21 PM, Dikchant Sahi contacts...@gmail.com wrote:

Hi,

We have two sets of related indexes:

Index1: person(personId, person_name, field2, field3)
Index2: mapping(id, fieldx, fieldy, person)

Whenever a person name changes, we need to update both indexes. For the person index, we can update the person name, as we have personId, which is the uniqueKey. How can we update the person names in Index2? E.g.:

Index1: person(001, Micheal Jackson, value1, value2)
Index2: mapping(1234, Thriller, Micheal Jackson)
        mapping(1235, Billy Jean, Micheal Jackson)

"Micheal Jackson" changes to "Michael Jackson". What would be the best approach/solution to this problem?

Thanks,
Dikchant
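If Index2 stored a personId reference instead of the name, as Otis suggests, Solr 4's join query parser could resolve the name at query time instead of requiring updates to both indexes. A hedged sketch: all field names here are illustrative, and a cross-core join additionally needs a fromIndex argument naming the other core:

```python
from urllib.parse import urlencode

# Find mapping docs whose referenced person matches a name search.
# {!join} is Solr 4's join query parser; 'person' is assumed to be the
# core holding the person documents (illustrative names throughout).
params = {
    "q": "{!join from=personId to=person_ref fromIndex=person}person_name:Michael",
    "wt": "json",
}
qs = urlencode(params)
print(qs)
```

With this layout, a name change is a single update to the person core; the mapping core never stores the name at all.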
Re: Update / replication of offline indexes
Hi Walter,

Thanks for the response. Commit will help to reflect changes on Box1; we are able to achieve this. We want the changes to reflect on Box2. We have two indexes. Say:

Box1: Master. The DB has been set up here; data import runs on this.
Box2: Slave running.

We want all the updates on Box1 to be merged/present in the index on Box2. The boxes are not connected over a network. How can we achieve this? Please let me know if I am not clear.

Thanks again!

Regards,
Dikchant

On Tue, Dec 11, 2012 at 11:22 AM, Walter Underwood wun...@wunderwood.org wrote:

You do not need to manage online and offline indexes. Commit when you are done with your updates and Solr will take care of it for you. The changes are not live until you commit.

wunder

On Dec 10, 2012, at 9:46 PM, Dikchant Sahi wrote:

Hi,

How can we do a delta update of offline indexes? We have the master index on which data import will be done. The index directory will be copied to the slave machine in case of a full update, via CD, as the slave/client machine is offline. So, what should be the approach for getting the delta to the slave? I can think of two approaches:

1. Create separate indexes of the delta on the master machine, copy them to the slave machine, and merge. Before merging the indexes on the client machine, delete all the updated and deleted documents on the client machine, else the merge will add duplicates. So along with the index, we need to transfer the list of documents which have been updated/deleted.

2. Extract all the documents which have changed since a particular time in XML/JSON and index them on the client machine.

The indexes are huge, so we cannot roll over the index every time. Please help me with your take and the challenges you see in the above approaches. Please suggest if you think of any other better approach.

Thanks a ton!

Regards,
Dikchant

--
Walter Underwood
wun...@wunderwood.org
Re: multiple indexes?
Multiple indexes can be set up using the multi-core feature of Solr. Below are the steps:

1. Add the core name and storage location of each core to the $SOLR_HOME/solr.xml file:

   <cores adminPath="/admin/cores" defaultCoreName="core-name1">
     <core name="core-name1" instanceDir="core-dir1" />
     <core name="core-name2" instanceDir="core-dir2" />
   </cores>

2. Create the core directories specified, with the following sub-directories in each:
   - conf: contains the configs and schema definition
   - lib: contains the required libraries
   - data: will be created automatically on first run; this contains the actual index.

While indexing the docs, you specify the core name in the URL as follows: http://host:port/solr/core-name/update?parameters Similarly while querying. Please refer to the Solr Wiki; it has the complete details. Hope this helps! - Dikchant On Sat, Dec 1, 2012 at 10:41 AM, Joe Zhang smartag...@gmail.com wrote: May I ask: how to set up multiple indexes, and specify which index to send the docs to at indexing time, and later on, how to specify which index to work with? A related question: what is the storage location and structure of solr indexes? Thanks in advance, guys! Joe.
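The per-core URL pattern described above can be sketched as follows; the host and port are the Solr example defaults and the core names are the ones from the post:

```python
# Each core behaves like an independent index reachable under its own path.
SOLR_BASE = "http://localhost:8983/solr"

def core_url(core: str, handler: str) -> str:
    """Build the request URL for a given core and request handler."""
    return f"{SOLR_BASE}/{core}/{handler}"

print(core_url("core-name1", "update"))  # index documents into core-name1
print(core_url("core-name2", "select"))  # query core-name2
```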
Re: solr issue with seaching words
Try debugging it using the analysis page or running the query in debug mode (debugQuery=true). In the analysis page, add 'RCA-Jack/' to index and 'jacke' to query. This might help you understand the behavior. If you are still unable to debug, some additional information would be required to help. On Tue, Sep 4, 2012 at 3:38 PM, zainu zainu...@gmail.com wrote: I am facing a strange problem. I am searching for the word 'jacke' but Solr also returns results where my description contains 'RCA-Jack/'. If I search 'jacka' or 'jackc' or 'jackd', it works fine and does not return any result, which is what I am expecting in this case. Only when there is 'jacke' does it return results with 'RCA-Jack/'. So there seems to be some kind of relationship between 'e' and '/', and it considers 'e' as '/'. Any help? -- View this message in context: http://lucene.472066.n3.nabble.com/solr-issue-with-seaching-words-tp4005200.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search results not returned for a str field
DefaultSearchField is the field which is queried if you don't explicitly specify the fields to query on. Please refer to the below link: http://wiki.apache.org/solr/SchemaXml On Sat, Jul 21, 2012 at 12:56 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Hello, Lakshmi, The issue is the fieldType you've assigned to the fields in your schema does not perform any analysis on the string before indexing it. So it will only do exact matches. If you want to do matches against portions of the field value, use one of the text types that come in the default schema. Michael Della Bitta Appinions, Inc. -- Where Influence Isn't a Game. http://www.appinions.com On Fri, Jul 20, 2012 at 3:18 PM, Lakshmi Bhargavi lakshmi.bharg...@gmail.com wrote: Hi, I have the following configuration:

<?xml version="1.0" ?>
<schema name="example core zero" version="1.1">
  <types>
    <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
    <field name="type" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="name" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="core0" type="string" indexed="true" stored="true" multiValued="false" />
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>name</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
</schema>

I am also attaching the solr config file http://lucene.472066.n3.nabble.com/file/n3996313/solrconfig.xml solrconfig.xml I indexed a document:

<add><doc>
  <field name="id">MA147LL/A</field>
  <field name="name">Apple 60 GB iPod with Video Playback Black</field>
</doc></add>

When I do a wildcard search, the results are returned: http://localhost:8983/solr/select?q=*:*

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">MA147LL/A</str>
      <str name="name">Apple 60 GB iPod with Video Playback Black</str>
    </doc>
  </result>
</response>

but the results are not returned for a specific query: http://localhost:8983/solr/core0/select?q=iPod

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">5</int>
  </lst>
  <result name="response" numFound="0" start="0" />
</response>

Could someone please let me know what is wrong? Also, it would be very helpful if someone could explain the significance of the defaultSearchField. Thanks, lakshmi -- View this message in context: http://lucene.472066.n3.nabble.com/Search-results-not-returned-for-a-str-field-tp3996313.html Sent from the Solr - User mailing list archive at Nabble.com.
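Michael's point about solr.StrField can be illustrated outside Solr: a string field indexes the entire value as one token, so only an exact whole-value match succeeds, while a text type tokenizes and lowercases first. A rough Python analogue (the regex is only a crude stand-in for StandardTokenizer):

```python
import re

value = "Apple 60 GB iPod with Video Playback Black"

# solr.StrField: the whole value is a single, unanalyzed token.
string_field_tokens = [value]

# text_general-style analysis: tokenize, then lowercase.
text_field_tokens = re.findall(r"\w+", value.lower())

print("ipod" in string_field_tokens)  # False: q=iPod finds nothing
print("ipod" in text_field_tokens)    # True: a text type would match
```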
Re: NGram for misspelt words
You are creating grams only while indexing and not querying, hence 'ludlwo' would not match. Your analyzer will create the following grams while indexing for 'ludlow': lu lud ludl ludlo ludlow and hence it would not match 'ludlwo'. Either you need to create grams while querying also, or use edit distance. On Wed, Jul 18, 2012 at 7:43 PM, Husain, Yavar yhus...@firstam.com wrote: I have configured NGram indexing for some fields. Say I search for the city Ludlow, I get the results (normal search). If I search for Ludlo (with w omitted) I get the results. If I search for Ludl (with ow omitted) I still get the results. I know that they are all partial strings of the main string, hence NGram works perfectly. But when I type in Ludlwo (misspelt, characters o and w interchanged) I don't get any results. It should ideally match Ludl and provide the results. I am not looking for edit-distance-based spell correctors. How can I make the above NGram-based search work? Here is my schema.xml (NGramFieldType):

<fieldType name="nGram" class="solr.TextField" positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- potentially word delimiter, synonym filter, stop words, NOT stemming -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- potentially word delimiter, synonym filter, stop words, NOT stemming -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

This message may contain confidential or proprietary information intended only for the use of the addressee(s) named above or may contain information that is legally privileged. If you are not the intended addressee, or the person responsible for delivering it to the intended addressee, you are hereby notified that reading, disseminating, distributing or copying this message is strictly prohibited. If you have received this message by mistake, please immediately notify us by replying to the message and delete the original message and any copies immediately thereafter. Thank you.
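The index-side grams listed above can be reproduced with a simplified Python analogue of EdgeNGramFilterFactory (side="front", minGramSize=2, maxGramSize=15); this is an illustration, not Solr's actual implementation:

```python
def edge_ngrams(term: str, min_size: int = 2, max_size: int = 15) -> list:
    """Front edge n-grams, mimicking EdgeNGramFilterFactory side="front"."""
    return [term[:n] for n in range(min_size, min(max_size, len(term)) + 1)]

index_grams = edge_ngrams("ludlow")
print(index_grams)  # ['lu', 'lud', 'ludl', 'ludlo', 'ludlow']

# The query analyzer does no gramming, so the literal query token is
# looked up against the indexed grams:
print("ludlo" in index_grams)   # True: prefix queries match
print("ludlwo" in index_grams)  # False: transposed letters never match
```

This is why gramming only at index time handles truncation ('Ludl') but not transposition ('Ludlwo'); the latter needs query-time gramming or an edit-distance approach, as the reply says.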
Re: NGram for misspelt words
Have you tried the analysis window to debug? I believe you are doing something wrong in the fieldType. On Wed, Jul 18, 2012 at 8:07 PM, Husain, Yavar yhus...@firstam.com wrote: Thanks Sahi. I have replaced my EdgeNGramFilterFactory with NGramFilterFactory as I need substrings not just in front or back but anywhere. You are right, I put the same NGramFilterFactory in both query and index, however now it does not return any results, not even the basic one. -----Original Message----- From: Dikchant Sahi [mailto:contacts...@gmail.com] Sent: Wednesday, July 18, 2012 7:54 PM To: solr-user@lucene.apache.org Subject: Re: NGram for misspelt words You are creating grams only while indexing and not querying, hence 'ludlwo' would not match. Your analyzer will create the following grams while indexing for 'ludlow': lu lud ludl ludlo ludlow and hence it would not match 'ludlwo'. Either you need to create grams while querying also, or use edit distance. On Wed, Jul 18, 2012 at 7:43 PM, Husain, Yavar yhus...@firstam.com wrote: I have configured NGram indexing for some fields. Say I search for the city Ludlow, I get the results (normal search). If I search for Ludlo (with w omitted) I get the results. If I search for Ludl (with ow omitted) I still get the results. I know that they are all partial strings of the main string, hence NGram works perfectly. But when I type in Ludlwo (misspelt, characters o and w interchanged) I don't get any results. It should ideally match Ludl and provide the results. I am not looking for edit-distance-based spell correctors. How can I make the above NGram-based search work?
Re: Big Data Analysis and Management - 2 day Workshop
Hi Manish, The attachment seems to be missing. Would you mind sharing the same? I am a Search Engineer based in Bangalore and would be interested in attending the workshop. Best Regards, Dikchant Sahi On Thu, May 24, 2012 at 10:22 AM, Manish Bafna manish.bafna...@gmail.com wrote: Dear Friend, We are organizing a workshop on Big Data. Here are the details regarding the same. Please forward it to your company HR and also your friends, and let me know if anyone is interested. We have an early bird offer if registration is done before 31st May 2012. Big Data is one space that is buzzing in the market big time. There are several applications of various technologies involved around Big Data. Many a time when we work as part of various project or product development, we all streamline our time and energy towards its successful delivery. To ensure your colleagues don't miss out on this hot topic and to stay abreast of these niche things, we thought we would share our expertise with Senior Developers and Architects through this workshop on *Big Data Analysis and Management* that we have scheduled in *Bangalore on June 16th and 17th.*
We will be covering various topics under the following 4 broad headings. You can check the attached outline for a detailed insight into what we will cover under each head. It is definitely going to be an intensive and relevant hands-on session along with vivid explanation of the concepts and theories around it. On a lighter note, there are definitely going to be lots of jargons flowing around all participants in this short span of two days.

*Content Extraction* (hands-on using Apache Tika)
*Distribute Content in NOSQL ways* (hands-on using Cassandra, Neo4j)
*Search and Indexing* (hands-on using Solr and Tika)
*Distributed computing and analysis using Hadoop MapReduce and Mahout* (hands-on using Hadoop MapReduce, Mahout)

To register for this workshop, kindly send a mail to me along with the details of the participants (their profiles would be better) and payment details. I am enclosing herewith the complete course details attached along with this mail. I, along with two of my peers, will be delivering this workshop. You can find our brief profiles in the attached content. Feel free to contact me any time for any queries. With best regards, Manish.