Re: Data import
So I'm indexing RSS feeds.
I'm running the data import full-import command from a cron job. It runs every 15 minutes and indexes a lot of RSS feeds from many sources.

From the cron job I make an HTTP request using curl to the address
http://localhost:port/solr/core/dataimport/?command=full-import&clean=false

When it runs, if the RSS source has an entry that is already indexed in Solr, it updates the existing entry. So even if the source has the same information as the destination, it still updates the information at the destination.

I want to prevent that. Is that clear enough? I can try to provide some examples.

Thanks

On Tuesday, September 10, 2013, Chris Hostetter wrote:

: When i run dataimport/?command=full-import&clean=false, solr adds new
: documents with the information. But if the same information already
: exists with the same uniquekey, it replaces the existing document with a
: new one.
: It does not update the document, it creates a new one. Is that possible?

I'm not certain that I'm understanding your question.

It is possible using Atomic Updates, but you have to be explicit about what/how you want Solr to use the new information (ie: when to replace, when to add to a multivalued field, when to increment a numeric field, etc...)

https://wiki.apache.org/solr/Atomic_Updates

I don't think DIH has any straightforward syntax for letting you configure this easily, but as long as you put a map in each field (ie: via ScriptTransformer perhaps) containing a single modifier = value pair you want applied to that field, it should work.

: I'm indexing rss feeds. I run the rss example that exists in the solr
: examples, and it does that.

Can you please be more specific about what you would like to see happen, so we can better understand what your actual goal is? It's really not clear if using Atomic Updates is the easiest way to achieve what you're after, or if I'm just completely misunderstanding your question...

https://wiki.apache.org/solr/UsingMailingLists

-Hoss
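As a rough illustration of the "map in each field" idea Hoss describes, the sketch below builds an atomic-update document in which every changed field carries a single modifier/value pair. The modifier names (set, add, inc) are the ones listed on the Atomic Updates wiki page. The field names title, category and update_count, the sample link, and the helper atomic_update_doc are made up for illustration, and this is plain client-side Python rather than a DIH ScriptTransformer.

```python
import json

# Modifiers described on the Atomic Updates wiki page for Solr 4.x.
ALLOWED_MODIFIERS = {"set", "add", "inc"}


def atomic_update_doc(key_field, key_value, changes):
    """Build one atomic-update document (hypothetical helper).

    `changes` maps a field name to a (modifier, value) pair, e.g.
    {"title": ("set", "New title")}. The uniqueKey field is sent as a
    plain value so Solr can locate the existing document.
    """
    doc = {key_field: key_value}
    for field, (modifier, value) in changes.items():
        if modifier not in ALLOWED_MODIFIERS:
            raise ValueError(f"unsupported modifier: {modifier}")
        # Each updated field holds a one-entry map: {modifier: value}.
        doc[field] = {modifier: value}
    return doc


# Field names and values below are assumptions, not taken from the thread.
doc = atomic_update_doc(
    "link",
    "http://example.com/feed/item-1",
    {
        "title": ("set", "Updated headline"),   # replace the stored value
        "category": ("add", "news"),            # append to a multivalued field
        "update_count": ("inc", 1),             # increment a numeric field
    },
)

# An update request body is a JSON list of such documents.
payload = json.dumps([doc], indent=2)
print(payload)
```

Fields that are not mentioned in the document (a uuid field, say) are the ones an atomic update is meant to leave alone, which is why each field has to say explicitly how the new value should be applied.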
Re: Data import
But with atomic updates I need to send the information myself, right? I want Solr to index it automatically, and it is doing that.

Can you look at the Solr example in the source? There is an example in the example-DIH folder. Imagine that you run the URL to import the data every 15 minutes. If the same information is already indexed, Solr will update it, and by update I mean delete and index again.

I just want Solr to simply discard the information if it is already indexed.

On Tuesday, September 10, 2013, Chris Hostetter wrote:

: With cron job, I do a http request using curl, to the address
: http://localhost:port/solr/core/dataimport/?command=full-import&clean=false
:
: When it runs, if the rss source has a feed that is already indexed on solr,
: it updates the existing source.
: So if the source has the same information of the destiny, it updates the
: information on the destiny.
:
: I want to prevent that. Is that explicit? I may try to provide some
: examples.

Yes, specific examples would be helpful; it's not really clear what it is that you want to prevent.

Please note the URL I mentioned before and use it as a guideline for how much detail we need to understand what it is you are asking...

: Can you please be more specific about what you would like to see happen,
: we can better understand what your actual goal is? It's really not clear

: https://wiki.apache.org/solr/UsingMailingLists

-Hoss
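For reference, the import request discussed above takes two separate query parameters, command and clean. A minimal sketch of building that URL follows; the helper name dih_url is hypothetical, port 8983 is only an assumed value standing in for the "port" placeholder in the thread, and "core" is the placeholder core name used in the thread.

```python
from urllib.parse import urlencode


def dih_url(host, port, core, command="full-import", clean=False):
    """Build the DataImportHandler request URL (hypothetical helper).

    `command` and `clean` are separate parameters, so they must be
    joined with '&' in the query string.
    """
    params = {"command": command, "clean": "true" if clean else "false"}
    return f"http://{host}:{port}/solr/{core}/dataimport/?{urlencode(params)}"


# 8983 is an assumption; the thread only says "port".
url = dih_url("localhost", 8983, "core")
print(url)
```

Two things are worth keeping in mind. When the URL is passed to curl from a shell or crontab line it needs to be quoted, since an unquoted & is treated by the shell as a command separator. Also, clean=false only stops the import from wiping the index first; a document whose uniqueKey already exists is still replaced, which is the behaviour described in this message.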
Solr documents update on index
Hi,

I'm having a problem when Solr indexes: it is updating documents that are already indexed. Is this normal behavior? If a document with the same key already exists, is it supposed to be updated?

I was thinking that it is supposed to update only if the information in the RSS feed has changed.

Appreciate your help
Re: SOLR Prevent solr of modifying fields when update doc
Hi,

Right now I'm using the link field that comes in every RSS entry as my uniqueKey. That was the best solution I found, because across many updated documents it was the only field that never changed.

Now I'm facing another problem. When I search for a document by that id (the link, since that is my uniqueKey), I'm not able to get a unique result. I can't successfully search on a field whose value is a URL in Solr. I think that is because I'm encoding the URL that I'm searching for, but Solr doesn't decode it.

Thanks for the concern and help

On Saturday, August 24, 2013, Erick Erickson wrote:

bq: but the uniqueId is generated by me. But when solr indexes and there is an update in a doc, it deletes the doc and creates a new one, so it generates a new UUID.

Right, this is why I was saying that a UUID field may not fit your use case. The _point_ of a UUID field is to generate a unique entry for every added document; there's no concept of "only generate the UUID once per uniqueKey indexed", which seems to be what you want.

So I'd do something like just use the uniqueKey field rather than a separate UUID field. That doesn't change, by definition. What advantage do you think you get from the UUID field over just using your uniqueKey field?

Best,
Erick

On Sat, Aug 24, 2013 at 6:26 AM, Luis Portela Afonso meligalet...@gmail.com wrote:

Hi,

The uuid that was being used as the id of a document is generated by Solr using an update chain. I just use the recommended method to generate UUIDs.

I think an atomic update is not suitable for me, because I want Solr to index the feeds, not me. I don't want to send information to Solr; I want it to index the feeds every 15 minutes, for example, and it's doing that now.

Lance, I don't understand what you mean by the software that I use to index. I just use Solr. I have a configuration with two entities: one that selects my RSS sources from a database, and the main entity that gets information from a URL and processes it.

Thank you all for the answers. Much appreciated

On Saturday, August 24, 2013, Greg Preston wrote:

But there is an API for sending a delta over the wire, and server side it does a read, overlay, delete, and insert. And only the fields you sent will be changed.

*Might require your unchanged fields to all be stored, though.

-Greg

On Fri, Aug 23, 2013 at 7:08 PM, Lance Norskog goks...@gmail.com wrote:

Solr does not by default generate unique IDs. It uses what you give as your unique field, usually called 'id'.

What software do you use to index data from your RSS feeds? Maybe that is creating a new 'id' field?

There is no partial update, Solr (Lucene) always rewrites the complete document.

On 08/23/2013 09:03 AM, Greg Preston wrote:

Perhaps an atomic update that only changes the fields you want to change?

-Greg

On Fri, Aug 23, 2013 at 4:16 AM, Luís Portela Afonso meligalet...@gmail.com wrote:

Hi, thanks for the answer, but the uniqueId is generated by me. But when Solr indexes and there is an update to a doc, it deletes the doc and creates a new one, so it generates a new UUID.

That is not suitable for me, because I want Solr to update only some fields: the UUID is the key that I use to map the doc to a user in my database.

Right now I'm using information that comes from the source and never changes as my uniqueId, for example the guid that exists in some RSS feeds, or, if it doesn't exist, the link.

I don't think there is any simple solution for me, because from what I have read, when a doc is updated, Solr deletes the old one and creates a new one, right?

On Aug 23, 2013, at 12:07 PM, Erick Erickson erickerick...@gmail.com wrote:

Well, not much in the way of help because you can't do what you want AFAIK. I don't think UUID is suitable for your use-case. Why not use your uniqueId? Or generate something yourself...

Best
Erick

On Thu, Aug 22, 2013 at 5:56 PM, Luís Portela Afonso meligalet...@gmail.com wrote:

Hi,

How can I prevent Solr from updating some fields when updating a doc? The problem is, I have a uuid in a field named uuid, but it is not the unique key. When an RSS source updates a feed, Solr will update the doc with the same link but generate a new uuid. This is not desired, because I use this id to relate feeds to a user.

Can someone help me?

Many Thanks
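On the URL search problem raised at the top of this message, one possible approach is to put the raw link inside a quoted term and URL-encode the whole q parameter exactly once for HTTP transport. The sketch below assumes the link field is an untokenized (string-type) field, which the thread does not confirm; the helper names and the sample link are hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def quote_term(value):
    """Wrap a raw value in double quotes for the Lucene/Solr query syntax.

    Inside a quoted term only the backslash and the double quote need
    escaping, so characters such as ':' and '/' in a URL can stay as-is.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def select_query_string(field, raw_value):
    """Build the encoded query string for a lookup on one field."""
    q = f"{field}:{quote_term(raw_value)}"
    # urlencode percent-encodes the whole q value once for the HTTP request.
    return urlencode({"q": q})


# Sample link is made up; use the value exactly as it was indexed.
raw_link = "http://example.com/news/item?id=42&lang=en"
qs = select_query_string("link", raw_link)
print(qs)

# Decoding the query string once gives back the query with the raw link,
# which is what the server sees after it decodes the request parameters.
print(parse_qs(qs)["q"][0])
```

If the link is percent-encoded by hand before it is placed in q and then encoded again by the HTTP client, the server would see an encoded string that does not match the stored value, which may be what is happening here.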
Re: SOLR Prevent solr of modifying fields when update doc
Hi,

The uuid that was being used as the id of a document is generated by Solr using an update chain. I just use the recommended method to generate UUIDs.

I think an atomic update is not suitable for me, because I want Solr to index the feeds, not me. I don't want to send information to Solr; I want it to index the feeds every 15 minutes, for example, and it's doing that now.

Lance, I don't understand what you mean by the software that I use to index. I just use Solr. I have a configuration with two entities: one that selects my RSS sources from a database, and the main entity that gets information from a URL and processes it.

Thank you all for the answers. Much appreciated

On Saturday, August 24, 2013, Greg Preston wrote:

But there is an API for sending a delta over the wire, and server side it does a read, overlay, delete, and insert. And only the fields you sent will be changed.

*Might require your unchanged fields to all be stored, though.

-Greg

On Fri, Aug 23, 2013 at 7:08 PM, Lance Norskog goks...@gmail.com wrote:

Solr does not by default generate unique IDs. It uses what you give as your unique field, usually called 'id'.

What software do you use to index data from your RSS feeds? Maybe that is creating a new 'id' field?

There is no partial update, Solr (Lucene) always rewrites the complete document.

On 08/23/2013 09:03 AM, Greg Preston wrote:

Perhaps an atomic update that only changes the fields you want to change?

-Greg

On Fri, Aug 23, 2013 at 4:16 AM, Luís Portela Afonso meligalet...@gmail.com wrote:

Hi, thanks for the answer, but the uniqueId is generated by me. But when Solr indexes and there is an update to a doc, it deletes the doc and creates a new one, so it generates a new UUID.

That is not suitable for me, because I want Solr to update only some fields: the UUID is the key that I use to map the doc to a user in my database.

Right now I'm using information that comes from the source and never changes as my uniqueId, for example the guid that exists in some RSS feeds, or, if it doesn't exist, the link.

I don't think there is any simple solution for me, because from what I have read, when a doc is updated, Solr deletes the old one and creates a new one, right?

On Aug 23, 2013, at 12:07 PM, Erick Erickson erickerick...@gmail.com wrote:

Well, not much in the way of help because you can't do what you want AFAIK. I don't think UUID is suitable for your use-case. Why not use your uniqueId? Or generate something yourself...

Best
Erick

On Thu, Aug 22, 2013 at 5:56 PM, Luís Portela Afonso meligalet...@gmail.com wrote:

Hi,

How can I prevent Solr from updating some fields when updating a doc? The problem is, I have a uuid in a field named uuid, but it is not the unique key. When an RSS source updates a feed, Solr will update the doc with the same link but generate a new uuid. This is not desired, because I use this id to relate feeds to a user.

Can someone help me?

Many Thanks
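One possible reading of Erick's "generate something yourself" is to derive the UUID from the value that never changes (the guid, or the link when there is no guid) instead of generating a random one on every add. The sketch below uses Python's name-based uuid5 for that. It runs outside Solr, the entry dictionaries are hypothetical, and how the resulting value would be fed into the index is not shown.

```python
import uuid


def stable_key(entry):
    """Pick the identifier that does not change between feed updates:
    the guid when the entry has one, otherwise the link."""
    return entry.get("guid") or entry["link"]


def stable_uuid(entry):
    """Derive a UUID from the stable key (hypothetical helper).

    uuid5 is name-based: the same namespace and name always produce the
    same UUID, so re-indexing an entry yields the same value again.
    """
    return uuid.uuid5(uuid.NAMESPACE_URL, stable_key(entry))


# Two fetches of the same entry whose title changed in between.
first_fetch = {"link": "http://example.com/a", "title": "Old title"}
later_fetch = {"link": "http://example.com/a", "title": "New title"}

print(stable_uuid(first_fetch))
print(stable_uuid(first_fetch) == stable_uuid(later_fetch))
```

With this kind of derivation the uuid carries no more information than the uniqueKey itself, which is in line with Erick's question about what the separate UUID field adds.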