Re: Data import

2013-09-09 Thread Luis Portela Afonso
So I'm indexing RSS feeds.
I'm running the data import full-import command with a cron job. It runs
every 15 minutes and indexes a lot of RSS feeds from many sources.

From the cron job, I make an HTTP request with curl to the address
http://localhost:port/solr/core/dataimport/?command=full-import&clean=false
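
For illustration, the request the cron job sends can be sketched in Python. This is a minimal sketch, not the poster's actual setup: port 8983 and the core name "core" are assumptions standing in for the placeholders in the address, and dih_url is a hypothetical helper. It only shows that the command and clean parameters are separate query parameters joined by "&".

```python
from urllib.parse import urlencode


def dih_url(host: str, port: int, core: str, clean: bool = False) -> str:
    """Build a DataImportHandler full-import URL (hypothetical helper)."""
    # urlencode joins the parameters with "&", so "command" and "clean"
    # cannot run together into a single value.
    params = urlencode({"command": "full-import", "clean": str(clean).lower()})
    return f"http://{host}:{port}/solr/{core}/dataimport?{params}"


# Host, port and core name are assumptions for the example.
print(dih_url("localhost", 8983, "core"))
```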

When it runs, if the RSS source has an entry that is already indexed in Solr,
it updates the existing document.
So even if the source has the same information as the destination, it still
updates the information at the destination.

I want to prevent that. Is that clear? I can try to provide some
examples.

Thanks

On Tuesday, September 10, 2013, Chris Hostetter wrote:


 : When I run dataimport/?command=full-import&clean=false, Solr adds new
 : documents with the information. But if the same information already
 : exists with the same uniqueKey, it replaces the existing document with a
 : new one.
 : It does not update the document, it creates a new one. Is that
 : possible?

 I'm not certain that I'm understanding your question.

 It is possible using Atomic Updates, but you have to be explicit
 about what/how you want Solr to use the new information (ie: when to
 replace, when to add to a multivalued field, when to increment a numeric
 field, etc...)

 https://wiki.apache.org/solr/Atomic_Updates

 I don't think DIH has any straightforward syntax for letting you
 configure this easily, but as long as you put a map in each
 field (ie: via ScriptTransformer perhaps) containing a single modifier =
 value pair you want applied to that field, it should work.
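
As a rough illustration of the per-field map described here, an atomic update sent through the JSON update API wraps each changed field in a single-modifier map such as {"set": value}. The sketch below only builds that payload. The field names (link, title, category, views) are hypothetical, atomic_update_payload is an invented helper, and it assumes the set/add/inc modifiers listed on the Atomic Updates wiki page. It does not show the DIH ScriptTransformer route.

```python
import json

# Modifiers assumed from the Atomic Updates wiki page.
ALLOWED_MODIFIERS = {"set", "add", "inc"}


def atomic_update_payload(key_field, key_value, changes):
    """Return a JSON body with one {modifier: value} map per changed field."""
    doc = {key_field: key_value}  # the uniqueKey is sent as a plain value
    for field, (modifier, value) in changes.items():
        if modifier not in ALLOWED_MODIFIERS:
            raise ValueError(f"unsupported modifier: {modifier}")
        doc[field] = {modifier: value}
    return json.dumps([doc])


# Hypothetical document and field names.
payload = atomic_update_payload(
    "link",
    "http://example.com/feed/1",
    {
        "title": ("set", "Updated title"),  # replace the value
        "category": ("add", "news"),        # append to a multivalued field
        "views": ("inc", 1),                # increment a numeric field
    },
)
print(payload)
```

The resulting JSON would be POSTed to the core's /update handler with a Content-Type of application/json.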

 : I'm indexing RSS feeds. I run the RSS example that exists in the Solr
 : examples, and it does that.

 Can you please be more specific about what you would like to see happen,
 so we can better understand what your actual goal is?  It's really not clear
 if using Atomic Updates is the easiest way to achieve what you're after,
 or if I'm just completely misunderstanding your question...

 https://wiki.apache.org/solr/UsingMailingLists

 -Hoss



-- 
Sent from Gmail Mobile


Re: Data import

2013-09-09 Thread Luis Portela Afonso
But with atomic updates I need to send the information, right?

I want Solr to index it automatically, and it is doing that. Can you look
at the Solr example in the source?
There is an example in the example-DIH folder.

Imagine that you run the URL to import the data every 15 minutes. If the
same information is already indexed, solr will update it, and by update I
mean delete and index again.

I just want Solr to simply discard the information if it already
exists in the index.
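
The behaviour being asked for can be sketched outside Solr as a filter applied before anything is sent to the index. This is only an illustration under assumptions: new_entries is a hypothetical helper, entries are plain dicts, and the link value is taken as the unique key. It is not a Solr or DIH feature.

```python
def new_entries(entries, indexed_keys):
    """Return only the entries whose 'link' is not already indexed."""
    seen = set(indexed_keys)
    fresh = []
    for entry in entries:
        key = entry["link"]
        if key in seen:
            continue  # already indexed (or repeated in this batch): discard
        seen.add(key)
        fresh.append(entry)
    return fresh


# Hypothetical data: one link is already in the index, one is new.
already_indexed = {"http://example.com/a"}
incoming = [
    {"link": "http://example.com/a", "title": "Old item"},
    {"link": "http://example.com/b", "title": "New item"},
]
print(new_entries(incoming, already_indexed))
```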

On Tuesday, September 10, 2013, Chris Hostetter wrote:


 : With cron job, I do a http request using curl, to the address
 : http://localhost:port/solr/core/dataimport/?command=full-import&clean=false
 :
 : When it runs, if the rss source has a feed that is already indexed on solr,
 : it updates the existing source.
 : So if the source has the same information of the destiny, it updates the
 : information on the destiny.
 :
 : I want to prevent that. Is that explicit? I may try to provide some
 : examples.

 Yes, specific examples would be helpful -- it's not really clear what it
 is that you want to prevent.

 Please note the URL I mentioned before and use it as a guideline for
 how much detail we need to understand what it is you are asking...

 :  Can you please be more specific about what you would like to see happen,
 :  we can better understand what your actual goal is?  It's really not clear

 :  https://wiki.apache.org/solr/UsingMailingLists



 -Hoss





Solr documents update on index

2013-09-05 Thread Luis Portela Afonso
Hi,

I'm having a problem when Solr indexes.
It is updating documents that are already indexed. Is this normal behavior?
If a document with the same key already exists, is it supposed to be updated?
I was thinking that it is supposed to update only if the information in the
RSS feed has changed.
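
One way to express "update only if the feed entry changed" is to compare a fingerprint of the incoming entry with one saved from the previous run. This is a minimal sketch under assumptions: the field names (title, description) are hypothetical, fingerprint and needs_reindex are invented helpers, and the previous fingerprint is assumed to be stored somewhere the importer can read. Solr itself is not involved here.

```python
import hashlib


def fingerprint(entry, fields=("title", "description")):
    """Hash the selected fields of a feed entry into a hex digest."""
    digest = hashlib.sha256()
    for name in fields:
        # A NUL separator keeps field boundaries unambiguous.
        digest.update(name.encode("utf-8"))
        digest.update(b"\x00")
        digest.update(str(entry.get(name, "")).encode("utf-8"))
        digest.update(b"\x00")
    return digest.hexdigest()


def needs_reindex(entry, stored_fingerprint):
    """True when the entry is new or its content differs from the stored copy."""
    return stored_fingerprint is None or fingerprint(entry) != stored_fingerprint


# Hypothetical entry seen on a previous run, then seen again unchanged and changed.
old = {"title": "Hello", "description": "First version"}
saved = fingerprint(old)
print(needs_reindex(dict(old), saved))
print(needs_reindex({"title": "Hello", "description": "Edited"}, saved))
```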

Appreciate your help



Re: SOLR Prevent solr of modifying fields when update doc

2013-08-25 Thread Luis Portela Afonso
Hi, right now I'm using the link field that comes in any rss entry as my
uniqueKey.
That was the best solution that I found because in many updated documents,
this was the only field that never changes.

Now I'm facing another problem. When I search for a document by
that id or link, because that is my uniqueKey, I'm not able to get a
unique result.
I can't successfully search on a field that holds a URL in Solr.
I think that is because I'm encoding the URL that I'm searching for, but
Solr doesn't decode it.
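
A possible cause, offered as an assumption rather than a diagnosis: characters such as ":" and "/" in the URL are meaningful to the query parser, so the value needs to be quoted for the query parser and then URL-encoded exactly once for the HTTP request. The sketch below builds such a q parameter. exact_match_query is a hypothetical helper, and it assumes the link field is an untokenized string field.

```python
from urllib.parse import parse_qs, urlencode


def exact_match_query(field, value):
    """Build a URL-encoded q parameter matching `value` as a quoted phrase."""
    # Inside double quotes only the backslash and the quote need escaping.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    # urlencode percent-encodes the whole query string once.
    return urlencode({"q": f'{field}:"{escaped}"'})


# Hypothetical link value.
query_string = exact_match_query("link", "http://example.com/a?b=1")
print(query_string)
# Decoding once gives back the quoted query the parser should see.
print(parse_qs(query_string)["q"][0])
```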

Thanks for the concern and help

On Saturday, August 24, 2013, Erick Erickson wrote:

 bq:  but the uniqueId is generated by me. But when solr indexes and there
 is an update in a doc, it deletes the doc and creates a new one, so it
 generates a new UUID.

 right, this is why I was saying that a UUID field may not fit your use
 case. The _point_ of a UUID field is to generate a unique entry for every
 added document, there's no concept of generating the UUID only once per
 indexed uniqueKey, which seems to be what you want.

 So I'd do something like just use the uniqueKey field rather than a
 separate UUID field. That doesn't change by definition. What advantage do
 you think you get from the UUID field over just using your uniqueKey
 field?
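
If a UUID-shaped value is still wanted, one option is to derive it deterministically from the value that never changes, so that re-indexing the same link always yields the same identifier. This is a minimal sketch using a name-based (version 5) UUID. stable_uuid is a hypothetical helper, the choice of the link as input is an assumption, and the value is computed client-side rather than by a Solr update chain.

```python
import uuid


def stable_uuid(link):
    """Derive a repeatable UUID from a link (name-based, version 5)."""
    return uuid.uuid5(uuid.NAMESPACE_URL, link)


# The same hypothetical link always maps to the same UUID.
first = stable_uuid("http://example.com/feed/1")
second = stable_uuid("http://example.com/feed/1")
other = stable_uuid("http://example.com/feed/2")
print(first == second, first == other)
```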

 Best,
 Erick


 On Sat, Aug 24, 2013 at 6:26 AM, Luis Portela Afonso
 meligalet...@gmail.com wrote:

  Hi,
 
  The uuid, which was being used as the id of a document, is generated by
  Solr using an update chain.
  I just use the recommended method to generate UUIDs.
 
  I think an atomic update is not suitable for me, because I want Solr
  to index the feeds, not me. I don't want to send information to Solr, I
  want it to index every 15 minutes, for example, and now it's doing that.
 
  Lance, I don't understand what you mean by the software that I use to
  index.
  I just use Solr. I have a configuration with two entities: one that selects
  my RSS sources from a database, and then the main entity that gets
  information from a URL and processes it.
 
  Thank you all for the answers.
  Much appreciated
 
  On Saturday, August 24, 2013, Greg Preston wrote:
 
   But there is an API for sending a delta over the wire, and server side it
   does a read, overlay, delete, and insert.  And only the fields you sent
   will be changed.
  
   *Might require your unchanged fields to all be stored, though.
  
  
   -Greg
  
  
   On Fri, Aug 23, 2013 at 7:08 PM, Lance Norskog
   goks...@gmail.com wrote:
  
Solr does not by default generate unique IDs. It uses what you give as
your unique field, usually called 'id'.

What software do you use to index data from your RSS feeds? Maybe that is
creating a new 'id' field?

There is no partial update, Solr (Lucene) always rewrites the complete
document.
   
   
On 08/23/2013 09:03 AM, Greg Preston wrote:
   
Perhaps an atomic update that only changes the fields you want to change?
   
-Greg
   
   
On Fri, Aug 23, 2013 at 4:16 AM, Luís Portela Afonso
meligalet...@gmail.com wrote:
   
Hi, thanks for the answer, but the uniqueId is generated by me. But when
Solr indexes and there is an update in a doc, it deletes the doc and
creates a new one, so it generates a new UUID.
That is not suitable for me, because I want Solr to update just some
fields, because the UUID is the key that I use to map it to a user in my
database.

Right now I'm using information that comes from the source and never
changes as my uniqueId, for example the guid that exists in some RSS
feeds, or, if it doesn't exist, the link.

I think there isn't any simple solution for me, because from what I have
read, when an update to a doc exists, Solr deletes the old one and creates a
new one, right?
   
On Aug 23, 2013, at 12:07 PM, Erick Erickson
erickerick...@gmail.com wrote:
   
Well, not much in the way of help because you can't do what you
want AFAIK. I don't think UUID is suitable for your use-case. Why not
use your uniqueId?
   
Or generate something yourself...
   
Best
Erick
   
   
On Thu, Aug 22, 2013 at 5:56 PM, Luís Portela Afonso
meligalet...@gmail.com wrote:
Hi,
   
How can I prevent Solr from updating some fields when updating a doc?
The problem is, I have a UUID in a field named uuid, but it is not a
unique key. When an RSS source updates a feed, Solr will update the doc
with the same link but it generates a new uuid. This is not the desired
behavior, because this id is used by me to relate feeds to a user.
   
Can someone help me?
   
Many Thanks
   
   
   
  
 
 
 





Re: SOLR Prevent solr of modifying fields when update doc

2013-08-24 Thread Luis Portela Afonso
Hi,

The uuid, which was being used as the id of a document, is generated by
Solr using an update chain.
I just use the recommended method to generate UUIDs.

I think an atomic update is not suitable for me, because I want Solr
to index the feeds, not me. I don't want to send information to Solr, I
want it to index every 15 minutes, for example, and now it's doing that.

Lance, I don't understand what you mean by the software that I use to
index.
I just use Solr. I have a configuration with two entities: one that selects
my RSS sources from a database, and then the main entity that gets
information from a URL and processes it.

Thank you all for the answers.
Much appreciated

On Saturday, August 24, 2013, Greg Preston wrote:

 But there is an API for sending a delta over the wire, and server side it
 does a read, overlay, delete, and insert.  And only the fields you sent
 will be changed.

 *Might require your unchanged fields to all be stored, though.
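
The read, overlay, delete, insert sequence can be mimicked with plain dictionaries to show why untouched fields survive only if their stored values can be read back. This is a toy model, not Solr code: apply_atomic_update and the field names are hypothetical, and the index is just a dict keyed by uniqueKey.

```python
def apply_atomic_update(index, key, delta):
    """Toy model of an atomic update: read, overlay, delete, insert."""
    # Read: start from the stored copy of the document (empty if missing).
    doc = dict(index.get(key, {}))
    # Overlay: apply one modifier per field named in the delta.
    for field, (modifier, value) in delta.items():
        if modifier == "set":
            doc[field] = value
        elif modifier == "add":
            current = doc.get(field, [])
            if not isinstance(current, list):
                current = [current]
            doc[field] = current + [value]
        elif modifier == "inc":
            doc[field] = doc.get(field, 0) + value
        else:
            raise ValueError(f"unsupported modifier: {modifier}")
    # Delete the old document, then insert the merged one under the same key.
    index.pop(key, None)
    index[key] = doc
    return doc


# Hypothetical stored document: only the title is sent, the uuid is kept.
index = {"http://example.com/a": {"uuid": "u-1", "title": "old title"}}
apply_atomic_update(index, "http://example.com/a", {"title": ("set", "new title")})
print(index["http://example.com/a"])
```

In this model the uuid value survives because it was available to copy from the stored document, which mirrors the caveat about unchanged fields needing to be stored.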


 -Greg


 On Fri, Aug 23, 2013 at 7:08 PM, Lance Norskog
 goks...@gmail.com wrote:

  Solr does not by default generate unique IDs. It uses what you give as
  your unique field, usually called 'id'.
 
  What software do you use to index data from your RSS feeds? Maybe that is
  creating a new 'id' field?
 
  There is no partial update, Solr (Lucene) always rewrites the complete
  document.
 
 
  On 08/23/2013 09:03 AM, Greg Preston wrote:
 
  Perhaps an atomic update that only changes the fields you want to change?
 
  -Greg
 
 
  On Fri, Aug 23, 2013 at 4:16 AM, Luís Portela Afonso
  meligalet...@gmail.com wrote:
 
  Hi, thanks for the answer, but the uniqueId is generated by me. But when
  Solr indexes and there is an update in a doc, it deletes the doc and
  creates a new one, so it generates a new UUID.
  That is not suitable for me, because I want Solr to update just some
  fields, because the UUID is the key that I use to map it to a user in my
  database.
 
  Right now I'm using information that comes from the source and never
  changes as my uniqueId, for example the guid that exists in some RSS
  feeds, or, if it doesn't exist, the link.
 
  I think there isn't any simple solution for me, because from what I have
  read, when an update to a doc exists, Solr deletes the old one and creates a
  new one, right?
 
  On Aug 23, 2013, at 12:07 PM, Erick Erickson
  erickerick...@gmail.com wrote:
 
   Well, not much in the way of help because you can't do what you
  want AFAIK. I don't think UUID is suitable for your use-case. Why not
  use your uniqueId?
 
  Or generate something yourself...
 
  Best
  Erick
 
 
  On Thu, Aug 22, 2013 at 5:56 PM, Luís Portela Afonso
  meligalet...@gmail.com wrote:
  Hi,
 
  How can I prevent Solr from updating some fields when updating a doc?
  The problem is, I have a UUID in a field named uuid, but it is not a
  unique key. When an RSS source updates a feed, Solr will update the doc
  with the same link but it generates a new uuid. This is not the desired
  behavior, because this id is used by me to relate feeds to a user.
 
  Can someone help me?
 
  Many Thanks
 
 
 


