Shalin Shekhar Mangar wrote:
On Fri, Sep 25, 2009 at 6:48 PM, Brahim Abdesslam <
brahim.abdess...@maecia.com> wrote:

We are using Solr to index some RSS feeds for a news aggregator application.

We've got some difficulties with the publication date of each item because
each site uses a homemade date format.
We want to know the exact amount of time between the date
of publication and the current time.

The fact is that the RSS example is just that, an example. It was never
meant for production use and it does not handle the variety of date formats
found in the wild. If you want to index RSS feeds, it is best to use an RSS
parser to extract out the values. You can use the PlainTextEntityProcessor
to get the raw RSS feed and write a custom transformer which uses an RSS
parsing library like ROME to extract the various fields.
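
To illustrate the date-normalization step only, here is a minimal sketch in Python. It is a stand-in, not the DataImportHandler transformer or the ROME API mentioned above. It assumes feeds use either RFC 822 dates (the RSS 2.0 convention) or ISO 8601 dates, and the helper names parse_pub_date and to_solr_date are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_pub_date(raw):
    """Parse a feed date into an aware UTC datetime.

    Tries RFC 822 first, then falls back to ISO 8601.
    Raises ValueError if neither format matches.
    """
    try:
        parsed = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: dates without a zone are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def to_solr_date(moment):
    """Format a datetime the way a Solr date field expects it."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Fixed "now" so the example is reproducible.
now = datetime(2009, 9, 25, 14, 58, 21, tzinfo=timezone.utc)
published = parse_pub_date("Fri, 25 Sep 2009 14:58:21 +0200")
age = now - published

print(to_solr_date(published))
print(age)
```

Once every publication date is stored in one normalized form, the time elapsed since publication is a plain subtraction, whatever format the original site used.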

So we decided to use a timestamp that stores the index time for each item.

The problem is:

  * when I do a full-import&clean=false the index is always cleaned.
Thanks, we will have a look at this if we can't get the timestamp method working...
  * when I do a simple import, nothing seems to be done.
== snip ==

- Tests:

=> command=full-import&clean=false

25-Sep-2009 14:58:21 org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
INFO: Read dataimport.properties
25-Sep-2009 14:58:21 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=full-import}
status=0 QTime=6


See the above parameters. It has only one param: command=full-import. There
is no clean=false in there so I'm guessing the clean parameter never made it
to Solr. Can you check again?
You rock! I was working without double quotes.

On a Linux system, this command:
curl http://192.168.0.14:8983/solr/dataimport?command=full-import&clean=false
does not work the same way as this one:
curl "http://192.168.0.14:8983/solr/dataimport?command=full-import&clean=false"
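
The difference comes from the shell rather than from Solr: an unquoted & ends the command and runs it in the background, so curl only receives the URL up to command=full-import. Below is a rough Python sketch of that behaviour. It assumes that shlex (Python 3.8 or later) is a close enough approximation of shell tokenization, and the helper names shell_tokens and params_seen_by_solr are hypothetical.

```python
import shlex
from urllib.parse import parse_qs, urlsplit

UNQUOTED = 'curl http://192.168.0.14:8983/solr/dataimport?command=full-import&clean=false'
QUOTED = 'curl "http://192.168.0.14:8983/solr/dataimport?command=full-import&clean=false"'


def shell_tokens(command_line):
    """Split a command line roughly the way a POSIX shell would."""
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # split on whitespace and operators only
    return list(lexer)


def params_seen_by_solr(command_line):
    """Return the query parameters in the URL that curl actually receives."""
    url = shell_tokens(command_line)[1]  # first argument after "curl"
    return parse_qs(urlsplit(url).query)


print(shell_tokens(UNQUOTED))
print(params_seen_by_solr(UNQUOTED))
print(params_seen_by_solr(QUOTED))
```

In the unquoted case the & comes out as a separate operator token and only the command parameter survives, which matches the params={command=full-import} line in the log above. With double quotes the whole URL stays a single argument and clean=false reaches Solr.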

But we still have a problem with... the famous timestamp: it is always updated for each item!

To get the date and time when the item is indexed, we have this field in schema.xml:

<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" />

Do you think all the items are still being updated every time?

Thank you very much, Shalin!
