All,

I am ingesting a lot of RSS feeds as part of my application and I keep
getting the same error.

WARNING: Could not parse a Date field
java.text.ParseException: Unparseable date: "Mon, 06 Dec 2010 23:31:38 +0000"
        at java.text.DateFormat.parse(Unknown Source)
        at org.apache.solr.handler.dataimport.DateFormatTransformer.process(DateFormatTransformer.java:89)
        at org.apache.solr.handler.dataimport.DateFormatTransformer.transformRow(DateFormatTransformer.java:69)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.applyTransformer(EntityProcessorWrapper.java:195)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:241)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Dec 11, 2010 6:25:47 PM org.apache.solr.handler.dataimport.DocBuilder finish
INFO: Import completed successfully
Dec 11, 2010 6:25:47 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true,expungeDeletes=false)
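Because the trace bottoms out in java.text.DateFormat.parse, the failure can probably be reproduced outside Solr with a plain SimpleDateFormat. The sketch below assumes DateFormatTransformer hands the field's dateTimeFormat pattern to a SimpleDateFormat more or less directly; the class ReproduceParseError and the helper canParse are hypothetical names for illustration. It checks the pattern configured on the pubdate field against the value from the warning.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical reproduction class, not part of Solr.
class ReproduceParseError {
    // Returns true if SimpleDateFormat can parse the value with the given pattern.
    static boolean canParse(String pattern, String value) {
        try {
            new SimpleDateFormat(pattern, Locale.ENGLISH).parse(value);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pattern copied from the pubdate field's dateTimeFormat attribute
        String configured = "yyyy-MM-dd'T'hh:mm:ss'Z'";
        // Value copied from the WARNING above (an RSS <pubDate>)
        String rssDate = "Mon, 06 Dec 2010 23:31:38 +0000";
        System.out.println("configured pattern parses: " + canParse(configured, rssDate));
    }
}
```

It prints "configured pattern parses: false": the pattern expects the string to begin with a four-digit year, while an RSS <pubDate> value begins with a day name.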

Are there any tips or tricks to getting standard RSS <update> fields to
import correctly?

Here is an example entity from my DIH config XML file:

      <entity name="CBS"
        pk="link"
        datasource="filedatasource"
        url="http://feeds.cbsnews.com/CBSNewsMain?format=xml"
        processor="XPathEntityProcessor"
        forEach="/rss/channel | /rss/channel/item"
        transformer="DateFormatTransformer,HTMLStripTransformer">
        <field column="source"       xpath="/rss/channel/title" commonField="true" />
        <field column="source-link"  xpath="/rss/channel/link" commonField="true" />
        <field column="subject"      xpath="/rss/channel/description" commonField="true" />
        <field column="title"        xpath="/rss/channel/item/title" />
        <field column="link"         xpath="/rss/channel/item/link" />
        <field column="description"  xpath="/rss/channel/item/description" stripHTML="true" />
        <field column="creator"      xpath="/rss/channel/item/creator" />
        <field column="item-subject" xpath="/rss/channel/item/subject" />
        <field column="author"       xpath="/rss/channel/item/author" />
        <field column="comments"     xpath="/rss/channel/item/comments" />
        <field column="pubdate"      xpath="/rss/channel/item/pubDate" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
      </entity>
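If DateFormatTransformer does hand dateTimeFormat to SimpleDateFormat, the pattern has to describe the incoming string rather than the format Solr stores. RSS 2.0 <pubDate> values are RFC 822 dates, which match the SimpleDateFormat pattern "EEE, dd MMM yyyy HH:mm:ss Z" (HH is the 24-hour field; hh is 12-hour). The sketch below, with the hypothetical names RssDateCheck, parseRssDate and toSolrDate, parses the sample value with that pattern and prints it in the UTC yyyy-MM-dd'T'HH:mm:ss'Z' form that Solr date fields use. It pins Locale.ENGLISH so "Mon" and "Dec" parse regardless of the JVM default; whether the transformer in a given DIH version does the same is an assumption worth checking.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class, not part of Solr.
class RssDateCheck {
    // RFC 822 style pattern: day name, day, month name, year, 24-hour time, numeric zone
    static final String RSS_PATTERN = "EEE, dd MMM yyyy HH:mm:ss Z";

    // Parse an RSS <pubDate> value; Locale.ENGLISH so English day/month names match
    static Date parseRssDate(String value) throws ParseException {
        return new SimpleDateFormat(RSS_PATTERN, Locale.ENGLISH).parse(value);
    }

    // Format a Date as ISO 8601 in UTC with a literal trailing 'Z'
    static String toSolrDate(Date d) {
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ENGLISH);
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return out.format(d);
    }

    public static void main(String[] args) throws ParseException {
        Date d = parseRssDate("Mon, 06 Dec 2010 23:31:38 +0000");
        System.out.println(toSolrDate(d)); // prints 2010-12-06T23:31:38Z
    }
}
```

Under the same assumption, the pubdate field would then carry dateTimeFormat="EEE, dd MMM yyyy HH:mm:ss Z" instead of the Solr output pattern.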

Any tips on this would be really appreciated as I need to query based on the
date the article was published.

Thanks,
Adam
