And how can I find out the names of the fields generated by the Tika parser? Is any prefix used?
Greetings,

On Oct 24, 2012, at 10:02 AM, Markus Jelsma <[email protected]> wrote:

> Hi - you need a custom indexing filter that adds the fields from parsemeta to
> the document.
>
> Cheers,
>
> -----Original message-----
>> From: Eyeris Rodriguez Rueda <[email protected]>
>> Sent: Wed 24-Oct-2012 14:59
>> To: [email protected]
>> Subject: problems with image dynamic fields in nutch 1.4
>>
>> Hi all.
>> I have a problem when I try to crawl images, specifically with the dynamic
>> fields of those images.
>> When I do a crawl, Nutch ignores these dynamic fields.
>> When I manually upload some images directly to the Solr index, Solr's Tika is
>> able to extract some metadata into dynamic fields such as width, height and
>> content-type, but with a Nutch crawl those fields are ignored.
>> I have tried declaring them as static fields in the Solr and Nutch schemas,
>> but still without results. Here are my schema and solrindex-mapping. I'm
>> using Nutch 1.4 and Solr 3.6. Any help or advice will be appreciated.
>> ************************************************
>> Schema.xml
>>
>> <fields>
>>   <field name="segment" type="string" stored="true" indexed="false"/>
>>   <field name="digest" type="string" stored="true" indexed="false"/>
>>   <field name="boost" type="float" stored="true" indexed="false"/>
>>
>>   <!-- fields for index-basic plugin -->
>>   <field name="host" type="url" stored="true" indexed="true"/>
>>   <field name="site" type="string" stored="true" indexed="true"/>
>>   <field name="url" type="url" stored="true" indexed="true"/>
>>   <field name="content" type="text" stored="true" indexed="false"/>
>>   <field name="title" type="text" stored="true" indexed="true"/>
>>   <field name="cache" type="string" stored="true" indexed="false"/>
>>   <field name="tstamp" type="date" stored="true" indexed="true" default="NOW"/>
>>
>>   <!-- fields for index-more plugin -->
>>   <field name="date" type="date" stored="true" indexed="true"/>
>>   <field name="contentLength" type="long" stored="true" indexed="true"/>
>>   <field name="lastModified" type="date" stored="true" indexed="true"/>
>>
>>   <!-- fields for languageidentifier plugin -->
>>   <field name="lang" type="string" stored="true" indexed="true"/>
>>
>>   <!-- general -->
>>   <field name="id" type="string" indexed="true" stored="true" multiValued="false"/>
>>   <field name="type" type="string" indexed="true" stored="true" multiValued="true" />
>>   <field name="name" type="string" indexed="true" stored="true" multiValued="false" />
>>   <field name="thumbnail" type="string" stored="true" indexed="true"/>
>>   <field name="core0" type="string" indexed="true" stored="true" multiValued="false" />
>>   <!-- static fields -->
>>   <!-- for PNG files -->
>>   <field name="content_type" type="string" indexed="true" stored="true" multiValued="false"/>
>>   <field name="width" type="string" indexed="true" stored="true" multiValued="false"/>
>>   <field name="height" type="string" indexed="true" stored="true" multiValued="false"/>
>>   <field name="stream_name" type="string" indexed="true" stored="true" multiValued="false"/>
>>   <field name="stream_size" type="string" indexed="true" stored="true" multiValued="false"/>
>>
>>   <dynamicField name="attr_*" type="binary"/>
>> </fields>
>>
>> <!-- field to use to determine and enforce document uniqueness. -->
>> <uniqueKey>id</uniqueKey>
>>
>> <!-- field for the QueryParser to use when an explicit fieldname is absent -->
>> <defaultSearchField>name</defaultSearchField>
>>
>> <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
>> <solrQueryParser defaultOperator="OR"/>
>> </schema>
>>
>> ***********************************************************************
>> Solrindex-mapping
>>
>> <fields>
>>   <field dest="id" source="url"/>
>>   <field dest="host" source="host"/>
>>   <field dest="site" source="site"/>
>>   <field dest="title" source="title"/>
>>   <field dest="tstamp" source="tstamp"/>
>>
>>   <field dest="type" source="type"/>
>>
>>   <field dest="date" source="date"/>
>>   <field dest="lang" source="lang"/>
>>   <field dest="contentLength" source="contentLength"/>
>>   <field dest="lastModified" source="lastModified"/>
>>
>>   <!-- For the specific image types -->
>>   <field dest="content_type" source="content_type"/>
>>   <field dest="width" source="width"/>
>>   <field dest="height" source="height"/>
>>   <field dest="stream_name" source="stream_name"/>
>>   <field dest="stream_size" source="stream_size"/>
>>
>>   <field dest="thumbnail" source="thumbnail"/>
>> </fields>
>> <uniqueKey>id</uniqueKey>
>> </mapping>
>>
>> _____________________________________________________________________
>> Ing. Eyeris Rodriguez Rueda
>> Phone: 837-3370
>> Universidad de las Ciencias Informáticas
>> _____________________________________________________________________
>>
>> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
>> INFORMATICAS...
>> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>>
>> http://www.uci.cu
>> http://www.facebook.com/universidad.uci
>> http://www.flickr.com/photos/universidad_uci
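
For reference, a custom indexing filter like the one Markus suggests has to be registered with Nutch as a plugin. Below is a minimal sketch of such a plugin descriptor. The plugin id index-imagemeta, the jar name, and the class org.example.nutch.ImageMetaIndexingFilter are hypothetical placeholders that do not appear in this thread; only the extension point org.apache.nutch.indexer.IndexingFilter and the nutch-extensionpoints import come from Nutch itself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical plugin.xml: the id, jar and class names are placeholders. -->
<plugin id="index-imagemeta" name="Image Metadata Indexing Filter"
        version="1.0.0" provider-name="example.org">
  <runtime>
    <!-- Jar that would contain the filter implementation -->
    <library name="index-imagemeta.jar">
      <export name="*"/>
    </library>
  </runtime>
  <requires>
    <import plugin="nutch-extensionpoints"/>
  </requires>
  <!-- Registers the class at Nutch's indexing-filter extension point -->
  <extension id="org.example.nutch.imagemeta"
             name="Image Metadata Indexing Filter"
             point="org.apache.nutch.indexer.IndexingFilter">
    <implementation id="ImageMetaIndexingFilter"
                    class="org.example.nutch.ImageMetaIndexingFilter"/>
  </extension>
</plugin>
```

With a descriptor along these lines, the plugin id would presumably also need to be added to the plugin.includes property in nutch-site.xml, and the field names the filter writes would have to match the source attributes in the solrindex-mapping shown in the quoted message.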

