You could simply concatenate all alt values from one row into a single field:
img_alts --> alt1 alt2 ... altN
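
A rough sketch of the concatenation idea in plain Java, in case it helps. The class name, the method names and the tab separator are all made up for illustration and are not part of Nutch; the joined string is what you would hand to putToMetadata under a single key, after whatever type conversion the WebPage metadata needs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

public class ImgAltJoin {
    // Assumed separator: a tab, on the assumption that alt text never contains one.
    static final String SEP = "\t";

    // Parse side: collapse all alt values of one page into a single string.
    public static String join(List<String> alts) {
        return String.join(SEP, alts);
    }

    // Index side: recover the individual alt values from the stored string.
    public static List<String> split(String stored) {
        if (stored.isEmpty()) {
            return Collections.emptyList();
        }
        // Limit -1 keeps trailing empty alt values instead of dropping them.
        return Arrays.asList(stored.split(Pattern.quote(SEP), -1));
    }

    public static void main(String[] args) {
        String stored = join(Arrays.asList("Tree", "Red house", "Logo"));
        System.out.println(split(stored));
    }
}
```

The split values could then be added one by one to a multivalued field at index time.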

Or use different keys:
img_alt1 --> alt1
img_alt2 --> alt2
img_alt3 --> alt3
The indexer would then iterate over all metadata keys and index the ones that carry the prefix.
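
Something along these lines, as a sketch only. A plain Map<String, String> stands in for the WebPage metadata here, which is an assumption on my part: the real metadata map uses different key and value types, so you would need to convert on the way in and out. ImgAltKeys, putAlts and collectAlts are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ImgAltKeys {
    // Prefix shared by all image-alt keys (img_alt1, img_alt2, ...).
    static final String PREFIX = "img_alt";

    // Parse side: give every alt value its own key so nothing is overwritten.
    public static void putAlts(Map<String, String> metadata, List<String> alts) {
        for (int i = 0; i < alts.size(); i++) {
            metadata.put(PREFIX + (i + 1), alts.get(i));
        }
    }

    // Index side: walk over all keys and keep the values of the prefixed ones.
    public static List<String> collectAlts(Map<String, String> metadata) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            if (entry.getKey().startsWith(PREFIX)) {
                values.add(entry.getValue());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Stand-in for the page metadata; keeps insertion order for readability.
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("title", "Some page");
        putAlts(metadata, Arrays.asList("alt1", "alt2", "alt3"));
        System.out.println(collectAlts(metadata));
    }
}
```

Each collected value could then be added to the same multivalued field in the index.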

Or, if you also want to index the image URLs themselves, put the URL into the key:
img_url1 --> alt1
So, for example, img_http://example.com/tree.jpg --> "Tree"

Would any of these work for you?

On Tue, Jul 3, 2012 at 10:19 PM, <[email protected]> wrote:

> Hi,
>
> I was planning to parse img tags from a URL's content and put them in the
> metadata field of the WebPage storage class in nutch 2.0, to retrieve them
> later in the indexing step.
> However, since there is no metadata variable in the Parse class (unlike
> outlinks), this cannot be done in nutch 2.0 (compare the Parse class in
> nutch 1.X, which has a metadata variable). One is restricted to the
> putToMetadata function of the WebPage class, which overwrites values, i.e.,
> if I try to put two metadata entries, img_alt:alt1 and img_alt:alt2, I get
> only the last value, img_alt:alt2, in the metadata field.
>
> So, my question is how img tag alt values can be indexed in nutch-2.0,
> given that the crawled URLs contain more than one img tag.
> Do I need to parse them and store them in one of the fields of the WebPage
> storage class, or is this step not needed?
>
> Thanks.
> Alex.
>
>
>
> -----Original Message-----
> From: Lewis John Mcgibbney <[email protected]>
> To: user <[email protected]>
> Sent: Tue, Jul 3, 2012 5:08 am
> Subject: Re: parse and solrindex in nutch-2.0
>
>
> Hi,
>
> On Mon, Jul 2, 2012 at 8:21 PM,  <[email protected]> wrote:
>
> > Regarding the metadata, what would be a proper way of parsing and indexing
> > multivalued tags in nutch-2.0 then?
> >
>
> Assuming you've taken a look into the schema, 'some' multivalued fields
> are permitted out of the box. Are you having problems obtaining
> multiple values for some fields within the documents you're trying to
> parse + index?
>
> Lewis
>
>
>
