You could simply concatenate all alts from one row into a single field?

img_alts --> alt1 alt2 ... altN
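
A rough sketch of the concatenation idea, in case it helps. Everything here is an assumption for illustration: a plain java.util.Map stands in for the WebPage metadata (in Nutch you would go through putToMetadata instead), and the key name "img_alts", the tab separator, and the class ImgAlts are made up. I picked a tab rather than a space so the values can be split apart again at indexing time even when an alt text contains spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ImgAlts {
    // Hypothetical key and separator; neither is defined by Nutch.
    static final String KEY = "img_alts";
    static final String SEP = "\t";

    // Parse step: join all alt values of one page into a single string.
    static String join(List<String> alts) {
        List<String> cleaned = new ArrayList<>();
        for (String alt : alts) {
            if (alt == null) continue;
            // Collapse whitespace so the separator never occurs inside a value.
            String a = alt.replaceAll("\\s+", " ").trim();
            if (!a.isEmpty()) cleaned.add(a);
        }
        return String.join(SEP, cleaned);
    }

    // Index step: split the stored string back into individual values.
    static List<String> split(String stored) {
        if (stored == null || stored.isEmpty()) return new ArrayList<>();
        return new ArrayList<>(Arrays.asList(stored.split(SEP)));
    }

    public static void main(String[] args) {
        // Stand-in for the WebPage metadata; one key, one value, so nothing is overwritten.
        Map<String, String> metadata = new HashMap<>();
        metadata.put(KEY, join(Arrays.asList("Tree", "Red  house", "", "Dog")));
        for (String alt : split(metadata.get(KEY))) {
            System.out.println(alt);
        }
    }
}
```

Since only one key is written per page, the overwrite behaviour of putToMetadata no longer matters; the indexing side would split the value and add each piece to a multivalued field.
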
Or use different keys:

img_alt1 --> alt1
img_alt2 --> alt2
img_alt3 --> alt3

The indexer would then iterate over all keys and index the prefixed ones.

Or, if you even want to index the image URLs themselves:

img_url1 --> alt1

so for example

img_http://example.com/tree.jpg --> "Tree"

Are any of these possible?

On Tue, Jul 3, 2012 at 10:19 PM, <[email protected]> wrote:

> Hi,
>
> I was planning to parse img tags from a url content and put it in metadata
> field of Webpage storage class in nutch2.0 to retrieve them later in the
> indexing step.
> However, since there is no metadata data type variable in Parse class
> (compare with outlinks) this can not be done in nutch 2.0 (compare parse
> class with metadata type variable in nutch 1.X). One is restricted to use
> putToMetadata function of WebPage class which overwrites values, i.e., if I
> try to put two metadata img_alt:alt1 img_alt:alt2 I get only the last
> value img_alt:alt2 in metadata field.
>
> So, my question is how img tag alt values can be indexed in nutch-2.0,
> provided that there are more than one img tag in all crawled urls?
> Do I need to parse them and store in one of the fields of webpage storage
> class or this step is not needed?
>
> Thanks.
> Alex.
>
>
> -----Original Message-----
> From: Lewis John Mcgibbney <[email protected]>
> To: user <[email protected]>
> Sent: Tue, Jul 3, 2012 5:08 am
> Subject: Re: parse and solrindex in nutch-2.0
>
>
> Hi,
>
> On Mon, Jul 2, 2012 at 8:21 PM, <[email protected]> wrote:
>
> > Regarding the metadata, what would be a proper way of parsing and indexing
> > multivalued tags in nutch-2.0 then?
>
> Assuming you've taken a look into the schema, 'some' multivalued fields
> are permitted out of the box. Are you having problems obtaining
> multiple values for some fields within the documents you're trying to
> parse + index?
>
> Lewis

