Re: Nutch 2.x architecture Supporting multivalues

kiran chitturi Wed, 10 Oct 2012 14:36:26 -0700

One thing I thought of is that I could use a StringBuilder to append all the
multivalues, convert the result to a String, and save it as the value in the
ByteBuffer.


In this way the metadata type need not be changed. Maybe some kind of
separator can be used to distinguish multiple values. I am not sure if this
is the ideal approach.
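
A rough sketch of what I mean, only to illustrate the idea. The class name,
the method name, and the choice of a tab as separator are all made up here
and are not part of Nutch; it also assumes that no value contains the
separator itself and that UTF-8 is the encoding used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not a Nutch class.
class MultiValueJoiner {
    // Assumed separator; values must not contain this character.
    static final char SEPARATOR = '\t';

    // Appends all values with a StringBuilder and wraps the UTF-8 bytes
    // of the joined string in a single ByteBuffer.
    static ByteBuffer join(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(SEPARATOR);
            }
            sb.append(values.get(i));
        }
        return ByteBuffer.wrap(sb.toString().getBytes(StandardCharsets.UTF_8));
    }
}
```

The obvious weakness is that a value containing the separator would be
split incorrectly later, so the separator would need to be escaped or
chosen very carefully.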

In the indexer, we can then split the combined string back into its separate
values and pass them as an array to NutchDocument, provided we change only
that parameter type.
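
The indexer side could look roughly like the following. Again this is only a
sketch: the class and method names are invented, the tab separator is an
assumption, and it covers only the decoding and splitting step. Passing the
resulting array to NutchDocument would still need the type change mentioned
above, which is not shown.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper, not a Nutch class.
class MultiValueSplitter {
    // Assumed separator; must match the one used when the values were joined.
    static final String SEPARATOR = "\t";

    // Decodes the stored bytes as UTF-8 and splits them on the separator.
    static String[] split(ByteBuffer stored) {
        // duplicate() leaves the position of the caller's buffer untouched.
        String joined = StandardCharsets.UTF_8.decode(stored.duplicate()).toString();
        // Pattern.quote treats the separator literally; the -1 limit keeps
        // trailing empty values instead of silently dropping them.
        return joined.split(Pattern.quote(SEPARATOR), -1);
    }
}
```

Note that an empty stored string comes back as one empty value rather than
as no values, so that case may need separate handling.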

Please let me know what you think of this. A separator might not be the
ideal approach.

Thank you,
Kiran





On Wed, Oct 10, 2012 at 4:46 PM, kiran chitturi
<[email protected]> wrote:

>
> Hi,
>
> I am working on porting the parse-metatags plugin to the Nutch 2.x series.
> I worked on patches for the same plugin for Nutch 1.5 so that multivalued
> tags are saved in an array and then sent to Solr. It all worked well in 1.5.
>
> I have ported the plugin to Nutch 2.x now but it works only for a single
> value of the tag. It does not work for multivalues of a tag.
>
> I had problems working with the Nutch architecture and the API, since some
> functions, such as NutchDocument.add, do not accept multivalues. It
> accepted an Object as the second argument in 1.5 but accepts only a String
> in 2.x.
>
> I have tried changing the metadata type to 'Map<Utf8, List<ByteBuffer>>'
> in WebPage and in all other functions that use it. It worked in some places
> but failed in others, so I am not sure if it's the best way to proceed.
>
> Can someone point me to the best way to do this?
>
> I want the value of a metadata key to accept multivalues, so we should be
> storing it as an array type. NutchDocument.add should accept an array type
> as the second parameter so that the index values can be passed as an array.
>
> I am also interested in knowing the opinion of the Nutch developers
> regarding these changes.
>
> Many Thanks,
>
> --
> Kiran Chitturi
>
>
>


-- 
Kiran Chitturi
