Sorry for the many posts. I was wrong about NutchDocument; it does indeed store the value of a key as an ArrayList.
The workaround I used to get everything working is:

*Parsing*
1) Build the multiple values into a single string using StringBuilder, distinguishing the values with a separator.
2) Save the string as a ByteBuffer and pass it as the parameter.

*Indexing*
1) Retrieve the value of the key from the metadata.
2) Split the string using the same separator that was used during parsing.
3) Pass each split string to NutchDocument.add.

This is my current workaround. Last time, I got suggestions that a separator might not be a good way to store multiple values. Please let me know if you have any suggestions.

With my patch, the metatags are now detected and sent to Solr for indexing in the Nutch 2.x series.

Next week I will work on porting other plugins to 2.x. I saw that 'TikaParser.java' in the tika plugin contains a 'To Do for multivalues' note. Maybe both are similar issues.

Thank you,
Kiran.

On Wed, Oct 10, 2012 at 5:35 PM, kiran chitturi <[email protected]> wrote:

> One thing I thought of is that I could use a StringBuilder to append all
> the multivalues, convert the result to a string, and save it as the value
> in the ByteBuffer.
>
> This way the metadata type need not be changed. Maybe some kind of
> separator can be used to distinguish the multiple values. I am not sure if
> this is ideal.
>
> In the indexer, we can still separate the values from the main string and
> then pass them as an array to NutchDocument, if we can change just that
> type.
>
> Please let me know what you think of this. A separator might not be
> ideal.
>
> Thank you,
> Kiran
>
> On Wed, Oct 10, 2012 at 4:46 PM, kiran chitturi <[email protected]> wrote:
>
>> Hi,
>>
>> I am working on porting the parse-metatags plugin to the Nutch 2.x
>> series. I worked on patches for the same plugin for Nutch 1.5 so that
>> multivalued tags are saved in an array and then sent to Solr. It all
>> worked well in 1.5.
>>
>> I have now ported the plugin to Nutch 2.x, but it works only for a single
>> value of a tag. It does not work for multiple values of a tag.
>>
>> I had problems working with the Nutch architecture and the API, since
>> some functions, such as the add function in NutchDocument, do not accept
>> multivalues. It accepted the 'object' type as its second argument in
>> version 1.5 but accepts only the string type in the 2.x versions.
>>
>> I have tried changing the metadata type to 'Map<utf8, List<ByteBuffer>>'
>> in WebPage and in all other functions that use it. It worked, but it also
>> failed at some points, so I am not sure if it is the best way to proceed.
>>
>> Can someone point me to the best way to do this?
>>
>> I want the value of a metadata key to accept multivalues, so we should be
>> storing it as an array type. NutchDocument.add should accept an array
>> type as its second parameter so that the index values can be passed as an
>> array.
>>
>> I am also interested in knowing the opinion of the Nutch developers
>> regarding these changes.
>>
>> Many Thanks,
>>
>> --
>> Kiran Chitturi
>
> --
> Kiran Chitturi

--
Kiran Chitturi
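
For reference, the separator workaround described at the top of this message could look roughly like the sketch below. It is only an illustration and not the actual patch: the class name MultiValueCodec, the method names, and the tab separator are all hypothetical, and a plain List<String> stands in for the values that would be passed one by one to NutchDocument.add. It assumes the separator never occurs inside a value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class MultiValueCodec {
    // Hypothetical separator; assumed never to appear inside a metatag value.
    static final String SEPARATOR = "\t";

    // Parsing side: join all values into one string and wrap it in a ByteBuffer.
    static ByteBuffer encode(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(SEPARATOR);
            }
            sb.append(values.get(i));
        }
        return ByteBuffer.wrap(sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Indexing side: decode the buffer and split it back into single values.
    static List<String> decode(ByteBuffer buffer) {
        // duplicate() so the caller's buffer position is left untouched.
        String joined = StandardCharsets.UTF_8.decode(buffer.duplicate()).toString();
        if (joined.isEmpty()) {
            // Limitation: an empty list and a single empty value look the same.
            return new ArrayList<>();
        }
        // Limit -1 keeps trailing empty values instead of dropping them.
        String[] parts = joined.split(Pattern.quote(SEPARATOR), -1);
        return new ArrayList<>(Arrays.asList(parts));
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("nutch", "solr", "metatags");
        ByteBuffer stored = encode(keywords);
        // In the indexer, each value would be added to the document separately.
        for (String value : decode(stored)) {
            System.out.println(value);
        }
    }
}
```

The weakness mentioned in the thread is visible here: a value that contains the separator would be split into two values on the indexing side, so the separator has to be escaped or chosen so that it cannot appear in the data.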

