Created issue: https://issues.apache.org/jira/browse/NUTCH-1262
On Tuesday 31 January 2012 06:58:56 Alexander Aristov wrote:
> Hi
>
> Of course we all understand that these two types are not the same and serve
> different purposes, but since Nutch doesn't distinguish between them
> it would be possible and reasonable to make the content-type the same.
>
> But there might be some problems. Some Nutch users might rely on
> content-type and apply a special parser for application/xhtml+xml,
> considering maybe additional namespaces.
>
> Of course for indexing and searching the replacement would be good.
>
>
> In fact there are many other examples where different content types can
> be treated in the same way. What if we had a feature for grouping several
> content types into a single one?
>
> Best Regards
> Alexander Aristov
>
> On 30 January 2012 17:12, Markus Jelsma <[email protected]> wrote:
> > Hi,
> >
> > Should we not provide an optional replacement for the content type field in
> > index-more? They are the same for end-users but end up differently in an
> > index.
> >
> > Thoughts?
> > Thanks

-- 
Markus Jelsma - CTO - Openindex

