Hi

I'm using Nutch 2.2.1 with HBase.

How can I restrict the fields persisted in HBase? For example, I don't need
the "p:c" column (the parsed text field). Its content will never be used by
my search implementation (I am not using a default text field). I can see
that the "p:c" mapping is listed in conf/gora-hbase-mapping.xml, but omitting
it from the file causes a Gora writer exception.
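
For reference, the entries I mean look roughly like this. This is a sketch
based on the stock conf/gora-hbase-mapping.xml as I understand it; the exact
attribute values are an assumption and may differ in other versions or in a
customised file:

```xml
<!-- Fragment of conf/gora-hbase-mapping.xml (surrounding elements omitted) -->
<class table="webpage" keyClass="java.lang.String"
       name="org.apache.nutch.storage.WebPage">
  <!-- parsed text, stored as column p:c -->
  <field name="text" family="p" qualifier="c"/>
  <!-- response headers, stored in column family h -->
  <field name="headers" family="h"/>
  <!-- metadata written by my plugins, stored in column family mtdt -->
  <field name="metadata" family="mtdt"/>
</class>
```

Removing the "text" line is what triggers the exception.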

I'm using my own set of plugins to extract the specific content I need and
adding it to the metadata, so it's saved in the "mtdt" column.

Now I want to restrict the stored data to the minimum required for Nutch to
function (mostly to minimise hard disk usage). For example, I don't want to
store headers (column "h"). How can I keep them from being written to HBase?

Also, I'm using "fetcher.parse" = true, so I don't need any data persisted
for a separate parsing step after fetching.
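
In other words, my conf/nutch-site.xml carries an override along these lines
(a minimal sketch; the optional description element is omitted):

```xml
<!-- Fragment of conf/nutch-site.xml: parse pages during the fetch step -->
<property>
  <name>fetcher.parse</name>
  <value>true</value>
</property>
```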


Thanks

Az
