Ideally I would prefer a Nutch command to perform the operation. Perhaps a flag on updatedb to delete the content once it's been successfully parsed and updated.
I'll try and take a look once I have some time.

On Tue, Apr 30, 2013 at 12:23 PM, Lewis John Mcgibbney <[email protected]> wrote:

> I would most likely agree with Tejas.
> Either that or you could use the delete and deleteByQuery operations for
>
> http://gora.apache.org/docs/current/apidocs-0.2.1/index.html?org/apache/gora/hbase/store/HBaseStore.html
> It depends on how you intend to use the software.
> hth
>
> On Tue, Apr 30, 2013 at 5:22 AM, Tejas Patil <[email protected]> wrote:
>
> > Off the top of my head, all I can think of is getting into the hbase shell
> > and running some queries to remove the unwanted things from the
> > "crawlId_webpage" table. I have never done this, so I can't vouch that it
> > would work well.
> >
> > On Tue, Apr 30, 2013 at 5:11 AM, Bai Shen <[email protected]> wrote:
> >
> > > Is there a way to remove the fetched files from HBase after they've
> > > been parsed? I'm running things locally and don't have the storage
> > > space to store all of the fetched files.
> > >
> > > Thanks.
>
> --
> *Lewis*

