Re: Cleaning the database content

2013-06-25 Thread H. Coskun Gunduz
Hi Benjamin,

First, enter the HBase shell:

    $ cd path_to_your_hbase/bin
    $ ./hbase shell

Then disable and drop your table:

    disable 'table_name'
    drop 'table_name'

Best regards,
coskun...

On 06/25/2013 03:14 PM, Sznajder ForMailingList wrote:
> Hi
> I am using HBASE
> Best regards
> Benjamin
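The disable-and-drop steps above can also be scripted instead of typed interactively. The following is only a minimal sketch: the function name `drop_commands` is hypothetical (not part of HBase), and it merely prints the two shell commands from the reply for a given table name.

```shell
#!/bin/sh
# Hypothetical helper (not an HBase tool): prints the HBase shell commands
# that disable and then drop the table named in the first argument.
drop_commands() {
  printf "disable '%s'\ndrop '%s'\n" "$1" "$1"
}

# Show what would be sent to the HBase shell for a table called table_name.
drop_commands table_name
```

Assuming HBase is installed under path_to_your_hbase, the output can be piped into the shell, for example `drop_commands table_name | path_to_your_hbase/bin/hbase shell`. Dropping a table is irreversible, so check the printed commands first.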

Re: Incomplete HTML content of a crawled Page in ParseFilter ?

2013-06-17 Thread H. Coskun Gunduz
Hi Tony,

You may need to add the http.content.limit parameter to your nutch-site.xml file. For size-unlimited crawling:

    <property>
      <name>http.content.limit</name>
      <value>-1</value>
      <description>The length limit for downloaded content using the file protocol, in bytes. If this value is
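Because the snippet above is cut off, here is a minimal sketch of a complete, well-formed override file. The function name `write_limit_config` and the scratch-file approach are assumptions for illustration; the description element is omitted rather than guessed, and only the property name and the value -1 come from the reply.

```shell
#!/bin/sh
# Hypothetical helper: writes a minimal Hadoop-style configuration file that
# contains only the http.content.limit override to the path given as $1.
write_limit_config() {
  cat > "$1" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.content.limit</name>
    <value>-1</value>
  </property>
</configuration>
EOF
}

# Write to a scratch file so an existing nutch-site.xml is not overwritten.
write_limit_config ./nutch-site.example.xml
```

If nutch-site.xml already holds other settings, copy only the property element into its existing configuration element instead of replacing the file.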

Re: Running Nutch standalone (without Solr)

2013-06-12 Thread H. Coskun Gunduz
Hi Peter,

Yes, it's possible. You'll need a data store (my personal recommendation is HBase). Depending on the Nutch version you use, you can follow one of these tutorials:

Nutch 1.x: http://wiki.apache.org/nutch/NutchTutorial
Nutch 2: http://wiki.apache.org/nutch/Nutch2Tutorial

Happy crawling.

LanguageIdentifierPlugin, Implemented Languages

2013-06-03 Thread H. Coskun Gunduz
Hi,

I'm looking for the list of languages implemented in the Language Identifier plugin. There's a list on the wiki page [1], but the page was last edited almost four years ago, so I'm not sure whether the list there is up to date. Any help will be appreciated.

Thanks.
coskun...