date:20140503

Re: Nutch 1.7 - deleting segments

2014-05-03 Thread remi tassing

you are correct On Fri, May 2, 2014 at 7:46 PM, chethan chethan.p...@gmail.com wrote: Hi, I have a Nutch crawl with 4 segments which are fully indexed using the bin/nutch solrindex command. Now I'm all out of storage on the box, so can I delete the 4 segments and retain only the crawldb

Re: Nutch 1.7 - deleting segments

2014-05-03 Thread chethan

Thanks for your reply! Regards, -- Chethan Prasad On Sat, May 3, 2014 at 12:22 PM, remi tassing tassingr...@gmail.com wrote: you are correct On Fri, May 2, 2014 at 7:46 PM, chethan chethan.p...@gmail.com wrote: Hi, I have a Nutch crawl with 4 segments which are fully indexed using

Nutch 1.8 Solrindexer failing

2014-05-03 Thread BlackIce

Hi, playing around with Nutch 1.8 in local mode on Solr 4.7. When indexing larger crawls (10k and up) I get: Indexer: java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357) at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)

Re: Nutch 1.8 Solrindexer failing

2014-05-03 Thread remi tassing

Could you provide the complete stack trace? Perhaps also enable more debug output. This could be due to a disk space issue... On Sat, May 3, 2014 at 8:51 PM, BlackIce blackice...@gmail.com wrote: Hi, playing around with Nutch 1.8 in local mode on Solr 4.7. When indexing larger crawls (10k and up)

Re: Nutch 1.8 Solrindexer failing

2014-05-03 Thread BlackIce

Bad Request request: http://localhost:8983/solr/update?wt=javabin&version=2 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) at

Re: Nutch 1.7 - deleting segments

2014-05-03 Thread John Lafitte

What would be the case where you would want to keep the segments? I'm considering automatically deleting them after sending the data to Solr. On May 3, 2014 2:29 AM, chethan chethan.p...@gmail.com wrote: Thanks for your reply! Regards, -- Chethan Prasad On Sat, May 3, 2014 at 12:22 PM,
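The automatic cleanup considered in this message could be sketched roughly as follows. This is only an illustration, assuming a crawl directory on the local filesystem that holds a `segments/` directory next to `crawldb/`; the function name `delete_segments` and the `dry_run` flag are hypothetical and not part of Nutch. A crawl stored on HDFS would need the Hadoop filesystem tools instead.

```python
import shutil
from pathlib import Path
from typing import List


def delete_segments(crawl_dir: str, dry_run: bool = True) -> List[str]:
    """List (and optionally remove) every directory under <crawl_dir>/segments.

    The crawldb directory is never touched. With dry_run=True nothing is
    deleted; the function only reports which segments it found.
    Assumed layout (hypothetical): <crawl_dir>/crawldb, <crawl_dir>/segments/<name>.
    """
    segments_root = Path(crawl_dir) / "segments"
    if not segments_root.is_dir():
        return []

    # Collect the segment directories in a stable order.
    found = sorted(p for p in segments_root.iterdir() if p.is_dir())

    if not dry_run:
        for segment in found:
            shutil.rmtree(segment)

    return [p.name for p in found]
```

Calling `delete_segments("TestCrawl")` first and checking the returned names before repeating the call with `dry_run=False` keeps the destructive step explicit. It only makes sense once the segments have been fully indexed, as described in the original question.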

Re: Nutch 1.8 in pseudo dist error

2014-05-03 Thread Sebastian Nagel

Hi, looks like the segment is not addressed properly: hdfs://localhost:54310/user/hduser/TestCrawl/segments/crawl_generate. Segments are named by a timestamp, e.g. .../TestCrawl/segments/20140502231126/, and crawl_generate is a subdir. Can you specify the exact commands to run the crawler?
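A quick way to spot this kind of mis-addressed path is to check whether the last path component looks like a segment name. The sketch below assumes segment names are 14-digit timestamps, as in the example 20140502231126 from the message above; the helper name `looks_like_segment` is hypothetical.

```python
import re

# Assumption: a segment directory name is a 14-digit timestamp,
# matching the example 20140502231126 quoted in the message.
SEGMENT_NAME = re.compile(r"\d{14}")


def looks_like_segment(path: str) -> bool:
    """Return True if the last component of `path` is a timestamp-style name."""
    last_component = path.rstrip("/").rsplit("/", 1)[-1]
    return SEGMENT_NAME.fullmatch(last_component) is not None


# The path from the error message ends in a subdir name, not a segment name.
print(looks_like_segment(
    "hdfs://localhost:54310/user/hduser/TestCrawl/segments/crawl_generate"))   # False
print(looks_like_segment(
    "hdfs://localhost:54310/user/hduser/TestCrawl/segments/20140502231126/"))  # True
```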

Re: Nutch 1.8 in pseudo dist error

2014-05-03 Thread BlackIce

Same as for Nutch 2.2.1 in pseudo-distributed mode: bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr/ 10, run from within the deploy dir. However, I remember reading somewhere that deploy execution for the 1.x series is different from the 2.x series, and that some more files, aside from seed.txt, had to be

Nutch + GATE on Amazon EMR

2014-05-03 Thread chethan

I have set up Nutch to crawl on Amazon EMR and I have a plugin that uses GATE (https://gate.ac.uk/) for text processing in the indexing filters. GATE requires certain static resources (some XML and text files) to be loaded for it to be initialized. I tried to bundle these resources in the job jar and


9 matches
