[0]
https://svn.apache.org/repos/asf/nutch/branches/2.x/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
On Thu, Oct 4, 2012 at 7:36 AM, Lewis John Mcgibbney
wrote:
> Hi James,
>
> On Thu, Oct 4, 2012 at 2:59 AM, wrote:
>> Lewis and Chris,
>>
>> Agree that "
Hi James,
On Thu, Oct 4, 2012 at 2:59 AM, wrote:
> Lewis and Chris,
>
> Agree that "The Index Structure" page is very useful documentation. I went
> through the fields/plugins listed in your link using Nutch 2.1 rc and most
> work. I was able to get positive results for everything except the f
subcollection does work with 2.x and the problem was the configuration on my
side (the subcollections.xml file in the conf folder).
So the list of fields in the "The Index Structure" page I can't confirm working
with Nutch 2.x yet are:
segment
primaryType
subtype
urlmeta
-Original Messag
Lewis and Chris,
Agree that "The Index Structure" page is very useful documentation. I went
through the fields/plugins listed in your link using Nutch 2.1 rc and most
work. I was able to get positive results for everything except the following
segment -- I am guessing this is not relevant to Nu
Hi Alexandre,
> I am trying to crawl a website with a menu generated by some JavaScript code.
> For example, on this website:
> http://www.beautycenter-riebenbauer.at/
Nutch does not interpret JavaScript, but it has a link extractor for
JavaScript based
on regular expressions, see plugin parse-js. I
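
To illustrate the general idea of regex-based link extraction from script text, here is a minimal sketch in Python. It is not the parse-js implementation: the pattern, the function name `extract_js_links`, and the sample menu script are all assumptions made up for this example. It only shows how URL-like string literals can be pulled out of JavaScript without executing it.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern (not taken from parse-js): a quoted string literal that
# either starts with http(s):// or "/", or ends in a common page extension.
URL_LITERAL = re.compile(
    r"""["']((?:https?://|/)[^"'\s<>]+|[\w./-]+\.(?:html?|php|aspx?|jsp))["']""",
    re.IGNORECASE,
)


def extract_js_links(js_source, base_url):
    """Return absolute URLs found in string literals, in order, without duplicates."""
    found = []
    for match in URL_LITERAL.finditer(js_source):
        # Resolve relative paths against the page the script came from.
        absolute = urljoin(base_url, match.group(1))
        if absolute not in found:
            found.append(absolute)
    return found


# Made-up menu script for illustration only.
sample = '''
var menu = new Menu();
menu.add("Home", "/index.html");
menu.add('Contact', 'kontakt.php');
window.location = "http://www.example.com/angebot";
'''

links = extract_js_links(sample, "http://www.example.com/")
for link in links:
    print(link)
```

An approach like this only sees URLs that appear literally in the script, so links assembled at runtime (for example by string concatenation) would be missed.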
Hi Matt,
I know there is a pile of stuff to add to this but for the time being
(until I dive into your response in detail) please see below
On Tue, Oct 2, 2012 at 11:17 PM, Matt MacDonald wrote:
> Hi,
...
>
> 5) What value should I set for gora.buffer.read.limit? Currently it's
> set to the def
Gotcha. I wasn't sure if that was the case or not. Just wanted to make
sure y'all were aware.
On Wed, Oct 3, 2012 at 9:37 AM, Julien Nioche wrote:
> Only the Apache distribution of Hadoop version 1.0.3 is officially
> supported by Nutch. Obviously if we can get it to work on other
> distributi
Only the Apache distribution of Hadoop version 1.0.3 is officially
supported by Nutch. Obviously, if we can get it to work on other
distributions, so much the better, but this can't be considered a bug or a
blocker for the release
On 3 October 2012 14:10, Bai Shen wrote:
> I just tried to run it
I just tried to run it and I'm getting the following bug on CDH4.
https://issues.apache.org/jira/browse/NUTCH-1447
On Mon, Oct 1, 2012 at 8:17 AM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:
> Hi All,
>
> Anyone else for this VOTE?
>
> Sorry to be a pest!
>
> Thanks
>
> Lewis
>
> On
An answer to one of my own questions. I'd still love help with the others.
> Some questions:
> -
> 1) After 12 iterations I'm still seeing more than 4,500 documents out
> of 45,000 that are unfetched. How might I go about determining why the
> unfetched urls are not being
Hi everyone,
I'm using Nutch 1.5.1, and I configured my parse plugins like this:
I am trying to crawl a website with a menu generated by some JavaScript code.
For example
11 matches