Re: Adding a new field to Nutch + MongoDB datastore using plugin

2017-03-03 Thread lsroudi
Hi,
I have the same issue. I followed all the steps described earlier in this
thread. I can see my custom field in the Elasticsearch index, but I can't see
it in MongoDB.
I use Nutch 2.3.1.
Your help is appreciated.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-a-new-field-to-Nutch-MongoDB-datastore-using-plugin-tp4269632p4323198.html
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: Adding a new field to Nutch + MongoDB datastore using plugin

2016-04-13 Thread Lewis John Mcgibbney
Hi jvence,
Please see my reply below

On Wed, Apr 13, 2016 at 8:26 AM, <user-digest-h...@nutch.apache.org> wrote:

>
> From: jvence <jve...@gmail.com>
> To: user@nutch.apache.org
> Cc:
> Date: Tue, 12 Apr 2016 10:17:20 -0700 (MST)
> Subject: Adding a new field to Nutch + MongoDB datastore using plugin
> I am running Nutch 2.3.1 configured with MongoDB (using Gora) +
> Elasticsearch
> and would like to add a new field to the storage database NOT the index.
>

Cool. Please see below.


>
> I am able to add a field to the elasticsearch index using a custom plugin
> but would like to add it to the mongodb record for each website.
>
> I've added the field to the ./conf/schema.xml file and to
>

schema.xml relates to Solr only. If you have indexer-solr included in
plugin.includes, then your field will be added to the index. It has nothing
to do with the Gora DataStore, however.


> ./conf/gora-mongodb-mapping.xml - The field does appear in the index but not
> in the mongo record.
>

In addition to augmenting the mapping file, you need to augment
webpage.avsc [0], as this essentially defines the data model you wish to
persist into Gora. We call this the persistent class. If you add your data
structure (in accordance with the Avro Specification [1]) and then run the
following from $NUTCH_HOME, you will be good to go.

ant generate-gora-src
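
As a rough illustration of the webpage.avsc step above, a new optional string
field could be declared with an entry like the one below, added to the "fields"
array of the WebPage record. This is only a sketch. The name mynewField is
borrowed from the original question, and the nullable-string type, the doc text
and the null default are assumptions modelled on how existing string fields
are declared in that file.

```json
{
  "name": "mynewField",
  "type": ["null", "string"],
  "doc": "Hypothetical example field, assumed for illustration only.",
  "default": null
}
```

The name used here would presumably need to match the field name already
declared in gora-mongodb-mapping.xml, and the sources have to be regenerated
with the ant target above before the WebPage class exposes the new field.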

Any issues, please let us know.
Thanks

[0] https://github.com/apache/nutch/blob/2.x/src/gora/webpage.avsc
[1] https://avro.apache.org/docs/current/spec.html


Adding a new field to Nutch + MongoDB datastore using plugin

2016-04-12 Thread jvence
I am running Nutch 2.3.1 configured with MongoDB (using Gora) + Elasticsearch
and would like to add a new field to the storage database NOT the index.

I am able to add a field to the elasticsearch index using a custom plugin
but would like to add it to the mongodb record for each website.

I've added the field to the ./conf/schema.xml file and to
./conf/gora-mongodb-mapping.xml - The field does appear in the index but not
in the mongo record.

Here's a snippet of my plugin:

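import org.apache.nutch.indexer.IndexingException;
import org.apache.nutch.indexer.IndexingFilter;
import org.apache.nutch.indexer.NutchDocument;
import org.apache.nutch.storage.WebPage;

public class AddNewField implements IndexingFilter {
  ...
  @Override
  public NutchDocument filter(NutchDocument doc, String url, WebPage page)
      throws IndexingException {
    // Adds the new field to the index document
    doc.add("mynewField", "HelloWorld");
    return doc;
  }
}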


