Yeah, as Lewis said, check your "regex-urlfilter.txt" file. By default,
images are filtered out; to crawl them, comment out the rule at
https://github.com/apache/nutch/blob/2.x/conf/regex-urlfilter.txt.template#L30
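
Here is why commenting out that one line is enough. As I understand Nutch's regex URL filter, rules are read top to bottom and lines starting with '#' are ignored. The first rule whose pattern matches the URL decides the outcome: '+' keeps the URL and '-' drops it. A URL that no rule matches is dropped. Below is a rough Python sketch of that behaviour, not Nutch's actual code. It models only '+'/'-' rules. The image-suffix pattern is illustrative and is not copied from regex-urlfilter.txt.template. RULES_DEFAULT, RULES_IMAGES_ALLOWED, parse_rules and filter_url are hypothetical names.

```python
import re

# Hypothetical rule sets in the regex-urlfilter.txt style (NOT the real
# template contents): '#' = comment, '+' = accept, '-' = reject.
RULES_DEFAULT = """\
# skip image suffixes (illustrative pattern)
-\\.(gif|GIF|jpg|JPG|png|PNG|jpeg|JPEG|bmp|BMP)$
# accept anything else
+.
"""

# Same rules with the image line commented out.
RULES_IMAGES_ALLOWED = """\
# skip image suffixes (illustrative pattern), now commented out
#-\\.(gif|GIF|jpg|JPG|png|PNG|jpeg|JPEG|bmp|BMP)$
# accept anything else
+.
"""


def parse_rules(text):
    """Return (accept, compiled_regex) pairs, skipping comments and blank lines."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line[0] not in "+-":
            continue  # this sketch simply ignores any other kind of line
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules


def filter_url(url, rules):
    """First matching rule wins; a URL matched by no rule is rejected (None)."""
    for accept, pattern in rules:
        if pattern.search(url):  # partial match, like Java's Matcher.find()
            return url if accept else None
    return None


if __name__ == "__main__":
    url = "http://example.com/logo.png"
    print(filter_url(url, parse_rules(RULES_DEFAULT)))         # None
    print(filter_url(url, parse_rules(RULES_IMAGES_ALLOWED)))  # http://example.com/logo.png
```

Since the first matching rule wins, removing the image-suffix rule lets image URLs fall through to the catch-all accept rule, assuming your file ends with one.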


--
Thanks
Madhav Sharan


On Thu, Dec 3, 2015 at 9:17 PM, Lewis John Mcgibbney <
[email protected]> wrote:

> Hi Byzen.Ma,
>
> I would advise you to follow the tutorial at
> http://wiki.apache.org/nutch/Nutch2Tutorial.
> Please see my answers inline.
>
> On Thu, Dec 3, 2015 at 6:26 AM, <[email protected]> wrote:
>
> >
> >     <property>
> >         <name>storage.data.store.class</name>
> >         <value>org.apache.gora.sql.store.SqlStore</value>
> >         <description>Default class for storing data</description>
> >     </property>
> >
>
> This property is useless: SqlStore is non-functional. Please consider using
> one of the other datastores documented at
> http://gora.apache.org/current/index.html#gora-modules
>
>
> >     <property>
> >         <name>generate.batch.id</name>
> >         <value>*</value>
> >     </property>
> >
>
> I have no idea what the above property is.
>
>
> > If you have any ideas about this, I would really appreciate it if you
> > could share them with me! Thanks again, Madhav.
> >
> >
> Nutch 2.X, like Nutch 1.X, will crawl images if you comment out the
> following lines in your regex-urlfilter.txt:
>
> https://github.com/apache/nutch/blob/2.x/conf/regex-urlfilter.txt.template#L30
> Please let me know how you get on.
> Thanks
> Lewis
>
