Hi,
Sorry to trouble you. I saw that Nutch has released a new 1.11 RC, but as you
said, you are using Nutch version 2.2.1 to crawl. That confuses me. Is that
a mistake?

Thanks
Chear



On Fri, Dec 4, 2015 at 7:16 PM, Baizhang Ma <[email protected]> wrote:

> Hi, Lewis and Madhav. Thanks for your reply!
> I just found that it did crawl images. I was careless and did not notice
> that they had been stored in my database. Sorry about that. As for the other
> properties set in nutch-site.xml: my Nutch version is 2.2.1, which is
> compatible with a MySQL database.
>
> >     <property>
> >         <name>storage.data.store.class</name>
> >         <value>org.apache.gora.sql.store.SqlStore</value>
> >         <description>Default class for storing data</description>
> >     </property>
>
> It is used to configure the MySQL database. And if you want to run Nutch in
> Eclipse, you might set
>
> >     <property>
> >     <name>generate.batch.id</name>
> >     <value>*</value>
> >     </property>
>
> because some errors can occur if you haven't set this. FYI:
> http://www.solutions.asia/?p=180.
> By the way, the setting
> >     <property>
> >     <name>generate.batch.id</name>
> >     <value>*</value>
> >     </property>
> is useful when you use Eclipse to run Nutch 2.2.1 + MySQL.
>
>
> 2015-12-04 14:56 GMT+08:00 Madhav Sharan <[email protected]>:
>
> > Yeah, as Lewis said, check your "regex-urlfilter.txt" file. By default
> > images are filtered out; you can comment out the rule at
> >
> > https://github.com/apache/nutch/blob/2.x/conf/regex-urlfilter.txt.template#L30
> >
> >
> > --
> > Thanks
> > Madhav Sharan
> >
> >
> > On Thu, Dec 3, 2015 at 9:17 PM, Lewis John Mcgibbney <
> > [email protected]> wrote:
> >
> > > Hi Byzen.Ma,
> > >
> > > I would advise you to follow the tutorial at
> > > http://wiki.apache.org/nutch/Nutch2Tutorial
> > > Please see the answers inline
> > >
> > > On Thu, Dec 3, 2015 at 6:26 AM, <[email protected]> wrote:
> > >
> > > >
> > > >     <property>
> > > >         <name>storage.data.store.class</name>
> > > >         <value>org.apache.gora.sql.store.SqlStore</value>
> > > >         <description>Default class for storing data</description>
> > > >     </property>
> > > >
> > >
> > > This property is useless. SqlStore is non-functional. Please consider
> > > using one of the other datastores documented at
> > > http://gora.apache.org/current/index.html#gora-modules
> > >
> > >
> > > >     <property>
> > > >     <name>generate.batch.id</name>
> > > >     <value>*</value>
> > > >     </property>
> > > >
> > >
> > > I have no idea what the above property is...
> > >
> > >
> > > > If you have any ideas about this, I would very much appreciate it if
> > > > you shared them with me!
> > > > Thanks again, Madhav.
> > > >
> > > >
> > > Nutch 2.X, as with Nutch 1.X, will crawl images if you comment out the
> > > following lines in your regex-urlfilter.txt:
> > >
> > > https://github.com/apache/nutch/blob/2.x/conf/regex-urlfilter.txt.template#L30
> > > Please let me know how you get on.
> > > Thanks
> > > Lewis
> > >
> >
>
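
For reference, the effect of commenting out the image rule discussed in the thread can be sketched in Python. This is only a simplified model of how a regex-urlfilter.txt style file is evaluated (first matching `+`/`-` rule wins, no match means the URL is ignored), not Nutch's actual implementation. The helper names `load_rules` and `accepts`, the shortened suffix list, and the example.com URLs are assumptions made for illustration.

```python
import re

def load_rules(text):
    """Parse '+pattern' / '-pattern' lines, skipping blanks and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(rules, url):
    """Return the verdict of the first matching rule; reject if none match."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False

# Hypothetical, shortened version of the default suffix rule.
DEFAULT = r"""
# skip image and other suffixes
-\.(gif|jpg|png|css|zip)$
# accept anything else
+.
"""

# Same file with the suffix rule commented out, as suggested in the thread.
COMMENTED_OUT = r"""
#-\.(gif|jpg|png|css|zip)$
+.
"""

if __name__ == "__main__":
    for name, text in (("default", DEFAULT), ("commented out", COMMENTED_OUT)):
        rules = load_rules(text)
        print(name, accepts(rules, "http://example.com/logo.png"))
```

Note that commenting out the whole rule also lets through the other suffixes it covered (for example css and zip in this sketch), so removing only the image extensions from the pattern may be the narrower change.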
