Hi, sorry to trouble you. I saw that Nutch has released a new 1.11 RC, but as you said, you are using Nutch version 2.2.1 to crawl. That confuses me. Is that a mistake?
Thanks
Chear

On Fri, Dec 4, 2015 at 7:16 PM, Baizhang Ma <[email protected]> wrote:

> Hi, Lewis and Madhav. Thanks for your reply!
> I just found that it did crawl images. I was probably too careless to
> notice that they had been stored in my database. Sorry about that. As for
> the other properties set in nutch-site.xml: my Nutch version is 2.2.1,
> which is compatible with a MySQL database.
>
> > <property>
> > <name>storage.data.store.class</name>
> > <value>org.apache.gora.sql.store.SqlStore</value>
> > <description>Default class for storing data</description>
> > </property>
>
> It is used to set up the MySQL database. And if you want to run Nutch in
> Eclipse, you might set
>
> > <property>
> > <name>generate.batch.id</name>
> > <value>*</value>
> > </property>
>
> because some errors occur if you haven't set this. FYI
> http://www.solutions.asia/?p=180.
> By the way, the setting
> > <property>
> > <name>generate.batch.id</name>
> > <value>*</value>
> > </property>
> is useful when you use Eclipse to run Nutch 2.2.1 + MySQL.
>
>
> 2015-12-04 14:56 GMT+08:00 Madhav Sharan <[email protected]>:
>
> > Yeah, as Lewis said, check your "regex-urlfilter.txt" file. By default
> > images are filtered out; you can comment out
> >
> > https://github.com/apache/nutch/blob/2.x/conf/regex-urlfilter.txt.template#L30
> >
> >
> > --
> > Thanks
> > Madhav Sharan
> >
> >
> > On Thu, Dec 3, 2015 at 9:17 PM, Lewis John Mcgibbney <
> > [email protected]> wrote:
> >
> > > Hi Byzen.Ma,
> > >
> > > I would advise you to follow the tutorial at
> > > http://wiki.apache.org/nutch/Nutch2Tutorial
> > > Please see the answers inline
> > >
> > > On Thu, Dec 3, 2015 at 6:26 AM, <[email protected]> wrote:
> > >
> > > > <property>
> > > > <name>storage.data.store.class</name>
> > > > <value>org.apache.gora.sql.store.SqlStore</value>
> > > > <description>Default class for storing data</description>
> > > > </property>
> > >
> > > This property is useless. SqlStore is non-functional. Please consider
> > > using one of the other datastores documented at
> > > http://gora.apache.org/current/index.html#gora-modules
> > >
> > > > <property>
> > > > <name>generate.batch.id</name>
> > > > <value>*</value>
> > > > </property>
> > >
> > > I have no idea what the above property is...
> > >
> > > > If you have any idea about this, I would very much appreciate it
> > > > if you could share it with me!
> > > > Thanks again, Madhav.
> > >
> > > Nutch 2.X, as with Nutch 1.X, will crawl images if you comment out the
> > > following lines in your regex-urlfilter.txt
> > >
> > > https://github.com/apache/nutch/blob/2.x/conf/regex-urlfilter.txt.template#L30
> > > Please let me know how you get on.
> > > Thanks
> > > Lewis
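
For anyone following the regex-urlfilter.txt advice quoted above, here is a minimal Python sketch of why commenting out the image suffix rule lets image URLs through. It is not Nutch's own code. It only assumes the documented behaviour that the first matching rule decides and that a URL matching no rule is ignored. The rule list is a shortened, hypothetical stand-in for the template file, and the function name `accepts` is made up for illustration.

```python
import re

# Hypothetical, shortened stand-in for conf/regex-urlfilter.txt.
# Check your own file for the real suffix list.
DEFAULT_RULES = [
    r"-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS)$",  # skip image-like suffixes
    r"+.",                                             # accept anything else
]

# Same rules with the suffix line "commented out" (removed here).
RULES_WITHOUT_SUFFIX_FILTER = DEFAULT_RULES[1:]


def accepts(url, rules):
    """Return True if the first rule that matches the URL is a '+' rule.

    A URL that matches no rule is rejected.
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False


if __name__ == "__main__":
    image = "http://example.com/logo.png"
    print(accepts(image, DEFAULT_RULES))
    print(accepts(image, RULES_WITHOUT_SUFFIX_FILTER))
```

With the suffix rule present the image URL is rejected by the `-` line before the catch-all `+.` is reached. Once that line is gone, the catch-all accepts it.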

