Re: [WELCOME] Nutch PMC Welcomes Talat Uyarer to PMC and Committer

2014-04-02 Thread Talat Uyarer
Hi All, I am very excited now. :) Thanks a lot to everyone for inviting me. I'm a software engineer and crawler team leader of my company in Istanbul. I have been using Apache Nutch 2.X for 10 months. My company works on Search Technologies. We have a huge Hadoop cluster for crawling and

Re: [WELCOME] Nutch PMC Welcomes Talat Uyarer to PMC and Committer

2014-04-02 Thread Renato Marroquín Mogrovejo
Congrats Talat! Well deserved! Renato M. 2014-04-02 8:56 GMT+02:00 Talat Uyarer ta...@uyarer.com: Hi All, I am very excited now. :) Thanks a lot to everyone for inviting me. I'm a software engineer and crawler team leader of my company in Istanbul. I have been using Apache Nutch 2.X for

Re: [WELCOME] Nutch PMC Welcomes Talat Uyarer to PMC and Committer

2014-04-02 Thread Alparslan Avcı
Congratulations mate! With the contributions of this great team, I can see the brilliant future of Nutch 2.x from now :) Alparslan On 02-04-2014 09:57, Renato Marroquín Mogrovejo wrote: Congrats Talat! Well deserved! Renato M. 2014-04-02 8:56 GMT+02:00 Talat Uyarer ta...@uyarer.com: Hi

Re: [WELCOME] Nutch PMC Welcomes Talat Uyarer to PMC and Committer

2014-04-02 Thread Julien Nioche
Congratulations Talat and welcome on board! Julien On 2 April 2014 07:56, Talat Uyarer ta...@uyarer.com wrote: Hi All, I am very excited now. :) Thanks a lot to everyone for inviting me. I'm a software engineer and crawler team leader of my company in Istanbul. I have been using Apache

InjectorJob: org.apache.gora.util.GoraException: java.lang.RuntimeException: java.lang.IllegalArgumentException...

2014-04-02 Thread Adamantios Corais
Hi all, I have followed all the steps to set up Nutch (2.2.1) with HBase (0.90.4) and Solr (4.7.1) as described in the book Web Crawling and Data Mining with Apache Nutch. However, I am getting the following error: InjectorJob: org.apache.gora.util.GoraException: java.lang.RuntimeException:

One site only index.

2014-04-02 Thread Shane Wood
I have indexed several sites successfully. Now I wish to index a new site and not update any other sites already indexed. I use Nutch 2.21, MySQL 5.3 and Solr 4.7.0. How would you recommend I go about indexing a new site only? If someone can give examples of command lines that would be

Re: One site only index.

2014-04-02 Thread remi tassing
Hi Shane, You could use the same scripts as before but just modify regex-urlfilter.txt to restrict the crawling scope. BR, Remi On Thu, Apr 3, 2014 at 10:52 AM, Shane Wood sh...@cbm8bit.com wrote: I have indexed several sites successfully. Now I wish to index a new site and not update
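To illustrate the suggestion above, here is a minimal sketch in Python of how a restrictive regex-urlfilter.txt might behave. It assumes the usual rule format (each line starts with `+` to accept or `-` to reject, followed by a regular expression; the first matching rule decides; a URL matching no rule is rejected). The host example.org and the helper names `load_rules` and `accepts` are hypothetical, and Python's regex engine only approximates the Java one Nutch uses, so treat this as a way to reason about rules rather than as Nutch's own code.

```python
import re


def load_rules(text):
    """Parse regex-urlfilter style lines into (allow, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return True if the first matching rule accepts the URL.

    Assumption: a URL that matches no rule at all is rejected.
    """
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False


# Hypothetical rule set: crawl only example.org and its subdomains.
RULES_TEXT = r"""
# accept only the new site (hypothetical host)
+^https?://([a-z0-9-]+\.)*example\.org/
# reject everything else
-.
"""

RULES = load_rules(RULES_TEXT)

if __name__ == "__main__":
    for url in ("http://www.example.org/page", "http://other.com/"):
        print(url, "accepted" if accepts(RULES, url) else "rejected")
```

With rules like these, URLs belonging to sites that were indexed earlier would be filtered out, so only the new site stays in scope for the crawl.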

Re: One site only index.

2014-04-02 Thread Shane Wood
Can you choose a custom regex-urlfilter.txt to save editing it each time you wish to index a different site? I am surprised you can't enter a URL when generating a fetch list, i.e. /bin/nutch generate --only someurl.com --job 192833-292837 Then you fetch job 192833-292837 parse job

Unable to crawl wiki pages through Nutch

2014-04-02 Thread reddibabu
Hi All, I am using Apache Nutch 1.7. I am able to crawl and index almost all sites except wiki pages. While trying to crawl wiki pages, it says that the fetch of http://wiki.ibm.com/ failed with: java.net.UnknownHostException: wiki.ibm.com. Does it require any additional configuration for

Re: Unable to crawl wiki pages through Nutch

2014-04-02 Thread Talat Uyarer
Hi reddibabu, java.net.UnknownHostException is a DNS resolution problem. When I tried to open the website, it did not open. Nutch does not have any specific configuration for wikis. Talat On 3 Apr 2014 at 07:54, reddibabu reddybabu...@gmail.com wrote: Hi All, I am using Apache Nutch 1.7. I am able to crawl

Re: Unable to crawl wiki pages through Nutch

2014-04-02 Thread John Lafitte
reddibabu, I cannot resolve wiki.ibm.com so I'm guessing Nutch can't either. Is that an internal DNS record? On Wed, Apr 2, 2014 at 11:54 PM, reddibabu reddybabu...@gmail.com wrote: Hi All, I am using Apache Nutch 1.7. I am able to crawl and index almost all sites except wiki pages.

Re: Unable to crawl wiki pages through Nutch

2014-04-02 Thread Talat Uyarer
If you use a local DNS server for resolving, you should list its nameservers in resolv.conf on the servers where Nutch runs. You should make sure Nutch's server can resolve this site. If you use a console, you can use lynx for checking. Talat On 3 Apr 2014 at 08:02, John Lafitte jlafi...@brandextract.com
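As a rough way to run the check suggested above on the machine where Nutch runs, the following Python sketch asks the system resolver (which on Linux typically reads resolv.conf) whether a host name resolves. The function name `can_resolve` and its `resolver` parameter are hypothetical and exist only for this illustration; the host name is the one from the thread.

```python
import socket


def can_resolve(host, resolver=socket.gethostbyname):
    """Return True if the host name resolves to an address, else False.

    The resolver argument is a hypothetical hook so the check can be
    exercised without network access; by default the system resolver is used.
    """
    try:
        resolver(host)
        return True
    except OSError:
        # socket.gaierror (unknown host) is a subclass of OSError
        return False


if __name__ == "__main__":
    host = "wiki.ibm.com"  # host name from the thread
    if can_resolve(host):
        print(host, "resolves from this machine")
    else:
        print(host, "does not resolve; check the nameservers in resolv.conf")
```

If this reports that the name does not resolve on the crawl server, the fetcher would be expected to fail with the same UnknownHostException, whatever the Nutch configuration.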