Re: Nutch ignores robots.txt

2011-11-03 Thread Mathijs Homminga
Hello Max, (Besides the fact that this client seems to have a broken random URL generator) Crawlers (like Nutch clients) may not always obey robot rules. If Nutch is not configured properly, it will not recognize your Nutch entry in your robots.txt file. If the requests come from a
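To illustrate why a misconfigured agent name makes a robots.txt entry appear to be ignored, here is a deliberately simplified matcher. It is a hypothetical sketch, not Nutch's own parser. It assumes that rules are picked by comparing the robots.txt `User-agent` value with the crawler's agent name (in Nutch 1.x that name comes from configuration such as `http.agent.name` and `http.robots.agents`), that the `*` group applies only when no specific group matches, and it ignores `Allow` lines and wildcards. The agent name `mynutch` is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical, simplified robots.txt check for illustration only (not Nutch's parser).
 * Disallow prefixes come from groups whose User-agent equals the agent name
 * (case-insensitive); otherwise the "*" group applies. Allow lines are ignored.
 */
class SimpleRobots {

    static List<String> disallowsFor(String robotsTxt, String agentName) {
        String agent = agentName.toLowerCase(Locale.ROOT);
        List<String> specific = new ArrayList<>();
        List<String> wildcard = new ArrayList<>();
        boolean sawSpecific = false;
        boolean inSpecific = false, inWildcard = false, prevWasAgent = false;
        for (String raw : robotsTxt.split("\\r?\\n")) {
            String line = raw.replaceFirst("#.*", "").trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!prevWasAgent) { // a User-agent line after rules starts a new group
                    inSpecific = false;
                    inWildcard = false;
                }
                String ua = value.toLowerCase(Locale.ROOT);
                if (ua.equals("*")) {
                    inWildcard = true;
                } else if (ua.equals(agent)) {
                    inSpecific = true;
                    sawSpecific = true;
                }
                prevWasAgent = true;
            } else {
                prevWasAgent = false;
                if (field.equals("disallow") && !value.isEmpty()) {
                    if (inSpecific) specific.add(value);
                    if (inWildcard) wildcard.add(value);
                }
            }
        }
        return sawSpecific ? specific : wildcard;
    }

    static boolean isAllowed(String robotsTxt, String agentName, String path) {
        for (String prefix : disallowsFor(robotsTxt, agentName)) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String robots = "User-agent: mynutch\nDisallow: /private/\n\nUser-agent: *\nDisallow: /\n";
        System.out.println(isAllowed(robots, "mynutch", "/index.html"));  // true
        System.out.println(isAllowed(robots, "mynutch", "/private/a"));   // false
        System.out.println(isAllowed(robots, "otherbot", "/index.html")); // false
    }
}
```

If the crawler announces itself as anything other than `mynutch`, the specific group never matches and the catch-all `*` rules apply instead, which from the site owner's side looks like the dedicated entry is being ignored.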

[jira] [Commented] (NUTCH-1098) better url-normalizer basic

2011-11-03 Thread Julien Nioche (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143038#comment-13143038 ] Julien Nioche commented on NUTCH-1098: -- @Radim Sounds like I am not going to is your

Re: Build failed in Jenkins: Nutch-trunk #1651

2011-11-03 Thread Lewis John Mcgibbney
Hi, if you look at the recently failing Nutch trunk builds, namely #1645, #1650 and #1651, a common denominator is the org.apache.nutch.segment.TestSegmentMerger.testLargeMerge test (https://builds.apache.org/job/Nutch-trunk/1651/testReport/org.apache.nutch.segment/TestSegmentMerger/testLargeMerge/) which

Re: Build failed in Jenkins: Nutch-trunk #1651

2011-11-03 Thread Markus Jelsma
DiskCheckerException usually smells like running out of disk space in the designated tmp dir. On Thursday 03 November 2011 12:39:11 Lewis John Mcgibbney wrote: Hi, if you look at the recently failing Nutch trunk builds, namely #1645, #1650 and #1651, a common denominator is the
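A quick way to test the out-of-disk-space theory is to check how much usable space the temporary directory has before a large merge runs. The sketch below is a hypothetical helper, not part of Nutch or Hadoop: it inspects the JVM temp dir (or a path given on the command line, e.g. wherever `hadoop.tmp.dir` points, which by default lives under `/tmp`), and the 1 GiB threshold is an arbitrary assumption.

```java
import java.io.File;

/** Hypothetical pre-flight check: does a tmp dir have enough usable space? */
class TmpDirSpaceCheck {

    /** True when dir exists and has at least minFreeBytes available to this JVM. */
    static boolean hasEnoughSpace(File dir, long minFreeBytes) {
        // getUsableSpace() returns 0 for a path that does not exist.
        return dir.isDirectory() && dir.getUsableSpace() >= minFreeBytes;
    }

    public static void main(String[] args) {
        // Default to the JVM temp dir; pass the real tmp dir path as args[0] to check it.
        File dir = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        long minFree = 1L << 30; // assumed threshold: 1 GiB
        System.out.printf("%s: %d MiB usable, enough=%b%n",
                dir, dir.getUsableSpace() >> 20, hasEnoughSpace(dir, minFree));
    }
}
```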

Re: Build failed in Jenkins: Nutch-trunk #1651

2011-11-03 Thread Lewis John Mcgibbney
Would make logical sense, Markus, thank you. I think it's about time to try a more generic Jenkins build configuration, e.g. building on Ubuntu slaves as well as Solaris. I'll see what we can get running over the next while. On Thu, Nov 3, 2011 at 11:43 AM, Markus Jelsma

Re: Build failed in Jenkins: Nutch-trunk #1651

2011-11-03 Thread Ferdy Galema
I can't tell what the exact cause is. Because the tests run fine locally and the commits since the last successful build seem completely unrelated, I would say yes, this is definitely caused by the Solaris build environment. Unfortunately I'm still a novice in regard to the build process so I'm not

developer subscription

2011-11-03 Thread Samata Sirsikar
Hello, I would like to be subscribed to the nutch developers list.

[jira] [Created] (NUTCH-1196) Update job should impose an upper limit on the number of inlinks (nutchgora)

2011-11-03 Thread Ferdy Galema (Created) (JIRA)
Update job should impose an upper limit on the number of inlinks (nutchgora) Key: NUTCH-1196 URL: https://issues.apache.org/jira/browse/NUTCH-1196 Project: Nutch
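The idea behind NUTCH-1196 can be sketched as a cap applied while collecting a page's inlinks during the update step. This is a hypothetical illustration, not the eventual patch: the method name and the convention that a negative limit means "unlimited" are assumptions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of capping the number of inlinks kept per page. */
class InlinkCap {

    /** Copies at most maxInlinks items; a negative maxInlinks means no limit. */
    static <T> List<T> capInlinks(Iterator<T> inlinks, int maxInlinks) {
        List<T> kept = new ArrayList<>();
        // Stop pulling from the iterator once the cap is reached, so the
        // remaining inlinks are never materialized in memory.
        while (inlinks.hasNext() && (maxInlinks < 0 || kept.size() < maxInlinks)) {
            kept.add(inlinks.next());
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> inlinks = List.of("http://a.example/", "http://b.example/", "http://c.example/");
        System.out.println(capInlinks(inlinks.iterator(), 2)); // [http://a.example/, http://b.example/]
    }
}
```

Which inlinks survive depends on iteration order here; a real implementation might prefer to keep the most relevant ones instead.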

Running Issue about Nutch 1.3

2011-11-03 Thread Skiming_Zhang
Hello dear: I have the following output in hadoop.log after configuring Nutch 1.3 in Eclipse (Win 7), but I don't know how to resolve it. Can you help me? I'm new to Nutch, so forgive me for any mistakes in terminology! 2011-11-03 16:51:53,300

[jira] [Updated] (NUTCH-1140) index-more plugin, resetTitle method creates multiple values in the Title field

2011-11-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1140: - Fix Version/s: 1.5 index-more plugin, resetTitle method creates multiple values in the

Re: Running Issue about Nutch 1.3

2011-11-03 Thread Markus Jelsma
Hi, please use the user@nutch mailing list for user-related questions. This list is for development of Nutch itself. Cheers

[jira] [Commented] (NUTCH-1140) index-more plugin, resetTitle method creates multiple values in the Title field

2011-11-03 Thread Joe Liedtke (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143503#comment-13143503 ] Joe Liedtke commented on NUTCH-1140: Thanks! index-more plugin,

[jira] [Resolved] (NUTCH-1195) Add Solr 4x (trunk) example schema

2011-11-03 Thread Andrzej Bialecki (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-1195. -- Resolution: Fixed Committed in rev. 1197319. Add Solr 4x (trunk)

[jira] [Updated] (NUTCH-1194) CrawlDB lock should be released earlier

2011-11-03 Thread Radim Kolar (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar updated NUTCH-1194: --- Comment: was deleted (was: locking should be done in setup/cleanup task. Currently if you kill
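The point of NUTCH-1194 is how long the CrawlDB lock is held. As a rough local-filesystem analogy (not Nutch's locking code, which works through Hadoop's FileSystem), the sketch below creates a lock file atomically and scopes it tightly around the work, releasing it in a `finally` block so a failing job does not leave the lock behind. The `.locked` file name and the method names are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical local analogy of a CrawlDB lock file held only while work runs. */
class CrawlDbLockSketch {

    /** Atomically creates the lock file; fails if another job already holds it. */
    static Path acquire(Path lockFile) throws IOException {
        try {
            return Files.createFile(lockFile);
        } catch (FileAlreadyExistsException e) {
            throw new IOException("lock already held: " + lockFile, e);
        }
    }

    static void release(Path lockFile) throws IOException {
        Files.deleteIfExists(lockFile);
    }

    /** Holds the lock only for the duration of work, releasing it even if work throws. */
    static void withLock(Path lockFile, Runnable work) throws IOException {
        acquire(lockFile);
        try {
            work.run();
        } finally {
            release(lockFile);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("crawldb");
        Path lock = dir.resolve(".locked"); // illustrative lock-file name
        withLock(lock, () -> System.out.println("updating, lock held: " + Files.exists(lock)));
        System.out.println("after update, lock held: " + Files.exists(lock));
    }
}
```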

[jira] [Updated] (NUTCH-1070) Run nutch under native windows (no cygwin)

2011-11-03 Thread Radim Kolar (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar updated NUTCH-1070: --- Attachment: (was: nutch.bat) Run nutch under native windows (no cygwin)

[jira] [Resolved] (NUTCH-1070) Run nutch under native windows (no cygwin)

2011-11-03 Thread Radim Kolar (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar resolved NUTCH-1070. Resolution: Won't Fix Run nutch under native windows (no cygwin)

[jira] [Updated] (NUTCH-1070) Run nutch under native windows (no cygwin)

2011-11-03 Thread Radim Kolar (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar updated NUTCH-1070: --- Attachment: (was: bash.c) Run nutch under native windows (no cygwin)

[jira] [Updated] (NUTCH-1070) Run nutch under native windows (no cygwin)

2011-11-03 Thread Radim Kolar (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar updated NUTCH-1070: --- Attachment: (was: chmod.c) Run nutch under native windows (no cygwin)

Re: Setting properties in gora.properties

2011-11-03 Thread Enis Söztutar
Hi Lewis, I guess in gora-cassandra/src/test/conf/gora.properties, the servers are listed as 'gora.cassandrastore.servers=localhost:9160'. In setting the properties for gora data stores, you have to supply the data store that it applies to. The documentation at
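The naming rule Enis describes can be illustrated by building the store-scoped key before looking it up. This is a hypothetical sketch, not Gora's lookup code: it assumes, based only on the quoted `gora.cassandrastore.servers` line, that keys follow `gora.<lowercased store name>.<property>`, and the helper names are made up.

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical helper showing store-scoped keys like gora.cassandrastore.servers. */
class GoraPropertySketch {

    /** Builds a key such as "gora.cassandrastore.servers" (naming pattern assumed). */
    static String storeKey(String storeName, String property) {
        return "gora." + storeName.toLowerCase(Locale.ROOT) + "." + property;
    }

    /** Looks up a store-scoped property, returning defaultValue when absent. */
    static String storeProperty(Properties props, String storeName, String property, String defaultValue) {
        return props.getProperty(storeKey(storeName, property), defaultValue);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("gora.cassandrastore.servers", "localhost:9160");
        System.out.println(storeProperty(props, "CassandraStore", "servers", "none")); // localhost:9160
    }
}
```

An unscoped key such as a bare `servers=...` would not be found by this lookup, which mirrors the point that the data store name has to be part of the property.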

[jira] [Created] (NUTCH-1197) Add statically configured field values to solrindex-mapping.xml

2011-11-03 Thread Andrzej Bialecki (Created) (JIRA)
Add statically configured field values to solrindex-mapping.xml --- Key: NUTCH-1197 URL: https://issues.apache.org/jira/browse/NUTCH-1197 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-1197) Add statically configured field values to solrindex-mapping.xml

2011-11-03 Thread Andrzej Bialecki (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-1197: - Attachment: NUTCH-1197.patch Patch with the implementation. I added some javadocs, and a
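The effect of statically configured field values (NUTCH-1197) can be pictured as merging a fixed set of fields into every document before it is sent to Solr. The sketch below is hypothetical and does not reflect the patch's code or the solrindex-mapping.xml syntax; the field names are made up, and letting a document's own value win over a static one is a design choice of this sketch only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: add configured static fields to each outgoing document. */
class StaticFieldsSketch {

    /** Returns a copy of doc with each static field added unless doc already has it. */
    static Map<String, String> withStaticFields(Map<String, String> doc, Map<String, String> staticFields) {
        Map<String, String> out = new LinkedHashMap<>(doc);
        staticFields.forEach(out::putIfAbsent); // existing document values take precedence
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", "http://example.com/");
        doc.put("title", "Example");
        Map<String, String> statics = Map.of("source", "nutch-crawl"); // made-up static field
        System.out.println(withStaticFields(doc, statics)); // {url=http://example.com/, title=Example, source=nutch-crawl}
    }
}
```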

The old search page?

2011-11-03 Thread John Whelan
I’ve been ‘out of it’ for a while. It used to be that Nutch had a localized HTML search page that featured these guys (http://upload.wikimedia.org/wikipedia/commons/5/53/Nutch.png). Did 1.3 bring this forward in some form that I cannot find (maybe involving an XSL on search results?), or has this

Jenkins build is back to normal : Nutch-trunk #1652

2011-11-03 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/1652/changes