Build failed in Jenkins: Nutch-nutchgora #403
See https://builds.apache.org/job/Nutch-nutchgora/403/ -- Started by timer Building remotely on solaris1 in workspace https://builds.apache.org/job/Nutch-nutchgora/ws/ hudson.util.IOException2: remote file operation failed: https://builds.apache.org/job/Nutch-nutchgora/ws/ at hudson.remoting.Channel@1ea860fb:solaris1 at hudson.FilePath.act(FilePath.java:838) at hudson.FilePath.act(FilePath.java:824) at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:743) at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:685) at hudson.model.AbstractProject.checkout(AbstractProject.java:1256) at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:589) at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:494) at hudson.model.Run.execute(Run.java:1502) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:236) Caused by: java.io.IOException: Remote call on solaris1 failed at hudson.remoting.Channel.call(Channel.java:673) at hudson.FilePath.act(FilePath.java:831) ... 
11 more Caused by: java.lang.LinkageError: duplicate class definition: hudson/model/Descriptor at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:621) at java.lang.ClassLoader.defineClass(ClassLoader.java:466) at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:152) at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:131) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) at java.lang.Class.getDeclaredFields0(Native Method) at java.lang.Class.privateGetDeclaredFields(Class.java:2259) at java.lang.Class.getDeclaredField(Class.java:1852) at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1582) at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:52) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:408) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.init(ObjectStreamClass.java:400) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:297) at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:531) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1552) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1552) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1699) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348) at hudson.remoting.UserRequest.deserialize(UserRequest.java:182) at hudson.remoting.UserRequest.perform(UserRequest.java:98) at hudson.remoting.UserRequest.perform(UserRequest.java:48) at hudson.remoting.Request$2.run(Request.java:326) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269) at java.util.concurrent.FutureTask.run(FutureTask.java:123) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651) at
Build failed in Jenkins: Nutch-trunk #2013
See https://builds.apache.org/job/Nutch-trunk/2013/ -- Started by timer Building remotely on solaris1 in workspace https://builds.apache.org/job/Nutch-trunk/ws/ hudson.util.IOException2: remote file operation failed: https://builds.apache.org/job/Nutch-trunk/ws/ at hudson.remoting.Channel@1ea860fb:solaris1 at hudson.FilePath.act(FilePath.java:838) at hudson.FilePath.act(FilePath.java:824) at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:743) at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:685) at hudson.model.AbstractProject.checkout(AbstractProject.java:1256) at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:589) at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:494) at hudson.model.Run.execute(Run.java:1502) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:236) Caused by: java.io.IOException: Remote call on solaris1 failed at hudson.remoting.Channel.call(Channel.java:673) at hudson.FilePath.act(FilePath.java:831) ... 
11 more Caused by: java.lang.LinkageError: duplicate class definition: hudson/model/Descriptor at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:621) at java.lang.ClassLoader.defineClass(ClassLoader.java:466) at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:152) at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:131) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) at java.lang.Class.getDeclaredFields0(Native Method) at java.lang.Class.privateGetDeclaredFields(Class.java:2259) at java.lang.Class.getDeclaredField(Class.java:1852) at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1582) at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:52) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:408) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.init(ObjectStreamClass.java:400) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:297) at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:531) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1552) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1552) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1699) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348) at hudson.remoting.UserRequest.deserialize(UserRequest.java:182) at hudson.remoting.UserRequest.perform(UserRequest.java:98) at hudson.remoting.UserRequest.perform(UserRequest.java:48) at hudson.remoting.Request$2.run(Request.java:326) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269) at java.util.concurrent.FutureTask.run(FutureTask.java:123) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651) at
Build failed in Jenkins: nutch-trunk-maven #492
See https://builds.apache.org/job/nutch-trunk-maven/492/ -- Started by timer Building remotely on solaris1 in workspace https://builds.apache.org/job/nutch-trunk-maven/ws/ hudson.util.IOException2: remote file operation failed: https://builds.apache.org/job/nutch-trunk-maven/ws/ at hudson.remoting.Channel@1ea860fb:solaris1 at hudson.FilePath.act(FilePath.java:838) at hudson.FilePath.act(FilePath.java:824) at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:743) at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:685) at hudson.model.AbstractProject.checkout(AbstractProject.java:1256) at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:589) at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:494) at hudson.model.Run.execute(Run.java:1502) at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:477) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:236) Caused by: java.io.IOException: Remote call on solaris1 failed at hudson.remoting.Channel.call(Channel.java:673) at hudson.FilePath.act(FilePath.java:831) ... 
11 more Caused by: java.lang.LinkageError: duplicate class definition: hudson/model/Descriptor at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:621) at java.lang.ClassLoader.defineClass(ClassLoader.java:466) at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:152) at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:131) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) at java.lang.Class.getDeclaredFields0(Native Method) at java.lang.Class.privateGetDeclaredFields(Class.java:2259) at java.lang.Class.getDeclaredField(Class.java:1852) at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1582) at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:52) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:408) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.init(ObjectStreamClass.java:400) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:297) at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:531) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1552) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1552) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1699) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348) at hudson.remoting.UserRequest.deserialize(UserRequest.java:182) at hudson.remoting.UserRequest.perform(UserRequest.java:98) at hudson.remoting.UserRequest.perform(UserRequest.java:48) at hudson.remoting.Request$2.run(Request.java:326) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269) at java.util.concurrent.FutureTask.run(FutureTask.java:123) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651) at
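All three failures above bottom out in the same cause: `java.lang.LinkageError: duplicate class definition: hudson/model/Descriptor`, raised under `ClassLoader.defineClass` called from `hudson.remoting.RemoteClassLoader`. The JVM throws this error when one class loader is asked to define the same class name twice. The following is a minimal sketch of that mechanism only, not the Jenkins remoting code: `NaiveLoader`, the class name `Dup`, and the hand-built class bytes are all hypothetical, and the trace does not show why the remoting loader attempted a second definition on solaris1.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class DuplicateDefinitionDemo {

    /** Hypothetical loader that exposes defineClass without a findLoadedClass guard. */
    static class NaiveLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Builds the bytes of a minimal public class with no fields or methods. */
    static byte[] minimalClassBytes(String internalName) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(0xCAFEBABE);                                  // magic number
        out.writeShort(0);                                         // minor version
        out.writeShort(49);                                        // major version (Java 5 format)
        out.writeShort(5);                                         // constant pool count = entries + 1
        out.writeByte(1); out.writeUTF(internalName);              // #1 Utf8: this class name
        out.writeByte(7); out.writeShort(1);                       // #2 Class -> #1
        out.writeByte(1); out.writeUTF("java/lang/Object");        // #3 Utf8: superclass name
        out.writeByte(7); out.writeShort(3);                       // #4 Class -> #3
        out.writeShort(0x0021);                                    // ACC_PUBLIC | ACC_SUPER
        out.writeShort(2);                                         // this_class
        out.writeShort(4);                                         // super_class
        out.writeShort(0);                                         // interfaces
        out.writeShort(0);                                         // fields
        out.writeShort(0);                                         // methods
        out.writeShort(0);                                         // attributes
        out.flush();
        return buffer.toByteArray();
    }

    /** Defines the same class name twice in one loader and reports whether the JVM refused. */
    static boolean secondDefinitionFails() throws IOException {
        byte[] bytes = minimalClassBytes("Dup");
        NaiveLoader loader = new NaiveLoader();
        loader.define("Dup", bytes);          // first definition succeeds
        try {
            loader.define("Dup", bytes);      // same name, same loader
            return false;
        } catch (LinkageError expected) {
            return true;                      // the JVM rejects the duplicate
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("second definition rejected: " + secondDefinitionFails());
    }
}
```

A loader that checks `findLoadedClass` under a lock before calling `defineClass` avoids this error, which is why a duplicate definition usually points at the loader rather than at the job being built.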
[jira] [Updated] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL
[ https://issues.apache.org/jira/browse/NUTCH-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Sullivan updated NUTCH-1497: -- Patch Info: (was: Patch Available) Better default gora-sql-mapping.xml with larger field sizes for MySQL - Key: NUTCH-1497 URL: https://issues.apache.org/jira/browse/NUTCH-1497 Project: Nutch Issue Type: Improvement Components: storage Affects Versions: 2.2 Environment: MySQL Backend Reporter: James Sullivan Priority: Minor Labels: MySQL Attachments: gora-mysql-mapping.xml The current generic default gora-sql-mapping.xml has field sizes that are too small in almost all situations when used with MySQL. I have included a mapping which will work better for MySQL (takes slightly more space but will be able to handle larger fields necessary for real world use). Includes patch from Nutch-1490 and resolves the non-Unicode part of Nutch-1473. I believe it is not possible to use the same gora-sql-mapping for both hsqldb and MySQL without a significantly degraded lowest common denominator resulting. Should the user manually rename the attached file to gora-sql-mapping.xml or is there a way to have Nutch automatically use it when MySQL is selected in other configurations (Ivy.xml or gora.properties)? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL
[ https://issues.apache.org/jira/browse/NUTCH-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Sullivan updated NUTCH-1497: -- Attachment: gora-mysql-mapping.xml Better default gora-sql-mapping.xml with larger field sizes for MySQL - Key: NUTCH-1497 URL: https://issues.apache.org/jira/browse/NUTCH-1497 Project: Nutch Issue Type: Improvement Components: storage Affects Versions: 2.2 Environment: MySQL Backend Reporter: James Sullivan Priority: Minor Labels: MySQL Attachments: gora-mysql-mapping.xml, gora-mysql-mapping.xml The current generic default gora-sql-mapping.xml has field sizes that are too small in almost all situations when used with MySQL. I have included a mapping which will work better for MySQL (takes slightly more space but will be able to handle larger fields necessary for real world use). Includes patch from Nutch-1490 and resolves the non-Unicode part of Nutch-1473. I believe it is not possible to use the same gora-sql-mapping for both hsqldb and MySQL without a significantly degraded lowest common denominator resulting. Should the user manually rename the attached file to gora-sql-mapping.xml or is there a way to have Nutch automatically use it when MySQL is selected in other configurations (Ivy.xml or gora.properties)?
[jira] [Commented] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL
[ https://issues.apache.org/jira/browse/NUTCH-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496122#comment-13496122 ] James Sullivan commented on NUTCH-1497: --- Nathan, I've made the changes to the lengths and uploaded. Could you check that it is correct? One note: I left the column as typ because, although I agree it is odd, I thought consistency was more important. Better default gora-sql-mapping.xml with larger field sizes for MySQL - Key: NUTCH-1497 URL: https://issues.apache.org/jira/browse/NUTCH-1497 Project: Nutch Issue Type: Improvement Components: storage Affects Versions: 2.2 Environment: MySQL Backend Reporter: James Sullivan Priority: Minor Labels: MySQL Attachments: gora-mysql-mapping.xml, gora-mysql-mapping.xml The current generic default gora-sql-mapping.xml has field sizes that are too small in almost all situations when used with MySQL. I have included a mapping which will work better for MySQL (takes slightly more space but will be able to handle larger fields necessary for real world use). Includes patch from Nutch-1490 and resolves the non-Unicode part of Nutch-1473. I believe it is not possible to use the same gora-sql-mapping for both hsqldb and MySQL without a significantly degraded lowest common denominator resulting. Should the user manually rename the attached file to gora-sql-mapping.xml or is there a way to have Nutch automatically use it when MySQL is selected in other configurations (Ivy.xml or gora.properties)?
[jira] [Commented] (NUTCH-1497) Better default gora-sql-mapping.xml with larger field sizes for MySQL
[ https://issues.apache.org/jira/browse/NUTCH-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496131#comment-13496131 ] James Sullivan commented on NUTCH-1497: --- I agree that one standard file for SQL databases would be preferable, but one example of why I couldn't stay with one file for both hsql and MySQL is that, at larger sizes, the text column was being turned into a blob rather than text. Better default gora-sql-mapping.xml with larger field sizes for MySQL - Key: NUTCH-1497 URL: https://issues.apache.org/jira/browse/NUTCH-1497 Project: Nutch Issue Type: Improvement Components: storage Affects Versions: 2.2 Environment: MySQL Backend Reporter: James Sullivan Priority: Minor Labels: MySQL Attachments: gora-mysql-mapping.xml, gora-mysql-mapping.xml The current generic default gora-sql-mapping.xml has field sizes that are too small in almost all situations when used with MySQL. I have included a mapping which will work better for MySQL (takes slightly more space but will be able to handle larger fields necessary for real world use). Includes patch from Nutch-1490 and resolves the non-Unicode part of Nutch-1473. I believe it is not possible to use the same gora-sql-mapping for both hsqldb and MySQL without a significantly degraded lowest common denominator resulting. Should the user manually rename the attached file to gora-sql-mapping.xml or is there a way to have Nutch automatically use it when MySQL is selected in other configurations (Ivy.xml or gora.properties)?
[jira] [Updated] (NUTCH-1495) -normalize and -filter for updatedb command in nutch 2.x
[ https://issues.apache.org/jira/browse/NUTCH-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Gass updated NUTCH-1495: --- Attachment: patch-updatedb-normalize-filter-2012-11-13.txt The attached patch shows where I currently stand. normalize basically works, and possible duplicate entries are handled similarly to nutch 1.x (by taking the newest one). I'm not at all sure if this is enough or the best approach. Currently fields like baseUrl are not changed. Should DbUpdater try to adapt them to the new url (by doing the same normalizations)? What about the fetched content? Another approach could be to add a new empty entry, so updatedb -normalize would actually throw away already fetched and/or parsed content of urls with new normalizations. More testing is also necessary, but I'm waiting for comments on whether this approach is at all feasible before I continue working on this. -normalize and -filter for updatedb command in nutch 2.x Key: NUTCH-1495 URL: https://issues.apache.org/jira/browse/NUTCH-1495 Project: Nutch Issue Type: Improvement Affects Versions: 2.2 Reporter: Nathan Gass Attachments: patch-updatedb-normalize-filter-2012-11-09.txt, patch-updatedb-normalize-filter-2012-11-13.txt AFAIS in nutch 1.x you could change your url filters and normalizers during the crawl, and update the db using crawldb -normalize -filter. There does not seem to be a way to achieve the same in nutch 2.x. Anyway, I went ahead and tried to implement -normalize and -filter for the nutch 2.x updatedb command. I have no experience with any of the technologies used, including Java, so please check the attached code carefully before using it. I'm very interested to hear if this is the right approach or any other comments.
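The comment above describes re-normalizing stored urls and resolving the resulting collisions by taking the newest entry. A minimal plain-Java sketch of that idea follows; it is not the attached patch and uses no Nutch or Gora API. The `Entry` type, the `normalize` rule (lower-case scheme and host, drop the fragment) and the `accept` filter (http and https only) are hypothetical stand-ins chosen for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class NormalizeDedupSketch {

    /** Hypothetical stand-in for a stored row: a url plus its last fetch time. */
    static final class Entry {
        final String url;
        final long fetchTime;

        Entry(String url, long fetchTime) {
            this.url = url;
            this.fetchTime = fetchTime;
        }
    }

    /** Hypothetical normalizer: lower-cases scheme and host and drops the fragment. */
    static String normalize(String url) throws URISyntaxException {
        URI u = new URI(url);
        String scheme = u.getScheme() == null ? null : u.getScheme().toLowerCase(Locale.ROOT);
        String host = u.getHost() == null ? null : u.getHost().toLowerCase(Locale.ROOT);
        return new URI(scheme, u.getUserInfo(), host, u.getPort(),
                u.getPath(), u.getQuery(), null).toString();
    }

    /** Hypothetical filter: only http and https urls stay in the db. */
    static boolean accept(String url) {
        return url.startsWith("http://") || url.startsWith("https://");
    }

    /** Re-keys every row by its normalized url; on a collision the newest row wins. */
    static Map<String, Entry> update(List<Entry> rows) throws URISyntaxException {
        Map<String, Entry> byUrl = new LinkedHashMap<>();
        for (Entry row : rows) {
            String key = normalize(row.url);
            if (!accept(key)) {
                continue; // filtered out, dropped from the result
            }
            Entry existing = byUrl.get(key);
            if (existing == null || row.fetchTime > existing.fetchTime) {
                byUrl.put(key, new Entry(key, row.fetchTime));
            }
        }
        return byUrl;
    }
}
```

The sketch keeps only the url and fetch time, so it sidesteps the open questions raised in the comment: what to do with dependent fields such as baseUrl and with already fetched content when the key changes.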
[jira] [Updated] (NUTCH-1370) Expose exact number of urls injected @runtime
[ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1370: Attachment: NUTCH-1370-2.x-v2.patch 2nd WIP for 2.x I'm having difficulty correctly implementing JobClient#runJob as the currentJob param is not correct... {code} RunningJob mapJob = JobClient.runJob(currentJob); {code} @Seb, Regarding your patch, this looks great, is much cleaner than my proposal, I've tested and I'm +1 for committing. Expose exact number of urls injected @runtime -- Key: NUTCH-1370 URL: https://issues.apache.org/jira/browse/NUTCH-1370 Project: Nutch Issue Type: Improvement Components: injector Affects Versions: nutchgora, 1.5 Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Priority: Minor Fix For: 1.6, 2.2 Attachments: NUTCH-1370-1.x.patch, NUTCH-1370-2.x.patch, NUTCH-1370-2.x-v2.patch Example: When using trunk, currently we see {code} 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: starting at 2012-05-22 09:04:00 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: urlDir: urls 2012-05-22 09:04:00,253 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries. 2012-05-22 09:04:00,955 INFO plugin.PluginRepository - Plugins: looking in: {code} I would like to see {code} 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: starting at 2012-05-22 09:04:00 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: urlDir: urls 2012-05-22 09:04:00,253 INFO crawl.Injector - Injector: Injected N urls to crawl/crawldb 2012-05-22 09:04:00,253 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries. 
2012-05-22 09:04:00,955 INFO plugin.PluginRepository - Plugins: looking in: {code} This would make debugging easier and would help those who end up getting {code} 2012-05-22 09:04:04,850 WARN crawl.Generator - Generator: 0 records selected for fetching, exiting ... {code}
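The request above amounts to counting the urls that survive injection and logging that count. A minimal plain-Java sketch of the counting and of the requested log line follows. It is not one of the attached patches and does not use the MapReduce APIs that Nutch relies on; the `accept` filter, the `Counts` type and the rule of skipping blank lines and lines starting with `#` are assumptions made for illustration.

```java
import java.util.List;

public class InjectCountSketch {

    /** Hypothetical result holder: urls accepted versus urls rejected by the filter. */
    static final class Counts {
        final int injected;
        final int rejected;

        Counts(int injected, int rejected) {
            this.injected = injected;
            this.rejected = rejected;
        }
    }

    /** Hypothetical url filter: accepts http and https only. */
    static boolean accept(String url) {
        return url.startsWith("http://") || url.startsWith("https://");
    }

    /** Counts seed lines; blank lines and '#' comment lines are assumed not to be urls. */
    static Counts count(List<String> seedLines) {
        int injected = 0;
        int rejected = 0;
        for (String line : seedLines) {
            String url = line.trim();
            if (url.isEmpty() || url.startsWith("#")) {
                continue; // not a url at all, counted in neither bucket
            }
            if (accept(url)) {
                injected++;
            } else {
                rejected++;
            }
        }
        return new Counts(injected, rejected);
    }

    /** Formats the message in the shape requested in the issue description. */
    static String logLine(int injected, String crawlDb) {
        return "Injector: Injected " + injected + " urls to " + crawlDb;
    }
}
```

Keeping a separate rejected count makes the "0 records selected for fetching" case easier to diagnose, since it distinguishes an empty seed list from a seed list that the filters removed entirely.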
[jira] [Updated] (NUTCH-1370) Expose exact number of urls injected @runtime
[ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1370: Patch Info: Patch Available Expose exact number of urls injected @runtime -- Key: NUTCH-1370 URL: https://issues.apache.org/jira/browse/NUTCH-1370 Project: Nutch Issue Type: Improvement Components: injector Affects Versions: nutchgora, 1.5 Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Priority: Minor Fix For: 1.6, 2.2 Attachments: NUTCH-1370-1.x.patch, NUTCH-1370-2.x.patch, NUTCH-1370-2.x-v2.patch Example: When using trunk, currently we see {code} 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: starting at 2012-05-22 09:04:00 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: urlDir: urls 2012-05-22 09:04:00,253 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries. 2012-05-22 09:04:00,955 INFO plugin.PluginRepository - Plugins: looking in: {code} I would like to see {code} 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: starting at 2012-05-22 09:04:00 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb 2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: urlDir: urls 2012-05-22 09:04:00,253 INFO crawl.Injector - Injector: Injected N urls to crawl/crawldb 2012-05-22 09:04:00,253 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries. 2012-05-22 09:04:00,955 INFO plugin.PluginRepository - Plugins: looking in: {code} This would make debugging easier and would help those who end up getting {code} 2012-05-22 09:04:04,850 WARN crawl.Generator - Generator: 0 records selected for fetching, exiting ... {code}
[jira] [Updated] (NUTCH-1117) JUnit test for index-anchor
[ https://issues.apache.org/jira/browse/NUTCH-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1117: Attachment: NUTCH-1117.patch Trivial patch for test case. Thank you to both Ferdy and Markus for the info on manually simulating Inlinks insertion. JUnit test for index-anchor --- Key: NUTCH-1117 URL: https://issues.apache.org/jira/browse/NUTCH-1117 Project: Nutch Issue Type: Sub-task Components: build Affects Versions: 1.4 Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Priority: Minor Fix For: 1.6 Attachments: NUTCH-1117.patch This issue is part of the larger attempt to provide a JUnit test case for every Nutch plugin.
[jira] [Resolved] (NUTCH-1117) JUnit test for index-anchor
[ https://issues.apache.org/jira/browse/NUTCH-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1117. - Resolution: Fixed Committed @revision 1408898 in trunk JUnit test for index-anchor --- Key: NUTCH-1117 URL: https://issues.apache.org/jira/browse/NUTCH-1117 Project: Nutch Issue Type: Sub-task Components: build Affects Versions: 1.4 Reporter: Lewis John McGibbney Assignee: Lewis John McGibbney Priority: Minor Fix For: 1.6 Attachments: NUTCH-1117.patch This issue is part of the larger attempt to provide a JUnit test case for every Nutch plugin.
[jira] [Created] (NUTCH-1498) Make index-basic consistent in trunk and 2.x
Lewis John McGibbney created NUTCH-1498: --- Summary: Make index-basic consistent in trunk and 2.x Key: NUTCH-1498 URL: https://issues.apache.org/jira/browse/NUTCH-1498 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 2.2 Reporter: Lewis John McGibbney Priority: Minor Fix For: 2.2 Currently the index-basic plugin supports more functionality in trunk than it does in 2.x. I see no reason why functionality shouldn't be made consistent. For example - 2.x duplicates field values for host and site... - trunk supports configuration options for indexer.add.domain and indexer.max.content.length whereas 2.x does not.
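The issue above names two trunk configuration options, indexer.add.domain and indexer.max.content.length, that 2.x does not honour. A minimal plain-Java sketch of what honouring them could look like follows. It is not the index-basic plugin code: `java.util.Properties` stands in for the Nutch configuration, the field names, the `naiveDomain` helper and the defaults (`false`, and `-1` meaning no limit) are assumptions that should be checked against trunk's nutch-default.xml.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class IndexBasicOptionsSketch {

    /** Hypothetical helper: keeps the last two labels of the host name. */
    static String naiveDomain(String host) {
        String[] parts = host.split("\\.");
        if (parts.length <= 2) {
            return host;
        }
        return parts[parts.length - 2] + "." + parts[parts.length - 1];
    }

    /** Builds the index fields for one page according to the two options. */
    static Map<String, String> buildFields(Properties conf, String host, String content) {
        // Assumed defaults: no domain field, and -1 for "do not truncate".
        boolean addDomain = Boolean.parseBoolean(conf.getProperty("indexer.add.domain", "false"));
        int maxLength = Integer.parseInt(conf.getProperty("indexer.max.content.length", "-1"));

        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("host", host);
        if (addDomain) {
            doc.put("domain", naiveDomain(host));
        }
        if (maxLength > -1 && content.length() > maxLength) {
            content = content.substring(0, maxLength); // cut content beyond the limit
        }
        doc.put("content", content);
        return doc;
    }
}
```

Reading both options through one configuration object, with defaults that leave behaviour unchanged, is one way the same plugin logic could be shared between trunk and 2.x.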
Build failed in Jenkins: nutch-trunk-maven #493
See https://builds.apache.org/job/nutch-trunk-maven/493/ --
[...truncated 1190 lines...]
AU src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpResponse.java
AU src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/package.html
A src/plugin/protocol-httpclient/jsp
AU src/plugin/protocol-httpclient/jsp/ntlm.jsp
AU src/plugin/protocol-httpclient/jsp/cookies.jsp
AU src/plugin/protocol-httpclient/jsp/noauth.jsp
AU src/plugin/protocol-httpclient/jsp/digest.jsp
AU src/plugin/protocol-httpclient/jsp/basic.jsp
AU src/plugin/protocol-httpclient/plugin.xml
AU src/plugin/protocol-httpclient/build.xml
A src/plugin/parse-metatags
A src/plugin/parse-metatags/sample
A src/plugin/parse-metatags/sample/testMetatags.html
A src/plugin/parse-metatags/ivy.xml
A src/plugin/parse-metatags/src
A src/plugin/parse-metatags/src/test
A src/plugin/parse-metatags/src/test/org
A src/plugin/parse-metatags/src/test/org/apache
A src/plugin/parse-metatags/src/test/org/apache/nutch
A src/plugin/parse-metatags/src/test/org/apache/nutch/parse
A src/plugin/parse-metatags/src/test/org/apache/nutch/parse/html
A src/plugin/parse-metatags/src/test/org/apache/nutch/parse/html/TestMetatagParser.java
A src/plugin/parse-metatags/src/java
A src/plugin/parse-metatags/src/java/org
A src/plugin/parse-metatags/src/java/org/apache
A src/plugin/parse-metatags/src/java/org/apache/nutch
A src/plugin/parse-metatags/src/java/org/apache/nutch/parse
A src/plugin/parse-metatags/src/java/org/apache/nutch/parse/MetaTagsParser.java
A src/plugin/parse-metatags/README.txt
A src/plugin/parse-metatags/plugin.xml
A src/plugin/parse-metatags/build.xml
A src/plugin/urlfilter-domain
A src/plugin/urlfilter-domain/ivy.xml
A src/plugin/urlfilter-domain/src
A src/plugin/urlfilter-domain/src/test
A src/plugin/urlfilter-domain/src/test/org
A src/plugin/urlfilter-domain/src/test/org/apache
A src/plugin/urlfilter-domain/src/test/org/apache/nutch
A src/plugin/urlfilter-domain/src/test/org/apache/nutch/urlfilter
A src/plugin/urlfilter-domain/src/test/org/apache/nutch/urlfilter/domain
AU src/plugin/urlfilter-domain/src/test/org/apache/nutch/urlfilter/domain/TestDomainURLFilter.java
A src/plugin/urlfilter-domain/src/java
A src/plugin/urlfilter-domain/src/java/org
A src/plugin/urlfilter-domain/src/java/org/apache
A src/plugin/urlfilter-domain/src/java/org/apache/nutch
A src/plugin/urlfilter-domain/src/java/org/apache/nutch/urlfilter
A src/plugin/urlfilter-domain/src/java/org/apache/nutch/urlfilter/domain
AU src/plugin/urlfilter-domain/src/java/org/apache/nutch/urlfilter/domain/DomainURLFilter.java
AU src/plugin/urlfilter-domain/src/java/org/apache/nutch/urlfilter/domain/package.html
A src/plugin/urlfilter-domain/data
AU src/plugin/urlfilter-domain/data/hosts.txt
AU src/plugin/urlfilter-domain/plugin.xml
AU src/plugin/urlfilter-domain/build.xml
A src/plugin/protocol-http
A src/plugin/protocol-http/ivy.xml
A src/plugin/protocol-http/src
A src/plugin/protocol-http/src/test
A src/plugin/protocol-http/src/test/org
A src/plugin/protocol-http/src/test/org/apache
A src/plugin/protocol-http/src/test/org/apache/nutch
A src/plugin/protocol-http/src/test/org/apache/nutch/protocol
A src/plugin/protocol-http/src/test/org/apache/nutch/protocol/http
A src/plugin/protocol-http/src/java
A src/plugin/protocol-http/src/java/org
A src/plugin/protocol-http/src/java/org/apache
A src/plugin/protocol-http/src/java/org/apache/nutch
A src/plugin/protocol-http/src/java/org/apache/nutch/protocol
A src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http
AU src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/Http.java
AU src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
AU src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/package.html
AU src/plugin/protocol-http/plugin.xml
AU src/plugin/protocol-http/build.xml
A pom.xml
A KEYS
AU README.txt
AU build.xml
U .
At revision 1408944
no revision recorded for https://svn.apache.org/repos/asf/nutch/trunk in the previous build
Parsing POMs
[trunk] $ /home/hudson/tools/java/latest1.6/bin/java -cp /export/home/hudson/hudson-slave/maven-agent.jar:/export/home/hudson/hudson-slave/classworlds.jar hudson.maven.agent.Main /home/hudson/tools/maven/latest /zonestorage/hudson_solaris/home/hudson/hudson-slave/slave.jar
[jira] [Updated] (NUTCH-1370) Expose exact number of urls injected @runtime
[ https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel updated NUTCH-1370:
---
Attachment: NUTCH-1370-2.x-v3.patch

Hi Lewis,
yes, the 1.x patch is not easily transferred to 2.x because of the different (old vs. new) MapReduce APIs. Here is a trial...

One question: the logged line "number of urls attempting to inject" suggests that there is a third count, "urls successfully injected" or similar. What's the intention with "attempting"?

Expose exact number of urls injected @runtime
--
Key: NUTCH-1370
URL: https://issues.apache.org/jira/browse/NUTCH-1370
Project: Nutch
Issue Type: Improvement
Components: injector
Affects Versions: nutchgora, 1.5
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Priority: Minor
Fix For: 1.6, 2.2
Attachments: NUTCH-1370-1.x.patch, NUTCH-1370-2.x.patch, NUTCH-1370-2.x-v2.patch, NUTCH-1370-2.x-v3.patch

Example: When using trunk, currently we see
{code}
2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: starting at 2012-05-22 09:04:00
2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: urlDir: urls
2012-05-22 09:04:00,253 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
2012-05-22 09:04:00,955 INFO plugin.PluginRepository - Plugins: looking in:
{code}
I would like to see
{code}
2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: starting at 2012-05-22 09:04:00
2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
2012-05-22 09:04:00,239 INFO crawl.Injector - Injector: urlDir: urls
2012-05-22 09:04:00,253 INFO crawl.Injector - Injector: Injected N urls to crawl/crawldb
2012-05-22 09:04:00,253 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
2012-05-22 09:04:00,955 INFO plugin.PluginRepository - Plugins: looking in:
{code}
This would make debugging easier and would help those who end up getting
{code}
2012-05-22 09:04:04,850 WARN crawl.Generator - Generator: 0 records selected for fetching, exiting ...
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
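
The counting behaviour requested in NUTCH-1370, and the "attempted" versus "injected" distinction raised in the comment, can be sketched roughly as follows. This is a minimal Python illustration only, not the Nutch Injector or any of the attached patches: the names `inject` and `is_injectable`, the in-memory set standing in for the crawl db, and the "(N attempted)" suffix on the log line are all assumptions made for the example.

```python
import logging
from urllib.parse import urlparse

log = logging.getLogger("crawl.Injector")


def is_injectable(url: str) -> bool:
    """Hypothetical stand-in for Nutch's URL normalizers and filters."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def inject(seed_lines, crawl_db: set) -> tuple:
    """Add accepted seed URLs to crawl_db and return (attempted, injected)."""
    attempted = 0
    injected = 0
    for line in seed_lines:
        url = line.strip()
        if not url or url.startswith("#"):
            continue  # blank lines and comments are not counted as attempts
        attempted += 1
        # A URL counts as injected only if it passes the filter and is new.
        if is_injectable(url) and url not in crawl_db:
            crawl_db.add(url)
            injected += 1
    # Assumed log format, modelled on the "Injected N urls" line proposed above.
    log.info("Injector: Injected %d urls to crawl/crawldb (%d attempted)",
             injected, attempted)
    return attempted, injected
```

Reporting both numbers would let someone who later sees "0 records selected for fetching" tell whether the seed list was empty or whether every seed was rejected or already present.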