[jira] [Created] (NUTCH-1475) Nutch 2.1 Index-More Plugin -- A better fall back value for date field
James Sullivan created NUTCH-1475: - Summary: Nutch 2.1 Index-More Plugin -- A better fall back value for date field Key: NUTCH-1475 URL: https://issues.apache.org/jira/browse/NUTCH-1475 Project: Nutch Issue Type: Bug Affects Versions: nutchgora, 2.1 Environment: All Reporter: James Sullivan Priority: Minor Attachments: index-more-2x.patch Among other fields, the more plugin for Nutch 2.x provides a last modified and date field for the Solr index. The last modified field is the last modified date from the http headers if available, if not available it is left empty. Currently, the date field is the same as the last modified field unless that field is empty in which case getFetchTime is used as a fall back. I think getFetchTime is not a good fall back as it is the next fetch time and often a month or more in the future which doesn't make sense for the date field. Users do not expect webpages/documents with future dates. A more sensible fallback would be current date at the time it is indexed. This is possible by simply changing line 97 of https://svn.apache.org/repos/asf/nutch/branches/2.x/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java from time = page.getFetchTime(); // use fetch time to time = new Date().getTime(); Users interested in the getFetchTime value can still get it from the tstamp field. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (NUTCH-1475) Nutch 2.1 Index-More Plugin -- A better fall back value for date field
[ https://issues.apache.org/jira/browse/NUTCH-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Sullivan updated NUTCH-1475: -- Attachment: index-more-2x.patch Nutch 2.1 Index-More Plugin -- A better fall back value for date field -- Key: NUTCH-1475 URL: https://issues.apache.org/jira/browse/NUTCH-1475 Project: Nutch Issue Type: Bug Affects Versions: nutchgora, 2.1 Environment: All Reporter: James Sullivan Priority: Minor Labels: index-more, plugins Attachments: index-more-2x.patch Original Estimate: 1h Remaining Estimate: 1h Among other fields, the more plugin for Nutch 2.x provides a last modified and date field for the Solr index. The last modified field is the last modified date from the http headers if available, if not available it is left empty. Currently, the date field is the same as the last modified field unless that field is empty in which case getFetchTime is used as a fall back. I think getFetchTime is not a good fall back as it is the next fetch time and often a month or more in the future which doesn't make sense for the date field. Users do not expect webpages/documents with future dates. A more sensible fallback would be current date at the time it is indexed. This is possible by simply changing line 97 of https://svn.apache.org/repos/asf/nutch/branches/2.x/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java from time = page.getFetchTime(); // use fetch time to time = new Date().getTime(); Users interested in the getFetchTime value can still get it from the tstamp field. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Build failed in Jenkins: Nutch-nutchgora #371
See https://builds.apache.org/job/Nutch-nutchgora/371/ -- Started by timer Building remotely on solaris1 in workspace https://builds.apache.org/job/Nutch-nutchgora/ws/ hudson.util.IOException2: remote file operation failed: https://builds.apache.org/job/Nutch-nutchgora/ws/ at hudson.remoting.Channel@1807c204:solaris1 at hudson.FilePath.act(FilePath.java:838) at hudson.FilePath.act(FilePath.java:824) at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:743) at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:685) at hudson.model.AbstractProject.checkout(AbstractProject.java:1256) at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:589) at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:494) at hudson.model.Run.execute(Run.java:1502) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:236) Caused by: java.io.IOException: Remote call on solaris1 failed at hudson.remoting.Channel.call(Channel.java:673) at hudson.FilePath.act(FilePath.java:831) ... 11 more Caused by: java.lang.LinkageError: duplicate class definition: hudson/model/Descriptor at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:621) at java.lang.ClassLoader.defineClass(ClassLoader.java:466) at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:152) at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:131) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) at java.lang.Class.getDeclaredFields0(Native Method) at java.lang.Class.privateGetDeclaredFields(Class.java:2259) at java.lang.Class.getDeclaredField(Class.java:1852) at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1582) at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:52) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:408) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.init(ObjectStreamClass.java:400) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:297) at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:531) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1552) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1552) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1699) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348) at hudson.remoting.UserRequest.deserialize(UserRequest.java:182) at hudson.remoting.UserRequest.perform(UserRequest.java:98) at hudson.remoting.UserRequest.perform(UserRequest.java:48) at hudson.remoting.Request$2.run(Request.java:326) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269) at java.util.concurrent.FutureTask.run(FutureTask.java:123) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651) at
Build failed in Jenkins: Nutch-trunk #1980
See https://builds.apache.org/job/Nutch-trunk/1980/ -- Started by timer Building remotely on solaris1 in workspace https://builds.apache.org/job/Nutch-trunk/ws/ hudson.util.IOException2: remote file operation failed: https://builds.apache.org/job/Nutch-trunk/ws/ at hudson.remoting.Channel@1807c204:solaris1 at hudson.FilePath.act(FilePath.java:838) at hudson.FilePath.act(FilePath.java:824) at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:743) at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:685) at hudson.model.AbstractProject.checkout(AbstractProject.java:1256) at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:589) at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:494) at hudson.model.Run.execute(Run.java:1502) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:236) Caused by: java.io.IOException: Remote call on solaris1 failed at hudson.remoting.Channel.call(Channel.java:673) at hudson.FilePath.act(FilePath.java:831) ... 11 more Caused by: java.lang.LinkageError: duplicate class definition: hudson/model/Descriptor at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:621) at java.lang.ClassLoader.defineClass(ClassLoader.java:466) at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:152) at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:131) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) at java.lang.Class.getDeclaredFields0(Native Method) at java.lang.Class.privateGetDeclaredFields(Class.java:2259) at java.lang.Class.getDeclaredField(Class.java:1852) at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1582) at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:52) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:408) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.init(ObjectStreamClass.java:400) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:297) at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:531) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1552) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1552) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1699) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348) at hudson.remoting.UserRequest.deserialize(UserRequest.java:182) at hudson.remoting.UserRequest.perform(UserRequest.java:98) at hudson.remoting.UserRequest.perform(UserRequest.java:48) at hudson.remoting.Request$2.run(Request.java:326) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269) at java.util.concurrent.FutureTask.run(FutureTask.java:123) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651) at