[jira] [Created] (NUTCH-985) Problems indexing lastModifiedDate in Solr
Problems indexing lastModifiedDate in Solr
------------------------------------------

Key: NUTCH-985
URL: https://issues.apache.org/jira/browse/NUTCH-985
Project: Nutch
Issue Type: Bug
Components: indexer
Reporter: Dietrich Schmidt

I am using the index-more plugin to parse the lastModified data in web pages in order to store it in a Solr data field.

In solrindex-mapping.xml I am mapping lastModified to a field "changed" in Solr:

<field dest="changed" source="lastModified"/>

However, when posting data to Solr the SolrIndexer posts it as a long, not as a date:

<add><doc boost="1.0"><field name="changed">107932680</field><field name="tstamp">20110414144140188</field><field name="date">20040315</field>

Solr rejects the data because of the improper data type.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
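For illustration, here is a minimal sketch of the kind of conversion the report is asking for. Solr date fields expect ISO 8601 timestamps in UTC, such as 1995-12-31T23:59:59Z. The sketch assumes the long holds milliseconds since the Unix epoch, which the report does not confirm. The class name SolrDateSketch and the method toSolrDate are hypothetical and not part of Nutch or SolrJ, and java.time requires Java 8 or later.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Nutch: formats an epoch-millisecond
// long as an ISO 8601 UTC string of the form Solr date fields accept.
final class SolrDateSketch {

    // DateTimeFormatter is immutable and thread-safe, so one shared instance is fine.
    private static final DateTimeFormatter SOLR_DATE =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'")
                         .withZone(ZoneOffset.UTC);

    // ASSUMPTION: the input is milliseconds since 1970-01-01T00:00:00Z.
    static String toSolrDate(long epochMillis) {
        return SOLR_DATE.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // Prints the epoch itself in Solr's date format.
        System.out.println(toSolrDate(0L));
    }
}
```

Whether such a conversion belongs in the indexing plugin or in the Solr indexer is the open question in this issue; the sketch only shows the formatting step.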
[jira] [Commented] (NUTCH-985) Problems indexing lastModifiedDate in Solr
[ https://issues.apache.org/jira/browse/NUTCH-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021712#comment-13021712 ]

Markus Jelsma commented on NUTCH-985:
-------------------------------------

This is similar to another issue described today about the failing dedup. Although I believe it would be a good idea to port longs to properly formatted dates for 1.3, I do think it'll be quite a task, since it is not only a matter of reformatting before sending it over. Dedup, for example, relies on dates stored as longs in Solr for it to work.

I'm also unsure whether a simple reformat in the Solr indexer is a better idea than changing it in the plugins themselves. Thoughts?

> Attachments: indexlastmodifieddate.jar
[jira] [Commented] (NUTCH-985) Problems indexing lastModifiedDate in Solr
[ https://issues.apache.org/jira/browse/NUTCH-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021732#comment-13021732 ]

Dietrich Schmidt commented on NUTCH-985:
----------------------------------------

Ideally org.apache.nutch.indexer.more.MoreIndexingFilter should store the lastModifiedDate in date format. Having limited knowledge of the Nutch source, I am not sure whether dependencies exist that would break if that were changed, but at this point I can't see what they would be.
[jira] [Commented] (NUTCH-985) Problems indexing lastModifiedDate in Solr
[ https://issues.apache.org/jira/browse/NUTCH-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021739#comment-13021739 ]

Markus Jelsma commented on NUTCH-985:
-------------------------------------

Yes, something has to be done. What did you attach anyway? Is that a recompiled plugin with your modification? If so, please include the sources. JARs are not really useful here ;)

Anyway, thanks for pointing out this issue, Dietrich.
Build failed in Jenkins: Nutch-trunk #1462
See https://hudson.apache.org/hudson/job/Nutch-trunk/1462/

------------------------------------------
[...truncated 1009 lines...]
A    src/plugin/subcollection/src/java/org/apache/nutch/collection
A    src/plugin/subcollection/src/java/org/apache/nutch/collection/Subcollection.java
A    src/plugin/subcollection/src/java/org/apache/nutch/collection/CollectionManager.java
A    src/plugin/subcollection/src/java/org/apache/nutch/collection/package.html
A    src/plugin/subcollection/src/java/org/apache/nutch/indexer
A    src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection
A    src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection/SubcollectionIndexingFilter.java
A    src/plugin/subcollection/README.txt
A    src/plugin/subcollection/plugin.xml
A    src/plugin/subcollection/build.xml
A    src/plugin/index-more
A    src/plugin/index-more/ivy.xml
A    src/plugin/index-more/src
A    src/plugin/index-more/src/test
A    src/plugin/index-more/src/test/org
A    src/plugin/index-more/src/test/org/apache
A    src/plugin/index-more/src/test/org/apache/nutch
A    src/plugin/index-more/src/test/org/apache/nutch/indexer
A    src/plugin/index-more/src/test/org/apache/nutch/indexer/more
A    src/plugin/index-more/src/test/org/apache/nutch/indexer/more/TestMoreIndexingFilter.java
A    src/plugin/index-more/src/java
A    src/plugin/index-more/src/java/org
A    src/plugin/index-more/src/java/org/apache
A    src/plugin/index-more/src/java/org/apache/nutch
A    src/plugin/index-more/src/java/org/apache/nutch/indexer
A    src/plugin/index-more/src/java/org/apache/nutch/indexer/more
A    src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
A    src/plugin/index-more/src/java/org/apache/nutch/indexer/more/package.html
A    src/plugin/index-more/plugin.xml
A    src/plugin/index-more/build.xml
AU   src/plugin/plugin.dtd
A    src/plugin/parse-ext
A    src/plugin/parse-ext/ivy.xml
A    src/plugin/parse-ext/src
A    src/plugin/parse-ext/src/test
A    src/plugin/parse-ext/src/test/org
A    src/plugin/parse-ext/src/test/org/apache
A    src/plugin/parse-ext/src/test/org/apache/nutch
A    src/plugin/parse-ext/src/test/org/apache/nutch/parse
A    src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext
A    src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext/TestExtParser.java
A    src/plugin/parse-ext/src/java
A    src/plugin/parse-ext/src/java/org
A    src/plugin/parse-ext/src/java/org/apache
A    src/plugin/parse-ext/src/java/org/apache/nutch
A    src/plugin/parse-ext/src/java/org/apache/nutch/parse
A    src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext
A    src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ExtParser.java
A    src/plugin/parse-ext/plugin.xml
A    src/plugin/parse-ext/build.xml
A    src/plugin/parse-ext/command
A    src/plugin/urlnormalizer-pass
A    src/plugin/urlnormalizer-pass/ivy.xml
A    src/plugin/urlnormalizer-pass/src
A    src/plugin/urlnormalizer-pass/src/test
A    src/plugin/urlnormalizer-pass/src/test/org
A    src/plugin/urlnormalizer-pass/src/test/org/apache
A    src/plugin/urlnormalizer-pass/src/test/org/apache/nutch
A    src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net
A    src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer
A    src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass
AU   src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass/TestPassURLNormalizer.java
A    src/plugin/urlnormalizer-pass/src/java
A    src/plugin/urlnormalizer-pass/src/java/org
A    src/plugin/urlnormalizer-pass/src/java/org/apache
A    src/plugin/urlnormalizer-pass/src/java/org/apache/nutch
A    src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net
A    src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer
A    src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass
AU   src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass/PassURLNormalizer.java
AU   src/plugin/urlnormalizer-pass/plugin.xml
AU   src/plugin/urlnormalizer-pass/build.xml
A    src/plugin/parse-html
A    src/plugin/parse-html/ivy.xml
A    src/plugin/parse-html/lib
A    src/plugin/parse-html/lib/tagsoup.LICENSE.txt
A    src/plugin/parse-html/src
A    src/plugin/parse-html/src/test
A    src/plugin/parse-html/src/test/org
A    src/plugin/parse-html/src/test/org/apache
A    src/plugin/parse-html/src/test/org/apache/nutch
A    src/plugin/parse-html/src/test/org/apache/nutch/parse
A