[jira] Commented: (NUTCH-940) static field plugin
[ https://issues.apache.org/jira/browse/NUTCH-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981274#action_12981274 ] Julien Nioche commented on NUTCH-940: - Claudio, It would be better to follow the implicit convention for naming the plugins and call it index-static for instance. This will be a better indication of what the plugin does. Would be better to be able to specify multiple values for a field as well i.e have a MapString,String[] Julien static field plugin --- Key: NUTCH-940 URL: https://issues.apache.org/jira/browse/NUTCH-940 Project: Nutch Issue Type: New Feature Components: indexer Affects Versions: 1.3, 2.0 Reporter: Claudio Martella Priority: Minor Attachments: static-field.diff, static-field.tar.gz A simple plugin called at indexing that adds fields with static data. You can specify a list of fieldname:fieldcontent per nutch job. It can be useful when collections can't be created by urlpatterns, like in subcollection, but on a job-basis. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (NUTCH-940) static field plugin
[ https://issues.apache.org/jira/browse/NUTCH-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Claudio Martella updated NUTCH-940: --- Attachment: index-static.diff changed naming conventions from static-field to index-static static field plugin --- Key: NUTCH-940 URL: https://issues.apache.org/jira/browse/NUTCH-940 Project: Nutch Issue Type: New Feature Components: indexer Affects Versions: 1.3, 2.0 Reporter: Claudio Martella Priority: Minor Attachments: index-static.diff, static-field.diff, static-field.tar.gz A simple plugin called at indexing that adds fields with static data. You can specify a list of fieldname:fieldcontent per nutch job. It can be useful when collections can't be created by urlpatterns, like in subcollection, but on a job-basis. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (NUTCH-940) static field plugin
[ https://issues.apache.org/jira/browse/NUTCH-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981280#action_12981280 ] Claudio Martella commented on NUTCH-940: About the multiple values, i split on commas and on colons, so values can already have multiple tokens with spaces. They will not be divided in the map, but does it make a difference at indexing time? i.e. this is reasonable: field1:value1.1 value1.2 value1.3,field2:value2.1 value2.2 ... static field plugin --- Key: NUTCH-940 URL: https://issues.apache.org/jira/browse/NUTCH-940 Project: Nutch Issue Type: New Feature Components: indexer Affects Versions: 1.3, 2.0 Reporter: Claudio Martella Priority: Minor Attachments: index-static.diff, static-field.diff, static-field.tar.gz A simple plugin called at indexing that adds fields with static data. You can specify a list of fieldname:fieldcontent per nutch job. It can be useful when collections can't be created by urlpatterns, like in subcollection, but on a job-basis. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (NUTCH-940) static field plugin
[ https://issues.apache.org/jira/browse/NUTCH-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981412#action_12981412 ] Julien Nioche commented on NUTCH-940: - MapString,String[] = multiple values for the same key which is useful for mulitvalued fields in SOLR e.g. anchors. see the functionalities of https://issues.apache.org/jira/browse/NUTCH-924 (and my comments there). Since your plugin is more generic I'd rather use it as soon as it provides at least the same functionalities . Don't forget to add to src/plugin/build.xml ant dir=index-static target=clean/ Thanks Julien static field plugin --- Key: NUTCH-940 URL: https://issues.apache.org/jira/browse/NUTCH-940 Project: Nutch Issue Type: New Feature Components: indexer Affects Versions: 1.3, 2.0 Reporter: Claudio Martella Priority: Minor Attachments: index-static.diff, static-field.diff, static-field.tar.gz A simple plugin called at indexing that adds fields with static data. You can specify a list of fieldname:fieldcontent per nutch job. It can be useful when collections can't be created by urlpatterns, like in subcollection, but on a job-basis. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (NUTCH-956) soldindex issues
soldindex issues Key: NUTCH-956 URL: https://issues.apache.org/jira/browse/NUTCH-956 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 2.0 Reporter: Alexis I ran into a few caveats with solrindex command trying to index documents. Please refer to http://techvineyard.blogspot.com/2010/12/build-nutch-20.html#solrindex that describes my tests. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (NUTCH-956) soldindex issues
[ https://issues.apache.org/jira/browse/NUTCH-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis updated NUTCH-956: - Attachment: solr.patch Here are the changes: - Avoid multiple values for id field. (NUTCH-819) - Allow multiple values for tag field. Add tld (Top Level Domain) field. - Get the content-type from WebPage object's member. Otherwise, you will see NullPointerExceptions. - Compare strings with equalsTo. That's pretty random, but it avoids having some suprises. soldindex issues Key: NUTCH-956 URL: https://issues.apache.org/jira/browse/NUTCH-956 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 2.0 Reporter: Alexis Attachments: solr.patch I ran into a few caveats with solrindex command trying to index documents. Please refer to http://techvineyard.blogspot.com/2010/12/build-nutch-20.html#solrindex that describes my tests. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Build failed in Hudson: Nutch-trunk #1367
See https://hudson.apache.org/hudson/job/Nutch-trunk/1367/ -- [...truncated 1008 lines...] A src/plugin/subcollection/src/java/org/apache/nutch/collection A src/plugin/subcollection/src/java/org/apache/nutch/collection/Subcollection.java A src/plugin/subcollection/src/java/org/apache/nutch/collection/CollectionManager.java A src/plugin/subcollection/src/java/org/apache/nutch/collection/package.html A src/plugin/subcollection/src/java/org/apache/nutch/indexer A src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection A src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection/SubcollectionIndexingFilter.java A src/plugin/subcollection/README.txt A src/plugin/subcollection/plugin.xml A src/plugin/subcollection/build.xml A src/plugin/index-more A src/plugin/index-more/ivy.xml A src/plugin/index-more/src A src/plugin/index-more/src/test A src/plugin/index-more/src/test/org A src/plugin/index-more/src/test/org/apache A src/plugin/index-more/src/test/org/apache/nutch A src/plugin/index-more/src/test/org/apache/nutch/indexer A src/plugin/index-more/src/test/org/apache/nutch/indexer/more A src/plugin/index-more/src/test/org/apache/nutch/indexer/more/TestMoreIndexingFilter.java A src/plugin/index-more/src/java A src/plugin/index-more/src/java/org A src/plugin/index-more/src/java/org/apache A src/plugin/index-more/src/java/org/apache/nutch A src/plugin/index-more/src/java/org/apache/nutch/indexer A src/plugin/index-more/src/java/org/apache/nutch/indexer/more A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/package.html A src/plugin/index-more/plugin.xml A src/plugin/index-more/build.xml AUsrc/plugin/plugin.dtd A src/plugin/parse-ext A src/plugin/parse-ext/ivy.xml A src/plugin/parse-ext/src A src/plugin/parse-ext/src/test A src/plugin/parse-ext/src/test/org A src/plugin/parse-ext/src/test/org/apache A src/plugin/parse-ext/src/test/org/apache/nutch A src/plugin/parse-ext/src/test/org/apache/nutch/parse A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext/TestExtParser.java A src/plugin/parse-ext/src/java A src/plugin/parse-ext/src/java/org A src/plugin/parse-ext/src/java/org/apache A src/plugin/parse-ext/src/java/org/apache/nutch A src/plugin/parse-ext/src/java/org/apache/nutch/parse A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ExtParser.java A src/plugin/parse-ext/plugin.xml A src/plugin/parse-ext/build.xml A src/plugin/parse-ext/command A src/plugin/urlnormalizer-pass A src/plugin/urlnormalizer-pass/ivy.xml A src/plugin/urlnormalizer-pass/src A src/plugin/urlnormalizer-pass/src/test A src/plugin/urlnormalizer-pass/src/test/org A src/plugin/urlnormalizer-pass/src/test/org/apache A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass AU src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass/TestPassURLNormalizer.java A src/plugin/urlnormalizer-pass/src/java A src/plugin/urlnormalizer-pass/src/java/org A src/plugin/urlnormalizer-pass/src/java/org/apache A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass AU src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass/PassURLNormalizer.java AUsrc/plugin/urlnormalizer-pass/plugin.xml AUsrc/plugin/urlnormalizer-pass/build.xml A src/plugin/parse-html A src/plugin/parse-html/ivy.xml A src/plugin/parse-html/lib A src/plugin/parse-html/lib/tagsoup.LICENSE.txt A src/plugin/parse-html/src A src/plugin/parse-html/src/test A src/plugin/parse-html/src/test/org A src/plugin/parse-html/src/test/org/apache A src/plugin/parse-html/src/test/org/apache/nutch A src/plugin/parse-html/src/test/org/apache/nutch/parse A