[jira] Commented: (NUTCH-940) static field plugin

2011-01-13 Thread Julien Nioche (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981274#action_12981274
 ] 

Julien Nioche commented on NUTCH-940:
-

Claudio, 

It would be better to follow the implicit naming convention for plugins and 
call it index-static, for instance. That name gives a better indication of what 
the plugin does.

It would also be better to be able to specify multiple values for a field, i.e. 
have a Map<String,String[]>.

Julien

 static field plugin
 ---

 Key: NUTCH-940
 URL: https://issues.apache.org/jira/browse/NUTCH-940
 Project: Nutch
  Issue Type: New Feature
  Components: indexer
Affects Versions: 1.3, 2.0
Reporter: Claudio Martella
Priority: Minor
 Attachments: static-field.diff, static-field.tar.gz


 A simple plugin, called at indexing time, that adds fields with static data. 
 You can specify a list of fieldname:fieldcontent pairs per Nutch job.
 It is useful when collections cannot be defined by URL patterns, as in the 
 subcollection plugin, but have to be assigned on a per-job basis.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (NUTCH-940) static field plugin

2011-01-13 Thread Claudio Martella (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claudio Martella updated NUTCH-940:
---

Attachment: index-static.diff

Renamed the plugin from static-field to index-static to follow the naming convention.




[jira] Commented: (NUTCH-940) static field plugin

2011-01-13 Thread Claudio Martella (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981280#action_12981280
 ] 

Claudio Martella commented on NUTCH-940:


About the multiple values: I split on commas and on colons, so a value can 
already contain multiple tokens separated by spaces. They will not be split in 
the map, but does that make a difference at indexing time?

For example, this is reasonable:

field1:value1.1 value1.2 value1.3,field2:value2.1 value2.2 ...
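
The splitting described above can be sketched roughly as follows. This is only an illustration and is not taken from the attached patch: the class name StaticFieldSpec and the method parse are hypothetical, and treating only the first colon of each entry as the separator is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not the code from the attached patch. */
final class StaticFieldSpec {

    /**
     * Parses "field1:some value,field2:other value" into an ordered map.
     * Entries are separated by commas; the first colon in an entry separates
     * the field name from its content (an assumption of this sketch).
     */
    static Map<String, String> parse(String spec) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return fields;
        }
        for (String entry : spec.split(",")) {
            int colon = entry.indexOf(':');
            if (colon <= 0) {
                continue; // no colon, or nothing before it: skip the entry
            }
            String name = entry.substring(0, colon).trim();
            String content = entry.substring(colon + 1).trim();
            if (!name.isEmpty()) {
                fields.put(name, content); // spaces inside the content are kept
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields =
            parse("field1:value1.1 value1.2 value1.3,field2:value2.1 value2.2");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With this approach each field ends up with a single string value, so whether the space-separated tokens are searchable individually depends on how the index analyzes that field.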




[jira] Commented: (NUTCH-940) static field plugin

2011-01-13 Thread Julien Nioche (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981412#action_12981412
 ] 

Julien Nioche commented on NUTCH-940:
-

Map<String,String[]> = multiple values for the same key, which is useful for 
multivalued fields in Solr, e.g. anchors.
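
One possible way to get several values per key is sketched below. It is an assumption of this sketch, not something agreed in the thread, that a field name may simply be repeated in the specification; the class name MultiValuedStaticFields and its parse method are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a multi-valued variant; not part of any attached patch. */
final class MultiValuedStaticFields {

    /**
     * Parses "name:value,name:value,..." and collects every value given for
     * the same field name, in order of appearance.
     */
    static Map<String, String[]> parse(String spec) {
        Map<String, List<String>> collected = new LinkedHashMap<>();
        if (spec != null) {
            for (String entry : spec.split(",")) {
                int colon = entry.indexOf(':');
                if (colon <= 0) {
                    continue; // skip entries without a field name
                }
                String name = entry.substring(0, colon).trim();
                String value = entry.substring(colon + 1).trim();
                if (name.isEmpty()) {
                    continue;
                }
                // Repeated names append to the same list instead of overwriting.
                collected.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
            }
        }
        Map<String, String[]> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : collected.entrySet()) {
            result.put(e.getKey(), e.getValue().toArray(new String[0]));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String[]> fields = parse("anchor:home page,anchor:start,site:example");
        for (Map.Entry<String, String[]> e : fields.entrySet()) {
            System.out.println(e.getKey() + " -> " + String.join(" | ", e.getValue()));
        }
    }
}
```

An indexing filter could then add each array element as a separate value of the field, which is what a multivalued Solr field expects.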

See the functionality of https://issues.apache.org/jira/browse/NUTCH-924 (and 
my comments there). Since your plugin is more generic, I'd rather use it as soon 
as it provides at least the same functionality.

Don't forget to add the following to src/plugin/build.xml:

 <ant dir="index-static" target="clean"/>


Thanks

Julien




[jira] Created: (NUTCH-956) solrindex issues

2011-01-13 Thread Alexis (JIRA)
solrindex issues


 Key: NUTCH-956
 URL: https://issues.apache.org/jira/browse/NUTCH-956
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Affects Versions: 2.0
Reporter: Alexis


I ran into a few problems with the solrindex command while trying to index 
documents. Please refer to 
http://techvineyard.blogspot.com/2010/12/build-nutch-20.html#solrindex, which 
describes my tests.




[jira] Updated: (NUTCH-956) solrindex issues

2011-01-13 Thread Alexis (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis updated NUTCH-956:
-

Attachment: solr.patch

Here are the changes:

- Avoid multiple values for the id field (NUTCH-819).
- Allow multiple values for the tag field. Add a tld (top-level domain) field.
- Get the content type from the WebPage object's member. Otherwise, you will see 
NullPointerExceptions.
- Compare strings with equals. That may look arbitrary, but it avoids some 
surprises.
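
The last point can be illustrated with a small, self-contained sketch. It assumes the patch replaces reference comparisons (==) with content comparisons; the thread does not show the patch, so this is a guess at the kind of surprise meant. The class StringComparison and the helper sameContent are hypothetical names.

```java
/** Hypothetical illustration of content vs. reference comparison of strings. */
final class StringComparison {

    /** Null-safe comparison of string contents. */
    static boolean sameContent(String a, String b) {
        return a == null ? b == null : a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "text/html";
        String built = new String("text/html"); // distinct object, same characters

        // == compares object references, so this prints false.
        System.out.println(literal == built);

        // equals compares the characters, so this prints true.
        System.out.println(sameContent(literal, built));
    }
}
```

Strings read from a store or built at runtime are usually separate objects, which is why a == check can fail even when the text is identical.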




Build failed in Hudson: Nutch-trunk #1367

2011-01-13 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Nutch-trunk/1367/

--
[...truncated 1008 lines...]
A src/plugin/subcollection/src/java/org/apache/nutch/collection
A src/plugin/subcollection/src/java/org/apache/nutch/collection/Subcollection.java
A src/plugin/subcollection/src/java/org/apache/nutch/collection/CollectionManager.java
A src/plugin/subcollection/src/java/org/apache/nutch/collection/package.html
A src/plugin/subcollection/src/java/org/apache/nutch/indexer
A src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection
A src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection/SubcollectionIndexingFilter.java
A src/plugin/subcollection/README.txt
A src/plugin/subcollection/plugin.xml
A src/plugin/subcollection/build.xml
A src/plugin/index-more
A src/plugin/index-more/ivy.xml
A src/plugin/index-more/src
A src/plugin/index-more/src/test
A src/plugin/index-more/src/test/org
A src/plugin/index-more/src/test/org/apache
A src/plugin/index-more/src/test/org/apache/nutch
A src/plugin/index-more/src/test/org/apache/nutch/indexer
A src/plugin/index-more/src/test/org/apache/nutch/indexer/more
A src/plugin/index-more/src/test/org/apache/nutch/indexer/more/TestMoreIndexingFilter.java
A src/plugin/index-more/src/java
A src/plugin/index-more/src/java/org
A src/plugin/index-more/src/java/org/apache
A src/plugin/index-more/src/java/org/apache/nutch
A src/plugin/index-more/src/java/org/apache/nutch/indexer
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/package.html
A src/plugin/index-more/plugin.xml
A src/plugin/index-more/build.xml
AU src/plugin/plugin.dtd
A src/plugin/parse-ext
A src/plugin/parse-ext/ivy.xml
A src/plugin/parse-ext/src
A src/plugin/parse-ext/src/test
A src/plugin/parse-ext/src/test/org
A src/plugin/parse-ext/src/test/org/apache
A src/plugin/parse-ext/src/test/org/apache/nutch
A src/plugin/parse-ext/src/test/org/apache/nutch/parse
A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext
A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext/TestExtParser.java
A src/plugin/parse-ext/src/java
A src/plugin/parse-ext/src/java/org
A src/plugin/parse-ext/src/java/org/apache
A src/plugin/parse-ext/src/java/org/apache/nutch
A src/plugin/parse-ext/src/java/org/apache/nutch/parse
A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext
A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ExtParser.java
A src/plugin/parse-ext/plugin.xml
A src/plugin/parse-ext/build.xml
A src/plugin/parse-ext/command
A src/plugin/urlnormalizer-pass
A src/plugin/urlnormalizer-pass/ivy.xml
A src/plugin/urlnormalizer-pass/src
A src/plugin/urlnormalizer-pass/src/test
A src/plugin/urlnormalizer-pass/src/test/org
A src/plugin/urlnormalizer-pass/src/test/org/apache
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass
AU src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass/TestPassURLNormalizer.java
A src/plugin/urlnormalizer-pass/src/java
A src/plugin/urlnormalizer-pass/src/java/org
A src/plugin/urlnormalizer-pass/src/java/org/apache
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass
AU src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass/PassURLNormalizer.java
AU src/plugin/urlnormalizer-pass/plugin.xml
AU src/plugin/urlnormalizer-pass/build.xml
A src/plugin/parse-html
A src/plugin/parse-html/ivy.xml
A src/plugin/parse-html/lib
A src/plugin/parse-html/lib/tagsoup.LICENSE.txt
A src/plugin/parse-html/src
A src/plugin/parse-html/src/test
A src/plugin/parse-html/src/test/org
A src/plugin/parse-html/src/test/org/apache
A src/plugin/parse-html/src/test/org/apache/nutch
A src/plugin/parse-html/src/test/org/apache/nutch/parse
A