Java.io.IOException with multiple <copyField/> directives

2010-12-03 Thread Peter Litsegård
Hi!

I've run into a strange behaviour while using Nutch (solrindexer) together with
Solr 1.4.1. I'd like to copy the 'title' and 'content' fields to another field,
say, 'foo'. In my first attempt I added the <copyField/> directives to
schema.xml and got the Java exception, so I removed them from schema.xml. In my
second attempt I added the <copyField/> directives to the
'solrindex-mapping.xml' file and ran into the same exception again! Is this a
known issue or have I stumbled into unknown territory?

Any workarounds?

Many thanks!
/Peter

Re: Java.io.IOException with multiple <copyField/> directives

2010-12-03 Thread Andrzej Bialecki
On 2010-12-03 09:52, Peter Litsegård wrote:
> Hi!
>
> I've run into a strange behaviour while using Nutch (solrindexer) together
> with Solr 1.4.1. I'd like to copy the 'title' and 'content' fields to another
> field, say, 'foo'. In my first attempt I added the <copyField/> directives to
> schema.xml and got the Java exception, so I removed them from schema.xml. In
> my second attempt I added the <copyField/> directives to the
> 'solrindex-mapping.xml' file and ran into the same exception again! Is this a
> known issue or have I stumbled into unknown territory?
>
> Any workarounds?

I suspect that the field type declared in your schema.xml is not
multiValued. What was the exception?


-- 
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Java.io.IOException with multiple <copyField/> directives

2010-12-03 Thread Peter Litsegård
Hi Andrzej!

The exception was java.io.IOException. Of course, I forgot to make the
dest field multiValued. Embarrassing :-) I'll update the schema.xml file and
try again... Stay tuned!

Cheers,
/Peter 

-----Original Message-----
From: Andrzej Bialecki [mailto:a...@getopt.org]
Sent: 3 December 2010 10:42
To: dev@nutch.apache.org
Subject: Re: Java.io.IOException with multiple <copyField/> directives

> I suspect that the field type declared in your schema.xml is not multiValued.
> What was the exception?



Re: Java.io.IOException with multiple <copyField/> directives

2010-12-03 Thread Peter Litsegård
Hi Andrzej!

Of course, I'd forgotten to set multiValued="true"! Thanks for pointing this out!

Cheers,
/Peter 
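
For readers following this thread, the failure can be modelled in a few lines
of Java. This is only a toy sketch, not Solr's actual code: the class
CopyFieldSketch and its methods are hypothetical names invented for
illustration. It assumes that a schema rejects a second value arriving in a
field that is not declared multiValued, which is what happens when both
'title' and 'content' are copied into 'foo'.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a schema with copy rules (hypothetical; not Solr's implementation). */
class CopyFieldSketch {
    // field name -> whether the field may hold more than one value
    private final Map<String, Boolean> multiValued = new HashMap<>();
    // source field -> destination fields it is copied to
    private final Map<String, List<String>> copyRules = new HashMap<>();

    void declareField(String name, boolean isMultiValued) {
        multiValued.put(name, isMultiValued);
    }

    void copyField(String source, String dest) {
        copyRules.computeIfAbsent(source, k -> new ArrayList<>()).add(dest);
    }

    /** Applies the copy rules to one document; undeclared fields count as single-valued. */
    Map<String, List<String>> index(Map<String, String> doc) {
        Map<String, List<String>> out = new HashMap<>();
        for (Map.Entry<String, String> e : doc.entrySet()) {
            add(out, e.getKey(), e.getValue());
            List<String> dests = copyRules.getOrDefault(e.getKey(), Collections.emptyList());
            for (String dest : dests) {
                add(out, dest, e.getValue());
            }
        }
        return out;
    }

    private void add(Map<String, List<String>> out, String field, String value) {
        List<String> values = out.computeIfAbsent(field, k -> new ArrayList<>());
        // A second value is only acceptable when the field is declared multiValued.
        if (!values.isEmpty() && !multiValued.getOrDefault(field, false)) {
            throw new IllegalStateException(
                "multiple values for non-multiValued field " + field);
        }
        values.add(value);
    }
}
```

With 'foo' declared single-valued, indexing a document that has both a title
and a content value throws; declaring 'foo' as multiValued lets both copies
through.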




[jira] Created: (NUTCH-944) Increase the number of elements to look for URLs and add the ability to specify multiple attributes by elements

2010-12-03 Thread Jean-Francois Gingras (JIRA)
Increase the number of elements to look for URLs and add the ability to specify 
multiple attributes by elements
---

                 Key: NUTCH-944
                 URL: https://issues.apache.org/jira/browse/NUTCH-944
             Project: Nutch
          Issue Type: Improvement
          Components: parser
    Affects Versions: 1.3
         Environment: GNU/Linux Fedora 12
            Reporter: Jean-Francois Gingras
            Priority: Minor
             Fix For: 1.3


Here is a patch for DOMContentUtils.java that increases the number of elements
searched for URLs. It also adds the ability to specify multiple attributes per
element, for example:

linkParams.put("frame", new LinkParams("frame", "longdesc,src", 0));
linkParams.put("object", new LinkParams("object",
    "classid,codebase,data,usemap", 0));
linkParams.put("video", new LinkParams("video", "poster,src", 0)); // HTML 5

I have a patch for release-1.0 and branch-1.3.

I would love to hear your comments about this.
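
To make the idea concrete, here is a minimal, self-contained sketch of how a
comma-separated attribute list could be applied per element. It is not the
attached patch: the class MultiAttrLinkSketch, its LinkParams stand-in, and
the extractLinks method are assumptions made for illustration, and the JDK's
XML parser is used on a well-formed fragment in place of Nutch's HTML parsing.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class MultiAttrLinkSketch {
    /** Hypothetical stand-in for LinkParams: one element, several URL-bearing attributes. */
    static final class LinkParams {
        final String elName;
        final String[] attrNames;
        final int childLen;

        LinkParams(String elName, String attrNames, int childLen) {
            this.elName = elName;
            this.attrNames = attrNames.split(","); // "poster,src" -> {"poster", "src"}
            this.childLen = childLen;
        }
    }

    static final Map<String, LinkParams> LINK_PARAMS = new HashMap<>();
    static {
        LINK_PARAMS.put("frame", new LinkParams("frame", "longdesc,src", 0));
        LINK_PARAMS.put("video", new LinkParams("video", "poster,src", 0));
    }

    /** Collects the values of every configured attribute, in document order. */
    static List<String> extractLinks(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        List<String> links = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            LinkParams params = LINK_PARAMS.get(el.getTagName().toLowerCase(Locale.ROOT));
            if (params == null) {
                continue; // element is not configured as a link carrier
            }
            for (String attr : params.attrNames) {
                if (el.hasAttribute(attr)) {
                    links.add(el.getAttribute(attr));
                }
            }
        }
        return links;
    }
}
```

In this sketch an element that is missing from the map (an img, say) is simply
skipped, so widening coverage is a matter of adding map entries.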

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (NUTCH-944) Increase the number of elements to look for URLs and add the ability to specify multiple attributes by elements

2010-12-03 Thread Jean-Francois Gingras (JIRA)

 [ https://issues.apache.org/jira/browse/NUTCH-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Francois Gingras updated NUTCH-944:


    Attachment: DOMContentUtils.java.path-1.3
                DOMContentUtils.java.path-1.0

I uploaded the patch for 1.0 as well because we currently use that release.





[jira] Created: (NUTCH-945) Indexing to multiple SOLR Servers

2010-12-03 Thread Charan Malemarpuram (JIRA)
Indexing to multiple SOLR Servers
-

                 Key: NUTCH-945
                 URL: https://issues.apache.org/jira/browse/NUTCH-945
             Project: Nutch
          Issue Type: Improvement
          Components: indexer
    Affects Versions: 1.2
            Reporter: Charan Malemarpuram


It would be nice to have a default indexer in Nutch that can submit documents
to multiple SOLR servers.

Partitioning is always the question when writing to multiple SOLR servers.
The default partitioning could be a simple hash-code-based distribution, with
additional hooks for customization.
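
One way the proposal could look, as a rough sketch only: the names
SolrPartitionSketch, Partitioner, HASH, and serverFor are hypothetical and do
not exist in Nutch. It assumes each document has a string id (for example its
URL) and that the server list is fixed for the duration of an indexing run.

```java
import java.util.List;

class SolrPartitionSketch {
    /** Customization hook: maps a document id to an index in [0, numServers). */
    interface Partitioner {
        int partition(String docId, int numServers);
    }

    /** Default strategy: hash code of the id; floorMod keeps negative hashes in range. */
    static final Partitioner HASH =
            (docId, numServers) -> Math.floorMod(docId.hashCode(), numServers);

    /** Picks the server that should receive the document with the given id. */
    static String serverFor(String docId, List<String> serverUrls, Partitioner partitioner) {
        return serverUrls.get(partitioner.partition(docId, serverUrls.size()));
    }
}
```

Because the mapping depends on the number of servers, adding or removing a
server reassigns most documents under this scheme; a custom Partitioner is the
place to plug in something more stable if that matters.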

 




Build failed in Hudson: Nutch-trunk #1326

2010-12-03 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Nutch-trunk/1326/changes

Changes:

[ab] Fix breakage due to the changed Gora API.

--
[...truncated 1006 lines...]
A src/plugin/subcollection/src/java/org/apache/nutch
A src/plugin/subcollection/src/java/org/apache/nutch/collection
A src/plugin/subcollection/src/java/org/apache/nutch/collection/Subcollection.java
A src/plugin/subcollection/src/java/org/apache/nutch/collection/CollectionManager.java
A src/plugin/subcollection/src/java/org/apache/nutch/collection/package.html
A src/plugin/subcollection/src/java/org/apache/nutch/indexer
A src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection
A src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection/SubcollectionIndexingFilter.java
A src/plugin/subcollection/README.txt
A src/plugin/subcollection/plugin.xml
A src/plugin/subcollection/build.xml
A src/plugin/index-more
A src/plugin/index-more/ivy.xml
A src/plugin/index-more/src
A src/plugin/index-more/src/test
A src/plugin/index-more/src/test/org
A src/plugin/index-more/src/test/org/apache
A src/plugin/index-more/src/test/org/apache/nutch
A src/plugin/index-more/src/test/org/apache/nutch/indexer
A src/plugin/index-more/src/test/org/apache/nutch/indexer/more
A src/plugin/index-more/src/test/org/apache/nutch/indexer/more/TestMoreIndexingFilter.java
A src/plugin/index-more/src/java
A src/plugin/index-more/src/java/org
A src/plugin/index-more/src/java/org/apache
A src/plugin/index-more/src/java/org/apache/nutch
A src/plugin/index-more/src/java/org/apache/nutch/indexer
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/package.html
A src/plugin/index-more/plugin.xml
A src/plugin/index-more/build.xml
AU src/plugin/plugin.dtd
A src/plugin/parse-ext
A src/plugin/parse-ext/ivy.xml
A src/plugin/parse-ext/src
A src/plugin/parse-ext/src/test
A src/plugin/parse-ext/src/test/org
A src/plugin/parse-ext/src/test/org/apache
A src/plugin/parse-ext/src/test/org/apache/nutch
A src/plugin/parse-ext/src/test/org/apache/nutch/parse
A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext
A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext/TestExtParser.java
A src/plugin/parse-ext/src/java
A src/plugin/parse-ext/src/java/org
A src/plugin/parse-ext/src/java/org/apache
A src/plugin/parse-ext/src/java/org/apache/nutch
A src/plugin/parse-ext/src/java/org/apache/nutch/parse
A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext
A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ExtParser.java
A src/plugin/parse-ext/plugin.xml
A src/plugin/parse-ext/build.xml
A src/plugin/parse-ext/command
A src/plugin/urlnormalizer-pass
A src/plugin/urlnormalizer-pass/ivy.xml
A src/plugin/urlnormalizer-pass/src
A src/plugin/urlnormalizer-pass/src/test
A src/plugin/urlnormalizer-pass/src/test/org
A src/plugin/urlnormalizer-pass/src/test/org/apache
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass
AU src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass/TestPassURLNormalizer.java
A src/plugin/urlnormalizer-pass/src/java
A src/plugin/urlnormalizer-pass/src/java/org
A src/plugin/urlnormalizer-pass/src/java/org/apache
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass
AU src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass/PassURLNormalizer.java
AU src/plugin/urlnormalizer-pass/plugin.xml
AU src/plugin/urlnormalizer-pass/build.xml
A src/plugin/parse-html
A src/plugin/parse-html/ivy.xml
A src/plugin/parse-html/lib
A src/plugin/parse-html/lib/tagsoup.LICENSE.txt
A src/plugin/parse-html/src
A src/plugin/parse-html/src/test
A src/plugin/parse-html/src/test/org
A src/plugin/parse-html/src/test/org/apache
A