[jira] [Commented] (NUTCH-1129) Any23 Nutch plugin

2012-02-15 Thread Markus Jelsma (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208309#comment-13208309
 ] 

Markus Jelsma commented on NUTCH-1129:
--

This is a parser plugin, right? How will this work if we, for example, would like 
to parse microdata with Any23 and use Tika's BoilerpipeContentHandler for 
extraction? In the current BP patch we use multiple content handlers to parse 
all in one go, so I wonder if this could be implemented as such.

Please correct me if I'm wrong :)
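
As a sketch of what that multi-handler approach could look like with Tika (not 
the actual BP patch; the file name and handler choices are illustrative), 
TeeContentHandler fans the same SAX events out to several handlers in a single 
parse:

{code}
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.BoilerpipeContentHandler;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.sax.TeeContentHandler;
import org.xml.sax.ContentHandler;

public class MultiHandlerParse {
  public static void main(String[] args) throws Exception {
    BodyContentHandler fullText = new BodyContentHandler();
    BodyContentHandler mainText = new BodyContentHandler();
    // Boilerpipe wraps a handler and forwards only the main-content blocks.
    ContentHandler boilerpipe = new BoilerpipeContentHandler(mainText);
    // Tee duplicates the SAX events to both handlers in one pass.
    ContentHandler tee = new TeeContentHandler(fullText, boilerpipe);
    try (InputStream in = Files.newInputStream(Paths.get("page.html"))) {
      new AutoDetectParser().parse(in, tee, new Metadata(), new ParseContext());
    }
    System.out.println(mainText.toString()); // boilerplate-stripped text
  }
}
{code}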

 Any23 Nutch plugin
 --

 Key: NUTCH-1129
 URL: https://issues.apache.org/jira/browse/NUTCH-1129
 Project: Nutch
  Issue Type: New Feature
  Components: parser
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Priority: Minor
 Fix For: 1.5

 Attachments: NUTCH-1129.patch


 This plugin should build on the Any23 library to provide us with a plugin 
 which extracts RDF data from HTTP and file resources. Although as of writing 
 Any23 is not part of the ASF, the project is working towards integration into 
 the Apache Incubator. Once the project proves its value, this would be an 
 excellent addition to the Nutch 1.X codebase. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (NUTCH-1279) Check if limit has been reached in GeneraterReducer must be the first check performance-wise.

2012-02-15 Thread Ferdy Galema (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdy Galema updated NUTCH-1279:


Attachment: NUTCH-1279.txt

Attached patch and committed.

 Check if limit has been reached in GeneraterReducer must be the first check 
 performance-wise.
 -

 Key: NUTCH-1279
 URL: https://issues.apache.org/jira/browse/NUTCH-1279
 Project: Nutch
  Issue Type: Improvement
  Components: generator
Reporter: Ferdy Galema
Priority: Minor
 Fix For: nutchgora

 Attachments: NUTCH-1279.txt


 The (count >= limit) check should be put up front in the reduce method of the 
 generator, because that way when the limit is reached the reduce method will 
 return faster.
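
 Schematically, the proposed ordering looks like this (a sketch with the Hadoop 
 plumbing omitted; field and method names are illustrative, not copied from the 
 patch):
{code}
// Early-exit pattern: test the limit before doing any other work, so that
// once it is reached every remaining reduce() call returns immediately.
public class LimitFirstReducer {
  private long limit = 1000;
  private long count = 0;

  public void reduce(String key, Iterable<String> urls) {
    if (count >= limit) {
      return; // cheapest possible path once the limit is hit
    }
    for (String url : urls) {
      if (count >= limit) {
        return;
      }
      emit(url);
      count++;
    }
  }

  private void emit(String url) { /* collect output */ }
}
{code}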

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (NUTCH-1279) Check if limit has been reached in GeneraterReducer must be the first check performance-wise.

2012-02-15 Thread Lewis John McGibbney (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208341#comment-13208341
 ] 

Lewis John McGibbney commented on NUTCH-1279:
-

Hi Ferdy, have you checked whether this is the case in trunk as well? I know 
the fetcher architecture is slightly different.  

 Check if limit has been reached in GeneraterReducer must be the first check 
 performance-wise.
 -

 Key: NUTCH-1279
 URL: https://issues.apache.org/jira/browse/NUTCH-1279
 Project: Nutch
  Issue Type: Improvement
  Components: generator
Reporter: Ferdy Galema
Priority: Minor
 Fix For: nutchgora

 Attachments: NUTCH-1279.txt


 The (count >= limit) check should be put up front in the reduce method of the 
 generator, because that way when the limit is reached the reduce method will 
 return faster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Detecting Encoding with plugins

2012-02-15 Thread Lewis John Mcgibbney
Hi Markus,

I've been vaguely keeping up with yourself and Julien's work on this.

I would really like to get a test case for this though! I'll try working
towards this as a sub-target of another issue. For reference, there is a
Tika mimeType test case here [1] and a Tika document encoding test here [2],
which we may or may not be interested in porting over to o.a.n?

wdyt?

Thanks

Lewis

[1]
https://svn.apache.org/viewvc/incubator/any23/trunk/core/src/test/java/org/apache/any23/mime/TikaMIMETypeDetectorTest.java?view=markup
[2]
https://svn.apache.org/viewvc/incubator/any23/trunk/core/src/test/java/org/apache/any23/encoding/TikaEncodingDetectorTest.java?view=markup

On Tue, Feb 14, 2012 at 11:51 PM, Markus Jelsma mar...@apache.org wrote:

 Hi,

 This was indeed an issue until today. The detected type is in the crawl
 datum
 metadata.

 https://issues.apache.org/jira/browse/NUTCH-1259

  Hi,
 
  I can't see anywhere within our parser plugins where we detect encoding
 of
  documents. I've also begun looking through the o.a.n.p package but again
 I
  can't see anything.
 
  Can anyone provide some detail on this please?
 
  Thank you
 
  Lewis




-- 
*Lewis*


[jira] [Commented] (NUTCH-1279) Check if limit has been reached in GeneraterReducer must be the first check performance-wise.

2012-02-15 Thread Ferdy Galema (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208354#comment-13208354
 ] 

Ferdy Galema commented on NUTCH-1279:
-

Hi Lewis, I can confirm that trunk already uses the same optimization.

 Check if limit has been reached in GeneraterReducer must be the first check 
 performance-wise.
 -

 Key: NUTCH-1279
 URL: https://issues.apache.org/jira/browse/NUTCH-1279
 Project: Nutch
  Issue Type: Improvement
  Components: generator
Reporter: Ferdy Galema
Priority: Minor
 Fix For: nutchgora

 Attachments: NUTCH-1279.txt


 The (count >= limit) check should be put up front in the reduce method of the 
 generator, because that way when the limit is reached the reduce method will 
 return faster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Detecting Encoding with plugins

2012-02-15 Thread Julien Nioche
The mimetype is not the same thing as the encoding. As Ken pointed out, this
is done at the individual parser level.
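
To make the distinction concrete, here is a minimal sketch (the file name is
illustrative): Tika's detector answers "what kind of document is this?", while
the CharsetDetector shipped in tika-parsers answers "how are its bytes encoded?":

{code}
import java.io.File;
import java.nio.file.Files;

import org.apache.tika.Tika;
import org.apache.tika.parser.txt.CharsetDetector;
import org.apache.tika.parser.txt.CharsetMatch;

public class DetectMimeAndCharset {
  public static void main(String[] args) throws Exception {
    File file = new File("page.html"); // any local test document
    // MIME type detection (magic bytes, file name, etc.)
    String mimeType = new Tika().detect(file);
    // Charset detection is a separate question, answered per byte stream.
    CharsetDetector detector = new CharsetDetector();
    detector.setText(Files.readAllBytes(file.toPath()));
    CharsetMatch match = detector.detect();
    System.out.println(mimeType + " / " + match.getName());
  }
}
{code}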

On 14 February 2012 23:51, Markus Jelsma mar...@apache.org wrote:

 Hi,

 This was indeed an issue until today. The detected type is in the crawl
 datum
 metadata.

 https://issues.apache.org/jira/browse/NUTCH-1259

  Hi,
 
  I can't see anywhere within our parser plugins where we detect encoding
 of
  documents. I've also begun looking through the o.a.n.p package but again
 I
  can't see anything.
 
  Can anyone provide some detail on this please?
 
  Thank you
 
  Lewis




-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble


Re: Detecting Encoding with plugins

2012-02-15 Thread Lewis John Mcgibbney
Yes this is correct, but we still don't test for either of the two.

On Wed, Feb 15, 2012 at 10:59 AM, Julien Nioche 
lists.digitalpeb...@gmail.com wrote:

 The mimetype is not the same thing as the encoding. As Ken pointed out
 this is done at the individual parser level


 On 14 February 2012 23:51, Markus Jelsma mar...@apache.org wrote:

 Hi,

 This was indeed an issue until today. The detected type is in the crawl
 datum
 metadata.

 https://issues.apache.org/jira/browse/NUTCH-1259

  Hi,
 
  I can't see anywhere within our parser plugins where we detect encoding
 of
  documents. I've also begun looking through the o.a.n.p package but
 again I
  can't see anything.
 
  Can anyone provide some detail on this please?
 
  Thank you
 
  Lewis




 --
 Open Source Solutions for Text Engineering

 http://digitalpebble.blogspot.com/
 http://www.digitalpebble.com
 http://twitter.com/digitalpebble




-- 
*Lewis*


[jira] [Commented] (NUTCH-1278) Fetch Improvement in threads per host

2012-02-15 Thread Lewis John McGibbney (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208413#comment-13208413
 ] 

Lewis John McGibbney commented on NUTCH-1278:
-

Hi Behnam. Do you have a patch for trunk? Thank you

 Fetch Improvement in threads per host
 -

 Key: NUTCH-1278
 URL: https://issues.apache.org/jira/browse/NUTCH-1278
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Affects Versions: 1.4
Reporter: behnam nikbakht

 the value of maxThreads is equal to fetcher.threads.per.host and is constant 
 for every host. There is a possibility of using dynamic values for every host, 
 influenced by the number of blocked requests.
 This means that if the number of blocked requests for one host increases, then 
 we must decrease this value and increase http.timeout.
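
 A rough sketch of the kind of per-host adjustment being proposed (class, field 
 and threshold values here are hypothetical, not taken from any patch):
{code}
// Hypothetical throttle: back off a host that keeps blocking requests by
// lowering its thread cap and lengthening the timeout.
public class HostThrottle {
  private int maxThreads;   // starts at fetcher.threads.per.host
  private long httpTimeout; // starts at http.timeout (milliseconds)
  private int blocked;

  public HostThrottle(int maxThreads, long httpTimeout) {
    this.maxThreads = maxThreads;
    this.httpTimeout = httpTimeout;
  }

  public synchronized void onBlockedRequest() {
    blocked++;
    // Every 10 blocked requests: one fewer parallel thread, 5s more timeout.
    if (blocked % 10 == 0 && maxThreads > 1) {
      maxThreads--;
      httpTimeout += 5000;
    }
  }

  public synchronized int getMaxThreads() { return maxThreads; }
  public synchronized long getHttpTimeout() { return httpTimeout; }
}
{code}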

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Detecting Encoding with plugins

2012-02-15 Thread Julien Nioche
I assume Tika does already - why should we duplicate the tests in Nutch? We
delegate the functionality to Tika; IMHO this means delegating the testing
as well. What we could do is contribute tests to Tika instead if it does
not have any.

Re Any23: why not handle it as a Tika parser instead of a Nutch one?
This could be useful to other Tika users who do not necessarily use Nutch.

Julien

On 15 February 2012 12:17, Lewis John Mcgibbney
lewis.mcgibb...@gmail.comwrote:

 Yes this is correct, but we still don't test for either of the two.


 On Wed, Feb 15, 2012 at 10:59 AM, Julien Nioche 
 lists.digitalpeb...@gmail.com wrote:

 The mimetype is not the same thing as the encoding. As Ken pointed out
 this is done at the individual parser level


 On 14 February 2012 23:51, Markus Jelsma mar...@apache.org wrote:

 Hi,

 This was indeed an issue until today. The detected type is in the crawl
 datum
 metadata.

 https://issues.apache.org/jira/browse/NUTCH-1259

  Hi,
 
  I can't see anywhere within our parser plugins where we detect
 encoding of
  documents. I've also begun looking through the o.a.n.p package but
 again I
  can't see anything.
 
  Can anyone provide some detail on this please?
 
  Thank you
 
  Lewis




 --
 Open Source Solutions for Text Engineering

 http://digitalpebble.blogspot.com/
 http://www.digitalpebble.com
 http://twitter.com/digitalpebble




 --
 *Lewis*




-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble


Re: Detecting Encoding with plugins

2012-02-15 Thread Lewis John Mcgibbney
Hi Julien,

On Wed, Feb 15, 2012 at 12:27 PM, Julien Nioche 
lists.digitalpeb...@gmail.com wrote:

 I assume Tika does already - why should we duplicate the tests in Nutch?

We don't want to, I suppose. However, the point I was trying to make was that
NUTCH-1259 detects the encoding type, yet we don't have an automated test to
cover this. I assume the case is somewhat important or else the ticket for
NUTCH-1259 wouldn't have been opened originally? I agree with you that general
cases should be dealt with further upstream within Tika development itself;
however, as the encoding detection is done in Nutch within the cd metadata, we
may wish to get some test case to check... it's not a huge thing I suppose.


 we delegate the functionality to Tika, IMHO this means delegating the
 testing as well. What we could do to contribute tests to Tika instead if it
 does not have any.

Yeah this is correct. I'm expecting you guys will know better than me, but
I would assume that Tika is mimetype and encoding detection compliant ;0)


 Re-any23 : why not handling it as a Tika parser instead of a Nutch one?
 This could be useful to other Tika users who do not necessarily use Nutch

OK so I suppose this is completely open for discussion and I really welcome
it as well. On one hand I see working with Any23 as a parse-any23 plugin
within Nutch as the first step on the road to answering this question.
Regardless of whether Any23 graduates and is integrated into Tika itself or
as a TLP, you are completely right that it should be made as openly
available to as many people as possible. Personally I agree with you Julien.

One last thing, I know this is off topic... but with regards to our
microformats-reltag plugin... I think the RelTagParser could and should be
moved over to Any23. Any23 already supports extraction of a number of
microformats. wdyt?

Thanks


Re: Detecting Encoding with plugins

2012-02-15 Thread Julien Nioche
Hi Lewis

 I assume Tika does already - why should we duplicate the tests in Nutch?

  We don't want to, I suppose. However, the point I was trying to make was
  that NUTCH-1259 detects the encoding type, yet we don't have an automated
  test to cover this. I assume the case is somewhat important or else the
  ticket for NUTCH-1259 wouldn't have been opened originally?


Nope. NUTCH-1259 is about storing the mime-type value detected by Tika. It
is not the same as the encoding. That specific JIRA is not about whether or
not we get the correct value but is a purely functional one about where we
store it. There is not much to test wrt it.



 I agree with you that general cases should be dealt with further upstream
 within Tika development itself, however as the encoding detection is done
 in Nutch within the cd metadata we may wish to get some test case to
 check... it's not a huge thing I suppose.


we do have tests for the EncodingDetector (TestEncodingDetector), which is
used by parse-html already. It is OK to have that as it is our own parser.
As explained earlier, for the Tika parser the detection is delegated to the
Tika parser implementations and as such should be tested there.
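
For reference, the shape of such a check with Nutch's EncodingDetector (a
sketch, not the actual TestEncodingDetector; the URL and bytes are made up):

{code}
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.metadata.Metadata;
import org.apache.nutch.protocol.Content;
import org.apache.nutch.util.EncodingDetector;
import org.apache.nutch.util.NutchConfiguration;

public class EncodingDetectorSketch {
  public static void main(String[] args) {
    Configuration conf = NutchConfiguration.create();
    byte[] bytes = "<html><body>h\u00e9llo</body></html>"
        .getBytes(StandardCharsets.ISO_8859_1);
    Content content = new Content("http://example.com/", "http://example.com/",
        bytes, "text/html", new Metadata(), conf);
    EncodingDetector detector = new EncodingDetector(conf);
    // Gather clues (byte-level detection, HTTP headers, meta tags)...
    detector.autoDetectClues(content, true);
    // ...then pick the best guess, falling back to a default.
    String encoding = detector.guessEncoding(content, "utf-8");
    System.out.println("detected encoding: " + encoding);
  }
}
{code}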



 we delegate the functionality to Tika, IMHO this means delegating the
 testing as well. What we could do to contribute tests to Tika instead if it
 does not have any.

 Yeah this is correct. I'm expecting you guys will know better than me but
 I would assume that Tika is mimetype and encoding detection compliant ;0)


I definitely do not pretend to know more than anyone else BTW :-) I don't
understand what you mean by 'compliant'. Perfect? Probably not. There was an
interesting experiment by Ken on measuring the accuracy of the charset
detection in the Tika book - which anyone remotely interested in Nutch should
get BTW. There has also been an interesting blog entry recently comparing the
language detection in Tika and other libraries (can't find the ref and am in a
hurry - sorry).




 Re-any23 : why not handling it as a Tika parser instead of a Nutch one?
 This could be useful to other Tika users who do not necessarily use Nutch

 OK so I suppose this is completely open for discussion and I really
 welcome it as well. On one hand I see working with Any23 as a parse-any23
 plugin within Nutch as the first step on the road to answering this
 question. Regardless of whether Any23 graduates and is integrated into Tika
 itself or as a TLP, you are completely right that it should be made as
 openly available to as many people as possible. Personally I agree with
 you Julien.

 One last thing, I know this is off topic... but with regards to our
 microformats-reltag plugin... I think the RelTagParser could and should be
 moved over to Any23. Any23 already supports extraction of a number of
 microformats. wdyt?


it would probably make sense as an initial step if you don't want to
venture in trying to wrap it as a Tika parser :-)

Julien



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble


[jira] [Updated] (NUTCH-1215) UpdateDB should not require segment as input

2012-02-15 Thread Markus Jelsma (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-1215:
-

Attachment: NUTCH-1215-1.5-1.patch

Patch for 1.5. Couldn't be simpler.

 UpdateDB should not require segment as input
 

 Key: NUTCH-1215
 URL: https://issues.apache.org/jira/browse/NUTCH-1215
 Project: Nutch
  Issue Type: Bug
  Components: linkdb
Affects Versions: 1.4
Reporter: Markus Jelsma
 Fix For: 1.5

 Attachments: NUTCH-1215-1.5-1.patch


 UpdateDB requires an input segment. This causes the metrics for the records 
 of the segment to change, e.g. from fetched to not_modified, and changes an 
 adaptive fetch schedule accordingly. This should not happen when one needs to 
 update for filtering or normalizing or other maintenance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (NUTCH-1210) DomainBlacklistFilter

2012-02-15 Thread Lewis John McGibbney (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208459#comment-13208459
 ] 

Lewis John McGibbney commented on NUTCH-1210:
-

This looks really nice Markus. I like the documentation and test as well. I 
would like to try it out with another couple of test scenarios before passing 
my full opinion, which I will be able to do this afternoon. 

 DomainBlacklistFilter
 -

 Key: NUTCH-1210
 URL: https://issues.apache.org/jira/browse/NUTCH-1210
 Project: Nutch
  Issue Type: New Feature
Reporter: Markus Jelsma
Assignee: Markus Jelsma
 Fix For: 1.5

 Attachments: NUTCH-1210-1.5-1.patch


 The current DomainFilter acts as a white list. We also need a filter that 
 acts as a black list so we can allow TLDs and/or domains with DomainFilter 
 but blacklist specific subdomains. If we were to patch the current 
 DomainFilter for this behaviour it would break current semantics such as its 
 precedence. Therefore I would propose a new filter instead.
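
 For illustration, a hypothetical sketch of how such a blacklist filter could 
 work (names and logic are illustrative, not the attached patch; Nutch's 
 URLFilter contract is: return the url to accept it, null to reject it):
{code}
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashSet;
import java.util.Set;

public class DomainBlacklistSketch {
  private final Set<String> blacklist = new HashSet<String>();

  public DomainBlacklistSketch(Set<String> domains) {
    blacklist.addAll(domains);
  }

  public String filter(String url) throws MalformedURLException {
    String host = new URL(url).getHost().toLowerCase();
    // Check the host and every parent domain against the blacklist, e.g.
    // bad.sub.example.com -> sub.example.com -> example.com -> com.
    String suffix = host;
    while (suffix != null) {
      if (blacklist.contains(suffix)) {
        return null; // reject
      }
      int dot = suffix.indexOf('.');
      suffix = (dot < 0) ? null : suffix.substring(dot + 1);
    }
    return url; // accept
  }
}
{code}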

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (NUTCH-1262) Map `duplicating` content-types to a single type

2012-02-15 Thread Markus Jelsma (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208461#comment-13208461
 ] 

Markus Jelsma commented on NUTCH-1262:
--

Is this issue still subject to debate? Opinions?

 Map `duplicating` content-types to a single type
 

 Key: NUTCH-1262
 URL: https://issues.apache.org/jira/browse/NUTCH-1262
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 1.5

 Attachments: NUTCH-1262-1.5-1.patch


 Similar or duplicating content-types can end up differently in an index. 
 With, for example, both application/xhtml+xml and text/html it is impossible 
 to use a single filter to select `web pages`.
 See also: 
 http://lucene.472066.n3.nabble.com/application-xhtml-xml-gt-text-html-td3699942.html
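
 A minimal illustration of the idea (the alias table below is hypothetical, 
 not the mapping shipped in the attached patch): duplicating types are 
 normalized to one canonical type before indexing, so a single filter on 
 text/html matches all web pages.
{code}
import java.util.HashMap;
import java.util.Map;

public class ContentTypeMapper {
  private static final Map<String, String> ALIASES =
      new HashMap<String, String>();
  static {
    // Collapse duplicating content-types onto a single canonical type.
    ALIASES.put("application/xhtml+xml", "text/html");
  }

  public static String map(String contentType) {
    String canonical = ALIASES.get(contentType);
    return canonical != null ? canonical : contentType;
  }
}
{code}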

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: XSD for Solr Schema

2012-02-15 Thread Markus Jelsma


On Tuesday 14 February 2012 19:23:25 Lewis John Mcgibbney wrote:
 Hi,
 
 Whilst we were chatting about XSD, XSLT etc. the other night, I started
 thinking about using xalan or saxon to validate Solr schemas against some
 XSD. This would mean that a specific error is thrown which highlights where
 in the schema the problem(s) lie. I think this would provide a better level
 of user functionality for situations where you wish to have custom schema
 implementations (as many of us do).
 
 Do we currently have measures in place to catch an invalid schema case?
 

Actually, yes. Solr usually throws a useful exception when something goes 
wrong.
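
For reference, a minimal sketch of the validation Lewis describes, using the
standard javax.xml.validation API rather than xalan/saxon directly (both file
names below are hypothetical):

{code}
import java.io.File;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public class SolrSchemaValidator {
  public static void main(String[] args) throws Exception {
    SchemaFactory factory =
        SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    Schema xsd = factory.newSchema(new File("solr-schema.xsd"));
    Validator validator = xsd.newValidator();
    // Throws SAXParseException with line/column info on the first violation.
    validator.validate(new StreamSource(new File("conf/schema.xml")));
    System.out.println("schema.xml is valid");
  }
}
{code}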

 Thanks

-- 
Markus Jelsma - CTO - Openindex


[jira] [Commented] (NUTCH-1210) DomainBlacklistFilter

2012-02-15 Thread Lewis John McGibbney (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208881#comment-13208881
 ] 

Lewis John McGibbney commented on NUTCH-1210:
-

Hi Markus. 
1) I would ask one tiny change in ivy.xml
from
{code}
  <configurations>
    <include file="${nutch.root}/ivy/ivy-configurations.xml"/>
  </configurations>
{code}
to
{code}
  <configurations>
    <include file="../../..//ivy/ivy-configurations.xml"/>
  </configurations>
{code}
this is purely for consistency, as I think it's easier to configure in Eclipse 
since the ${nutch.root} variable hasn't been specified.

2) Also, domainblacklist-urlfilter.txt is not included in the patch under /conf. 
Would it be possible to have a file there with some commented-out documentation 
so users at least have something to go on?

3) Your documentation in the main class also mentions that the property can be 
overridden in nutch-*.xml; however, no property exists in nutch-default.xml for 
people to go on, meaning that it is likely people will become confused when 
trying to set the property from nutch-site.xml.

My tests seem to be failing with trunk, therefore there is something up with my 
trunk checkout, so I'll go get that sorted and then test a bit more. Thanks



 DomainBlacklistFilter
 -

 Key: NUTCH-1210
 URL: https://issues.apache.org/jira/browse/NUTCH-1210
 Project: Nutch
  Issue Type: New Feature
Reporter: Markus Jelsma
Assignee: Markus Jelsma
 Fix For: 1.5

 Attachments: NUTCH-1210-1.5-1.patch


 The current DomainFilter acts as a white list. We also need a filter that 
 acts as a black list so we can allow TLDs and/or domains with DomainFilter 
 but blacklist specific subdomains. If we were to patch the current 
 DomainFilter for this behaviour it would break current semantics such as its 
 precedence. Therefore I would propose a new filter instead.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: how linkdb impact on scores

2012-02-15 Thread Markus Jelsma
hi

Please submit your question to the user mailing list. There's a larger audience 
there and this list is for Nutch development only.

thanks

 There is a question about the impact of linkdb on solrindex.
 Testing solrindex, I found that boost is calculated from indexerScore, and
 inlinks are passed to the scoring plugins, but these plugins don't make use
 of inlinks. So the results of solrindex with linkdb are equal to the results
 without it, and with a large linkdb this causes solrindex to be slow.
 My question is: where can we influence scoring with linkdb?
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/how-linkdb-impact-on-scores-tp3746051p3746051.html
 Sent from the Nutch - Dev mailing list archive at Nabble.com.


Build failed in Jenkins: Nutch-nutchgora #162

2012-02-15 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-nutchgora/162/

--
Started by timer
Building remotely on solaris1 in workspace 
https://builds.apache.org/job/Nutch-nutchgora/ws/
hudson.util.IOException2: remote file operation failed: 
https://builds.apache.org/job/Nutch-nutchgora/ws/ at 
hudson.remoting.Channel@50e71521:solaris1
at hudson.FilePath.act(FilePath.java:784)
at hudson.FilePath.act(FilePath.java:770)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:742)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:684)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1195)
at 
hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:576)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:465)
at hudson.model.Run.run(Run.java:1409)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:238)
Caused by: java.io.IOException: Remote call on solaris1 failed
at hudson.remoting.Channel.call(Channel.java:690)
at hudson.FilePath.act(FilePath.java:777)
... 10 more
Caused by: java.lang.ClassFormatError: Failed to load 
javax.servlet.ServletException
at 
hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:154)
at 
hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:131)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
at 
hudson.scm.SubversionWorkspaceSelector.syncWorkspaceFormatFromMaster(SubversionWorkspaceSelector.java:85)
at 
hudson.scm.SubversionSCM.createSvnClientManager(SubversionSCM.java:822)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:765)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:752)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2099)
at hudson.remoting.UserRequest.perform(UserRequest.java:118)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:287)
at 
hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
at java.util.concurrent.FutureTask.run(FutureTask.java:123)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
at java.lang.Thread.run(Thread.java:595)
Caused by: java.lang.UnsupportedClassVersionError: Bad version number in .class 
file
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
at java.lang.ClassLoader.defineClass(ClassLoader.java:466)
at 
hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:152)
... 18 more
Publishing Javadoc
FATAL: Unable to copy Javadoc from 
https://builds.apache.org/job/Nutch-nutchgora/ws/nutchgora/build/docs/api to 
/home/hudson/hudson/jobs/Nutch-nutchgora/builds/2012-02-16_04-01-08/javadoc
hudson.util.IOException2: java.lang.ClassFormatError: Failed to load 
javax.servlet.ServletRequest
at hudson.FilePath.copyRecursiveTo(FilePath.java:1701)
at hudson.FilePath.copyRecursiveTo(FilePath.java:1593)
at hudson.tasks.JavadocArchiver.perform(JavadocArchiver.java:101)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at 
hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:700)
at 
hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:675)
at 
hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:653)
at hudson.model.Build$RunnerImpl.post2(Build.java:162)
at 
hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:622)
at hudson.model.Run.run(Run.java:1434)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:238)
Caused by: java.util.concurrent.ExecutionException: java.lang.ClassFormatError: 
Failed to load javax.servlet.ServletRequest
at hudson.remoting.Channel$2.adapt(Channel.java:714)
at hudson.remoting.Channel$2.adapt(Channel.java:709)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
at hudson.FilePath.copyRecursiveTo(FilePath.java:1699)
... 12 more
Caused by: java.lang.ClassFormatError: Failed to load 
javax.servlet.ServletRequest
 

Build failed in Jenkins: Nutch-trunk #1758

2012-02-15 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/1758/

--
Started by timer
Building remotely on solaris1 in workspace 
https://builds.apache.org/job/Nutch-trunk/ws/
hudson.util.IOException2: remote file operation failed: 
https://builds.apache.org/job/Nutch-trunk/ws/ at 
hudson.remoting.Channel@50e71521:solaris1
at hudson.FilePath.act(FilePath.java:784)
at hudson.FilePath.act(FilePath.java:770)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:742)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:684)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1195)
at 
hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:576)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:465)
at hudson.model.Run.run(Run.java:1409)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:238)
Caused by: java.io.IOException: Remote call on solaris1 failed
at hudson.remoting.Channel.call(Channel.java:690)
at hudson.FilePath.act(FilePath.java:777)
... 10 more
Caused by: java.lang.ClassFormatError: Failed to load 
javax.servlet.ServletException
at 
hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:154)
at 
hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:131)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
at 
hudson.scm.SubversionWorkspaceSelector.syncWorkspaceFormatFromMaster(SubversionWorkspaceSelector.java:85)
at 
hudson.scm.SubversionSCM.createSvnClientManager(SubversionSCM.java:822)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:765)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:752)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2099)
at hudson.remoting.UserRequest.perform(UserRequest.java:118)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:287)
at 
hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
at java.util.concurrent.FutureTask.run(FutureTask.java:123)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
at java.lang.Thread.run(Thread.java:595)
Caused by: java.lang.UnsupportedClassVersionError: Bad version number in .class 
file
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
at java.lang.ClassLoader.defineClass(ClassLoader.java:466)
at 
hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:152)
... 18 more
Recording test results
ERROR: Failed to archive test reports
hudson.util.IOException2: remote file operation failed: 
https://builds.apache.org/job/Nutch-trunk/ws/ at 
hudson.remoting.Channel@50e71521:solaris1
at hudson.FilePath.act(FilePath.java:784)
at hudson.FilePath.act(FilePath.java:770)
at hudson.tasks.junit.JUnitParser.parse(JUnitParser.java:83)
at 
hudson.tasks.junit.JUnitResultArchiver.parse(JUnitResultArchiver.java:122)
at 
hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:134)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at 
hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:700)
at 
hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:675)
at 
hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:653)
at hudson.model.Build$RunnerImpl.post2(Build.java:162)
at 
hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:622)
at hudson.model.Run.run(Run.java:1434)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:238)
Caused by: java.io.IOException: Remote call on solaris1 failed
at hudson.remoting.Channel.call(Channel.java:690)
at hudson.FilePath.act(FilePath.java:777)
... 14 more
Caused by: java.lang.ClassFormatError: Failed to load 
javax.servlet.ServletException
at 
hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:154)
at 
hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:131)