Build failed in Jenkins: Nutch-trunk #2158

2013-04-06 Thread Apache Jenkins Server
See 

--
[...truncated 5531 lines...]

resolve-default:
[ivy:resolve] :: loading settings :: file = 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/ivy/ivysettings.xml

compile:
 [echo] Compiling plugin: urlfilter-suffix
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/plugin/build-plugin.xml:117:
 warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds

compile-test:
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/plugin/build-plugin.xml:180:
 warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 1 source file to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/build/urlfilter-suffix/test
[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 1.6
[javac] 1 warning

jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlfilter-suffix
[junit] WARNING: multiple versions of ant detected in path for junit 
[junit]  
jar:file:/home/hudson/tools/ant/latest/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and 
jar:file:/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/build/lib/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.nutch.urlfilter.suffix.TestSuffixURLFilter
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.177 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.338 sec

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/ivy/ivysettings.xml

compile:
 [echo] Compiling plugin: urlfilter-validator
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/plugin/build-plugin.xml:117:
 warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds

compile-test:
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/plugin/build-plugin.xml:180:
 warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 1 source file to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/build/urlfilter-validator/test
[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 1.6
[javac] 1 warning

jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlfilter-validator
[junit] WARNING: multiple versions of ant detected in path for junit 
[junit]  
jar:file:/home/hudson/tools/ant/latest/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and 
jar:file:/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/build/lib/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.nutch.urlfilter.validator.TestUrlValidator
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.02 sec
[junit] Running org.apache.nutch.tika.TestRTFParser

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/ivy/ivysettings.xml
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.008 sec

compile:
 [echo] Compiling plugin: urlnormalizer-basic
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/plugin/build-plugin.xml:117:
 warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds

compile-test:
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/plugin/build-plugin.xml:180:
 warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 1 source file to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/build/urlnormalizer-basic/test
[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 1.6
[javac] 1 warning

jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-basic
[junit] WARNING: multiple versions of ant detected in path for junit 
[junit]  
jar:file:/home/hudson/tools/ant/latest/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and 
jar:file:/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trun

Re: Nutch

2013-04-06 Thread Tejas Patil
On Sat, Apr 6, 2013 at 9:58 AM, Parin Jogani  wrote:

> Hi,
> Is there any way to apply one urlfilter for levels 1-5 and a different one
> from level 5 onwards? I need to extract PDF files, which will appear only
> after a given level (just to experiment).
>
You can run two crawls over the same crawldb using different urlfilter files.
The first crawl would reject PDF files and run up to the depth just before
PDFs are discovered. For the later crawl, modify the regex rule to accept
PDF files.
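As a sketch of what the two rule files could look like (assuming the default regex-urlfilter.txt syntax, where the first matching +/- pattern wins; the file split and surrounding rules are illustrative, not from this thread):

```text
# regex-urlfilter-phase1.txt -- depths 1-5: reject PDFs, accept the rest
-\.pdf$
+.

# regex-urlfilter-phase2.txt -- depth 5 onwards: accept PDFs explicitly
+\.pdf$
+.
```

You would point urlfilter.regex.file at the phase-1 rules for the first crawl, then switch it to the phase-2 rules before continuing over the same crawldb.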


> After that, I believe the PDF files will be stored in a compressed binary
> format in the crawl/segment folder. I would like to extract these PDF files
> and store them all in one folder. (I guess, since Nutch uses MapReduce to
> segment the data, I will need to use the Hadoop API present by default in
> the lib folder. I cannot find more tutorials on this except
> allenday<
> http://www-scf.usc.edu/~csci572/2013Spring/homework/nutch/allenday20080829.html
> >
> ).
>
I had a peek at the link you gave, and it seems like that code snippet
should work. It's an old article (from 2008), so some classes may have been
replaced with newer ones. If you face any issues, please feel free to shoot
us an email!
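Since the linked article's code is not reproduced here, below is a minimal sketch (not the article's code, and untested against a real crawl) of how the Hadoop API could read raw fetched content out of a Nutch 1.x segment. It assumes the Nutch/Hadoop jars from the lib folder are on the classpath and that segment content is stored as a SequenceFile of <Text, Content> pairs under <segment>/content/part-00000/data; the class name DumpPdfs and the argument handling are illustrative:

```java
import java.io.FileOutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.nutch.protocol.Content;

/** Sketch: copy each fetched PDF from a segment into one output folder. */
public class DumpPdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Fetched content lives under <segment>/content/part-00000/data
    Path data = new Path(args[0], "content/part-00000/data");
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, data, conf);
    Text url = new Text();          // key: the fetched URL
    Content content = new Content(); // value: raw bytes plus metadata
    int n = 0;
    while (reader.next(url, content)) {
      if ("application/pdf".equals(content.getContentType())) {
        // Write the raw PDF bytes into the single output folder (args[1])
        try (FileOutputStream out =
            new FileOutputStream(args[1] + "/doc" + (n++) + ".pdf")) {
          out.write(content.getContent());
        }
      }
    }
    reader.close();
  }
}
```

This cannot run standalone (it needs the Nutch jars and a segment on disk), so treat it only as a starting point; `bin/nutch readseg -dump` is the built-in alternative for text dumps, but it does not write out the original binary files.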

>
> PJ
>


Nutch

2013-04-06 Thread Parin Jogani
Hi,
Is there any way to apply one urlfilter for levels 1-5 and a different one
from level 5 onwards? I need to extract PDF files, which will appear only
after a given level (just to experiment).
After that, I believe the PDF files will be stored in a compressed binary
format in the crawl/segment folder. I would like to extract these PDF files
and store them all in one folder. (I guess, since Nutch uses MapReduce to
segment the data, I will need to use the Hadoop API present by default in
the lib folder. I cannot find more tutorials on this except
allenday
).

PJ


[jira] [Updated] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-04-06 Thread lufeng (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lufeng updated NUTCH-1545:
--

Attachment: NUTCH-1545-v2.patch

1. Remove any concept of crawldb and segments from the bin/crawl script.
2. Fix batchId capture in the bin/crawl script by adding an argument to the
GenerateJob class. It will generate a batchId if necessary.

Any comments are welcome.
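As an illustrative sketch only (the attached patch defines the actual behavior), a crawl script can fall back to a generated batchId in the common "timestamp-suffix" style when the caller does not supply one:

```shell
# Hypothetical fallback: derive a batchId from epoch seconds plus the
# shell PID when no batchId was passed in. Both parts are numeric, so
# the result always matches the pattern <digits>-<digits>.
BATCH_ID="$(date +%s)-$$"
echo "batchId=$BATCH_ID"
```

The generated id is then passed to the generate/fetch/parse/updatedb steps so they all operate on the same batch.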

> capture batchId and remove references to segments in 2.x crawl script.
> --
>
> Key: NUTCH-1545
> URL: https://issues.apache.org/jira/browse/NUTCH-1545
> Project: Nutch
>  Issue Type: Task
>Affects Versions: 2.1
>Reporter: Lewis John McGibbney
>Assignee: lufeng
>Priority: Minor
> Fix For: 2.2
>
> Attachments: NUTCH-1545.patch, NUTCH-1545-v2.patch
>
>
> The concept of segment is replaced by batchId in 2.x
> I'm currently getting rid of segments references in 2.x
> This issue was flagged up and separate from NUTCH-1532 which I am working on.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira