Re: Jenkins build failures after git migration

2016-04-20 Thread Sebastian Nagel
Thanks, the path to JUnit result files and Javadoc is fixed now.
Jenkins builds (1.x and 2.x) are back to normal.

Sebastian

On 04/18/2016 05:56 PM, Mattmann, Chris A (3980) wrote:
> Hey Seb, I’ll also take a look. @Lewis could potentially help here
> too. Lewis any time to scope?
> 
> 
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++
> 
> On 4/18/16, 4:40 AM, "Sebastian Nagel" wrote:
> 
>> Hi,
>>
>> the last successful builds for both branches
>> https://builds.apache.org/job/Nutch-trunk/
>> https://builds.apache.org/job/Nutch-nutchgora/
>> were in February before the svn to git migration.
>>
>> The reason is probably a changed path to the build workspace.
>> When comparing the logs
>> https://builds.apache.org/job/Nutch-trunk/3356/consoleText
>> and
>> https://builds.apache.org/job/Nutch-trunk/3360/consoleText
>>
>> (3356, svn)
>>  Buildfile: /home/jenkins/jenkins-slave/workspace/Nutch-trunk/trunk/build.xml
>>
>> (3360, git)
>> Buildfile: /home/jenkins/jenkins-slave/workspace/Nutch-trunk/build.xml
>>
> >> Although the Ant build succeeds, the XML test reports are not found,
> >> which causes the build to be marked as failed:
>>
>> (3360, git)
>> BUILD SUCCESSFUL
>> Total time: 12 minutes 37 seconds
>> [xUnit] [INFO] - Starting to record.
>> [xUnit] [INFO] - Processing JUnit
> >> [xUnit] [INFO] - [JUnit] - No test report file(s) were found with the pattern
> >> 'trunk/build/test/TEST-*.xml' relative to
> >> '/home/jenkins/jenkins-slave/workspace/Nutch-trunk' for the testing framework
> >> 'JUnit'.  Did you enter a pattern relative to the correct directory?  Did you
> >> generate the result report(s) for 'JUnit'?
>> ...
>> Finished: FAILURE
>>
>>
>> Does anyone know how to fix this?
>> I could dig into it later today or tomorrow.
>>
>> Thanks,
>> Sebastian
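
To illustrate the mismatch described above: the report pattern still carries
the "trunk/" prefix from the svn layout, while the git checkout puts build.xml
(and so build/test) directly into the workspace root.  The following is only a
rough sketch.  It uses Java's own glob matcher as a stand-in for the pattern
matching done by the Jenkins plugin, and the report file name is made up.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class ReportPatternDemo {

  /** Returns true if the workspace-relative path matches the glob pattern. */
  public static boolean matches(String pattern, String relativePath) {
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
    return matcher.matches(Paths.get(relativePath));
  }

  public static void main(String[] args) {
    // Hypothetical report file, relative to the workspace of the git build
    String report = "build/test/TEST-org.example.SomeTest.xml";

    // Pattern from the svn layout: expects a "trunk/" directory in the workspace
    System.out.println(matches("trunk/build/test/TEST-*.xml", report));

    // Pattern without the "trunk/" prefix
    System.out.println(matches("build/test/TEST-*.xml", report));
  }
}
```

The first call reports no match and the second one a match, which is in line
with the "No test report file(s) were found" message in build 3360.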



[jira] [Commented] (NUTCH-1785) Ability to index raw content

2016-04-20 Thread Sebastian Nagel (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15250812#comment-15250812
 ] 

Sebastian Nagel commented on NUTCH-1785:


The class o.a.n.indexer.NutchField supports only a handful of classes as 
document field values: String, Boolean, Integer, Long, Float, Date.  In 
addition, IndexWriter implementations (indexer plugins) must support all data 
types in use, or the data must provide a usable toString() method.  For byte[], 
toString() does not return a meaningful String (you hardly want to index 
{{[B@13afed55}}).  The conversion via {{new String(bytes)}} isn't stable, cf. 
NUTCH-1807.  However, it yields a clean, readable string, though it may not 
preserve bytes/characters from the original.  That's probably the intention.

Maybe it's better anyway to preserve the original encoding, esp. for base64, 
where a String representation is defined.  Please open a new issue for your 
problem.  Can you give an example of the charset issue?
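
A small sketch of the three String forms mentioned above, for illustration 
only.  It uses java.util.Base64 from the JDK instead of the commons-codec class 
used in the patch, and the class name, method names and sample bytes are made 
up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ByteFieldDemo {

  /** Decodes with an explicit charset, independent of the JVM default. */
  public static String decode(byte[] bytes) {
    return new String(bytes, StandardCharsets.ISO_8859_1);
  }

  /** Base64 defines a String form for arbitrary bytes. */
  public static String toBase64(byte[] bytes) {
    return Base64.getEncoder().encodeToString(bytes);
  }

  public static void main(String[] args) {
    // "café" encoded as ISO-8859-1 (sample data, not from Nutch)
    byte[] bytes = {0x63, 0x61, 0x66, (byte) 0xE9};

    // Array toString(): type tag plus identity hash, like [B@13afed55
    System.out.println(bytes.toString());

    // Uses the JVM default charset, so the result depends on the platform
    System.out.println(new String(bytes));

    // Explicit charset and base64 give the same result on every platform
    System.out.println(decode(bytes));
    System.out.println(toBase64(bytes));
  }
}
```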

> Ability to index raw content
> 
>
> Key: NUTCH-1785
> URL: https://issues.apache.org/jira/browse/NUTCH-1785
> Project: Nutch
> Issue Type: New Feature
> Components: indexer
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Priority: Minor
> Fix For: 1.11
>
> Attachments: NUTCH-1785-trunk.patch, NUTCH-1785-trunk.patch, 
> NUTCH-1785-trunk.patch, NUTCH-1785-trunk.patch, NUTCH-1785-trunkv2.patch
>
>
> Some use-cases require Nutch to actually write the raw content to a 
> configured indexing back-end. Since Content is never read, a plugin is out of 
> the question and therefore we need to force IndexJob to process Content as 
> well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : Nutch-nutchgora #1553

2016-04-20 Thread Apache Jenkins Server
See 



Jenkins build is back to normal : Nutch-trunk #3361

2016-04-20 Thread Apache Jenkins Server
See 



[jira] [Updated] (NUTCH-2253) ProtocolFactory still not thread-safe

2016-04-20 Thread Leon Misakyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leon Misakyan updated NUTCH-2253:
-
Description: 
Hi, as I can see in 1.11 release ProtocolFactory class still has an issue in 
getProtocol method. This is because every fetcher thread has its own 
ProtocolFactory instance (this.protocolFactory = new ProtocolFactory(conf); in 
FetcherThread constructor.)
So have this method synchronized is useless, because each thread has its own 
monitor.
In our project we have issue of having multiple Protocol instances.
Issue can be fixed if getProtocol method will use shared conf instance as lock 
object or by having one ProtocolFactory for all fetcher threads. 


  was:
Hi, as I can see in 1.11 release ProtocolFactory clas still has an issue in 
getProtocol method. This is because every fetcher thread has its own 
ProtocolFactory instance (this.protocolFactory = new ProtocolFactory(conf); in 
FetcherThread constructor.)
So have this method synchronized is useless, because each thread has its own 
monitor.
In our project we have issue of having multiple Protocol instances.
Issue can be fixed if getProtocol method will use shared conf instance as lock 
object or by having one ProtocolFactory for all fetcher threads. 



> ProtocolFactory still not thread-safe
> -
>
> Key: NUTCH-2253
> URL: https://issues.apache.org/jira/browse/NUTCH-2253
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.10, 1.11
> Reporter: Leon Misakyan
> Fix For: 2.3, 1.8
>
>
> Hi, as I can see in 1.11 release ProtocolFactory class still has an issue in 
> getProtocol method. This is because every fetcher thread has its own 
> ProtocolFactory instance (this.protocolFactory = new ProtocolFactory(conf); 
> in FetcherThread constructor.)
> So have this method synchronized is useless, because each thread has its own 
> monitor.
> In our project we have issue of having multiple Protocol instances.
> Issue can be fixed if getProtocol method will use shared conf instance as 
> lock object or by having one ProtocolFactory for all fetcher threads. 
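
The first of the two proposed fixes (the shared conf instance as lock object) 
could look roughly like the sketch below.  Conf and Factory are simplified, 
hypothetical stand-ins and not the actual Nutch classes; the only point is that 
all threads synchronize on the same monitor even though each one has its own 
factory.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class SharedLockDemo {

  /** Hypothetical stand-in for the shared configuration and its object cache. */
  static final class Conf {
    final Map<String, Object> cache = new HashMap<>();
  }

  /** Hypothetical stand-in for ProtocolFactory; every thread creates its own. */
  static final class Factory {
    private final Conf conf;

    Factory(Conf conf) {
      this.conf = conf;
    }

    Object getProtocol(String scheme) {
      // Lock on the shared conf, not on this per-thread factory instance
      synchronized (conf) {
        Object protocol = conf.cache.get(scheme);
        if (protocol == null) {
          protocol = new Object();
          conf.cache.put(scheme, protocol);
        }
        return protocol;
      }
    }
  }

  /** Starts the given number of threads and counts the distinct instances seen. */
  public static int distinctInstances(int threads) {
    Conf conf = new Conf();
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> seen.add(new Factory(conf).getProtocol("http")));
      workers[i].start();
    }
    try {
      for (Thread worker : workers) {
        worker.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return seen.size();
  }

  public static void main(String[] args) {
    System.out.println(distinctInstances(8)); // 1 with the shared lock
  }
}
```

With "synchronized" on the method instead, each thread would lock its own 
Factory instance and the check-then-put on the shared cache would be unguarded.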





[jira] [Updated] (NUTCH-2253) ProtocolFactory still not thread-safe

2016-04-20 Thread Leon Misakyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leon Misakyan updated NUTCH-2253:
-
Description: 
Hi, as I can see in 1.11 release ProtocolFactory clas still has an issue in 
getProtocol method. This is because every fetcher thread has its own 
ProtocolFactory instance (this.protocolFactory = new ProtocolFactory(conf); in 
FetcherThread constructor.)
So have this method synchronized is useless, because each thread has its own 
monitor.
In our project we have issue of having multiple Protocol instances.
Issue can be fixed if getProtocol method will use shared conf instance as lock 
object or by having one ProtocolFactory for all fetcher threads. 


  was:
The method getProtocol() should be synchronized otherwise the Fetcher threads 
can access it around the same time and query the cache before it's had a chance 
of being populated properly. This would happen for a handful of calls until the 
subsequent ones get the cache but this should be fixed nonetheless e.g. when we 
want a guarantee that the same Protocol instance will be called for the same 
fetching session.
The other Factory classes which use the same cache mechanism would suffer from 
the same problem.


> ProtocolFactory still not thread-safe
> -
>
> Key: NUTCH-2253
> URL: https://issues.apache.org/jira/browse/NUTCH-2253
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.10, 1.11
> Reporter: Leon Misakyan
> Fix For: 2.3, 1.8
>
>
> Hi, as I can see in 1.11 release ProtocolFactory clas still has an issue in 
> getProtocol method. This is because every fetcher thread has its own 
> ProtocolFactory instance (this.protocolFactory = new ProtocolFactory(conf); 
> in FetcherThread constructor.)
> So have this method synchronized is useless, because each thread has its own 
> monitor.
> In our project we have issue of having multiple Protocol instances.
> Issue can be fixed if getProtocol method will use shared conf instance as 
> lock object or by having one ProtocolFactory for all fetcher threads. 





[jira] [Updated] (NUTCH-2253) ProtocolFactory still not thread-safe

2016-04-20 Thread Leon Misakyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leon Misakyan updated NUTCH-2253:
-
Affects Version/s: (was: 2.2.1)
   1.10
   1.11

> ProtocolFactory still not thread-safe
> -
>
> Key: NUTCH-2253
> URL: https://issues.apache.org/jira/browse/NUTCH-2253
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.10, 1.11
> Reporter: Leon Misakyan
> Fix For: 2.3, 1.8
>
>
> The method getProtocol() should be synchronized otherwise the Fetcher threads 
> can access it around the same time and query the cache before it's had a 
> chance of being populated properly. This would happen for a handful of calls 
> until the subsequent ones get the cache but this should be fixed nonetheless 
> e.g. when we want a guarantee that the same Protocol instance will be called 
> for the same fetching session.
> The other Factory classes which use the same cache mechanism would suffer 
> from the same problem.





[jira] [Created] (NUTCH-2253) ProtocolFactory still not thread-safe

2016-04-20 Thread Leon Misakyan (JIRA)
Leon Misakyan created NUTCH-2253:


         Summary: ProtocolFactory still not thread-safe
             Key: NUTCH-2253
             URL: https://issues.apache.org/jira/browse/NUTCH-2253
         Project: Nutch
      Issue Type: Bug
      Components: fetcher
Affects Versions: 2.2.1
        Reporter: Leon Misakyan
         Fix For: 2.3, 1.8


The method getProtocol() should be synchronized otherwise the Fetcher threads 
can access it around the same time and query the cache before it's had a chance 
of being populated properly. This would happen for a handful of calls until the 
subsequent ones get the cache but this should be fixed nonetheless e.g. when we 
want a guarantee that the same Protocol instance will be called for the same 
fetching session.
The other Factory classes which use the same cache mechanism would suffer from 
the same problem.
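
The race described above is a check-then-act on the cache: several threads may 
find it empty and each create an instance.  Besides synchronizing the method, 
an atomic "get or create" closes the same gap.  The sketch below shows that 
alternative with a plain ConcurrentHashMap; it is not what the issue proposes 
and not the Nutch cache, and all names in it are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCacheDemo {

  private final ConcurrentMap<String, Object> cache = new ConcurrentHashMap<>();
  private final AtomicInteger created = new AtomicInteger();

  /** Returns the cached instance for the key, creating it at most once. */
  public Object get(String key) {
    // computeIfAbsent runs the factory function atomically per key
    return cache.computeIfAbsent(key, k -> {
      created.incrementAndGet();
      return new Object();
    });
  }

  /** Number of instances created so far. */
  public int createdCount() {
    return created.get();
  }

  public static void main(String[] args) {
    AtomicCacheDemo demo = new AtomicCacheDemo();
    Object first = demo.get("http");
    Object second = demo.get("http");
    System.out.println(first == second);     // true
    System.out.println(demo.createdCount()); // 1
  }
}
```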





[jira] [Commented] (NUTCH-1785) Ability to index raw content

2016-04-20 Thread Federico Bonelli (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249465#comment-15249465
 ] 

Federico Bonelli commented on NUTCH-1785:
-

I'm experiencing charset issues with this patch, probably due to Sebastian 
Nagel's remark:
bq. conversion via {code} new String(content.getContent()) {code} is needless 
if base64 is true

I will now try to base64-encode the content.getContent() byte array directly, 
but I was wondering about the initial intent behind the conversion from byte[] 
to String and back to byte[] before base64 encoding.

{code:java}
String binary = new String(content.getContent());

// optionally encode as base64
if (base64) {
  binary = Base64.encodeBase64String(StringUtils.getBytesUtf8(binary));
}
{code}

What was the initial intent behind this?
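
To make the charset problem concrete, here is a minimal sketch comparing the 
round trip through String with encoding the bytes directly.  It is not the 
patch code: it uses java.util.Base64 from the JDK, decodes with an explicit 
UTF-8 charset (the patch relies on the platform default), and the sample bytes 
and all names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTripDemo {

  /** Similar to the patch: byte[] to String, back to UTF-8 bytes, then base64. */
  public static String viaString(byte[] raw) {
    String text = new String(raw, StandardCharsets.UTF_8);
    return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
  }

  /** Encodes the raw bytes directly. */
  public static String direct(byte[] raw) {
    return Base64.getEncoder().encodeToString(raw);
  }

  public static byte[] decode(String base64) {
    return Base64.getDecoder().decode(base64);
  }

  public static void main(String[] args) {
    // "café" in ISO-8859-1; the last byte is not valid UTF-8 (sample data)
    byte[] raw = {0x63, 0x61, 0x66, (byte) 0xE9};

    // The invalid byte is replaced while decoding, so the original is lost
    System.out.println(Arrays.equals(decode(viaString(raw)), raw)); // false

    // Direct encoding preserves every byte
    System.out.println(Arrays.equals(decode(direct(raw)), raw));    // true
  }
}
```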

> Ability to index raw content
> 
>
> Key: NUTCH-1785
> URL: https://issues.apache.org/jira/browse/NUTCH-1785
> Project: Nutch
> Issue Type: New Feature
> Components: indexer
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Priority: Minor
> Fix For: 1.11
>
> Attachments: NUTCH-1785-trunk.patch, NUTCH-1785-trunk.patch, 
> NUTCH-1785-trunk.patch, NUTCH-1785-trunk.patch, NUTCH-1785-trunkv2.patch
>
>
> Some use-cases require Nutch to actually write the raw content to a 
> configured indexing back-end. Since Content is never read, a plugin is out of 
> the question and therefore we need to force IndexJob to process Content as 
> well.


