Re: How does scoring chain work

2017-03-29 Thread lewis john mcgibbney
Hi Yongyao,

In addition to Seb's response, please also check out the
'scoring.filter.order' property, which you can set in nutch-site.xml
(the default is documented in nutch-default.xml):
https://github.com/apache/nutch/blob/master/conf/nutch-default.xml#L1429-L1437
It determines the order in which the scoring filters are applied and gives
you more control over complex scoring logic.
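
To illustrate what an ordering property of this kind does, here is a
minimal sketch in plain Java. It is not Nutch code: the class
FilterOrderSketch, the method resolveOrder and the filter names are all
hypothetical, and it assumes the property value is a whitespace-separated
list of filter class names, with an empty value meaning "use every
available filter in its default order". Whether filters missing from the
list are skipped in Nutch itself should be checked against the source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterOrderSketch {

    /**
     * Resolves the order in which filters run (hypothetical helper).
     * An empty property keeps the default order of all available filters;
     * otherwise only the named filters are returned, in the listed order.
     */
    static List<String> resolveOrder(String orderProperty, List<String> available) {
        if (orderProperty == null || orderProperty.trim().isEmpty()) {
            return new ArrayList<>(available);
        }
        List<String> ordered = new ArrayList<>();
        for (String name : orderProperty.trim().split("\\s+")) {
            // Assumption of this sketch: unknown names are silently ignored.
            if (available.contains(name)) {
                ordered.add(name);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Placeholder names, not real Nutch classes.
        List<String> available = Arrays.asList("example.FilterA", "example.FilterB");
        System.out.println(resolveOrder("example.FilterB example.FilterA", available));
        System.out.println(resolveOrder("", available));
    }
}
```

The point is only that the order is explicit configuration rather than
something derived from the plugin list.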
Lewis

On Wed, Mar 29, 2017 at 6:16 AM,  wrote:

>
> From: Yongyao Jiang 
> To: user@nutch.apache.org
> Cc:
> Bcc:
> Date: Tue, 28 Mar 2017 16:48:38 -0400
> Subject: How does scoring chain work
> Hi,
>
> I have a question about how scoring works when I try to use
> multiple scoring plugins together.
>
> For example, if I use "scoring-(opic|similarity)", the opic score of a page
> is 0.2, and the similarity score is 0.5, what would be the final score? Is
> there any way to configure this?
>
> Thanks,
> Yongyao
>
>


Re: How does scoring chain work

2017-03-29 Thread Sebastian Nagel
Hi,

The score calculated so far (by the filters earlier in the chain)
is passed to the corresponding method of the next scoring filter,
- either directly as a float argument
- or as a field, e.g., of a CrawlDatum object.

It is up to each ScoringFilter implementation whether to ignore this value
(overwriting it), to use it in its own calculation (usually as a factor
for multiplication), or to do nothing and leave it untouched.

If in doubt, check the actual implementations of the various
interface methods (injectedScore, ..., indexerScore).
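
The three behaviours described above can be sketched in a few lines of
plain Java. This is only an illustration under stated assumptions: the
interface SimpleScoringFilter and the class ScoringChainSketch are
hypothetical stand-ins and not the Nutch ScoringFilter API, and the
values 0.2 and 0.5 are borrowed from the question purely as sample
numbers. What the real opic and similarity plugins do with the incoming
score has to be checked in their source.

```java
import java.util.Arrays;
import java.util.List;

public class ScoringChainSketch {

    /** Hypothetical, simplified stand-in; NOT the Nutch ScoringFilter interface. */
    interface SimpleScoringFilter {
        float score(float scoreSoFar);
    }

    /** Passes the running score through each filter in chain order. */
    static float runChain(List<SimpleScoringFilter> chain, float initialScore) {
        float score = initialScore;
        for (SimpleScoringFilter filter : chain) {
            score = filter.score(score);
        }
        return score;
    }

    /** Builds a three-filter chain showing each behaviour once. */
    static float demo() {
        SimpleScoringFilter overwrite = s -> 0.2f;     // ignores the incoming value
        SimpleScoringFilter multiply = s -> s * 0.5f;  // uses it as a factor
        SimpleScoringFilter passThrough = s -> s;      // leaves it untouched
        return runChain(Arrays.asList(overwrite, multiply, passThrough), 1.0f);
    }

    public static void main(String[] args) {
        // The first filter overwrites 1.0 with 0.2, the second halves it.
        System.out.println(demo());
    }
}
```

With these sample filters the result is roughly 0.1. Swapping the first
two filters gives 0.2 instead, because the overwriting filter then runs
last, which is why the order of the chain matters.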

Sebastian




Re: [VOTE] Release Apache Nutch 1.13 RC#1

2017-03-29 Thread Mattmann, Chris A (3010)
+1 from me:

SIGS  + CHECKSUMS check out.
LMC-053601:nutch-release mattmann$ $HOME/bin/stage_apache_rc apache-nutch 1.13-src https://dist.apache.org/repos/dist/dev/nutch/1.13/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4998k  100 4998k    0     0   840k      0  0:00:05  0:00:05 --:--:--  983k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   819  100   819    0     0   1590      0 --:--:-- --:--:-- --:--:--  1593
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    68  100    68    0     0    141      0 --:--:-- --:--:-- --:--:--   141
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    77  100    77    0     0    158      0 --:--:-- --:--:-- --:--:--   159
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7979k  100 7979k    0     0   915k      0  0:00:08  0:00:08 --:--:-- 1107k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   819  100   819    0     0   1646      0 --:--:-- --:--:-- --:--:--  1647
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    65  100    65    0     0    138      0 --:--:-- --:--:-- --:--:--   138
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    74  100    74    0     0    156      0 --:--:-- --:--:-- --:--:--   156
LMC-053601:nutch-release mattmann$ $HOME/bin/stage_apache_rc apache-nutch 1.13-bin https://dist.apache.org/repos/dist/dev/nutch/1.13/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  225M  100  225M    0     0   901k      0  0:04:16  0:04:16 --:--:-- 1056k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   819  100   819    0     0   1644      0 --:--:-- --:--:-- --:--:--  1647
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    68  100    68    0     0    110      0 --:--:-- --:--:-- --:--:--   110
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    77  100    77    0     0    116      0 --:--:-- --:--:-- --:--:--   116
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  227M  100  227M    0     0   910k      0  0:04:16  0:04:16 --:--:--  821k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   819  100   819    0     0   1710      0 --:--:-- --:--:-- --:--:--  1709
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    65  100    65    0     0    129      0 --:--:-- --:--:-- --:--:--   129
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    74  100    74    0     0    153      0 --:--:-- --:--:-- --:--:--   153
LMC-053601:nutch-release mattmann$ $HOME/bin/verify_gpg_sigs
Verifying Signature for file apache-nutch-1.13-bin.tar.gz.asc
gpg: assuming signed data in `apache-nutch-1.13-bin.tar.gz'
gpg: Signature made Wed Mar 29 00:09:41 2017 EDT using RSA key ID 48BAEBF6
gpg: Good signature from "Lewis John McGibbney (CODE SIGNING KEY)"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the owner.
Primary key fingerprint: DB7B 5199 121C 08A5 C8F4  052B 3A47 17F0 48BA EBF6
Verifying Signature for file apache-nutch-1.13-bin.zip.asc
gpg: assuming signed data in `apache-nutch-1.13-bin.zip'
gpg: Signature made Wed Mar 29 00:09:47 2017 EDT using RSA key ID 48BAEBF6
gpg: Good signature from "Lewis John McGibbney (CODE SIGNING KEY)"
gpg: WARNING: This key is not certified with a 

Re: [VOTE] Release Apache Nutch 1.13 RC#1

2017-03-29 Thread Sebastian Nagel
+1

 + signatures verified
 + compile + test
 + small test crawl and index into Solr 4.10.4

 - clean (CleaningJob) on Solr 4 fails, but this is already known (NUTCH-2269)
 - indexing into Solr 5.5.0 fails without a properly configured managed-schema.
   We should deliver a proper managed-schema in conf/ and also update the tutorial.

Sebastian

On 03/29/2017 06:20 AM, lewis john mcgibbney wrote:
> Hi Folks,
> 
> A first candidate for the Nutch 1.13 release is available at:
> 
>   https://dist.apache.org/repos/dist/dev/nutch/1.13/
> 
> The release candidate is a zip and tar.gz archive of the binary and sources
> in:
> https://github.com/apache/nutch/tree/release-1.13
> 
> The SHA1 checksum of the archive is
> bd0da3569aa14105799ed39204d4f0a31c77b42c
> 
> In addition, a staged maven repository is available here:
> 
> https://repository.apache.org/content/repositories/orgapachenutch-1013
> 
> We addressed 29 Issues - https://s.apache.org/wq3x
> 
> Please vote on releasing this package as Apache Nutch 1.13.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Nutch PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Nutch 1.13.
> [ ] -1 Do not release this package because…
> 
> Cheers,
> Lewis
> (On behalf of the Nutch PMC)
> 
> P.S. Here is my +1.
> 



Re: [VOTE] Release Apache Nutch 1.13 RC#1

2017-03-29 Thread Kevin Ratnasekera
+1

On Wed, Mar 29, 2017 at 4:06 PM, Julien Nioche <
lists.digitalpeb...@gmail.com> wrote:

> Hi Lewis
>
> +1 compiled from source and ran a small crawl in local mode. All good!
>
> Thanks
>
> Julien
>
>
>
>
> --
>
> *Open Source Solutions for Text Engineering*
>
> http://www.digitalpebble.com
> http://digitalpebble.blogspot.com/
> #digitalpebble 
>


Re: [VOTE] Release Apache Nutch 1.13 RC#1

2017-03-29 Thread Julien Nioche
Hi Lewis

+1 compiled from source and ran a small crawl in local mode. All good!

Thanks

Julien




-- 

*Open Source Solutions for Text Engineering*

http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble