[jira] Commented: (NUTCH-779) Mechanism for passing metadata from parse to crawldb

2010-03-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851719#action_12851719
 ] 

Hudson commented on NUTCH-779:
--

Integrated in Nutch-trunk #1112 (See 
[http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1112/])
 Mechanism for passing metadata from parse to crawldb


 Mechanism for passing metadata from parse to crawldb
 

 Key: NUTCH-779
 URL: https://issues.apache.org/jira/browse/NUTCH-779
 Project: Nutch
  Issue Type: New Feature
Reporter: Julien Nioche
Assignee: Julien Nioche
 Fix For: 1.1

 Attachments: NUTCH-779, NUTCH-779-v2.patch


 The attached patch allows parse metadata to be passed to the corresponding entry 
 of the crawldb.  
 Comments are welcome.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (NUTCH-779) Mechanism for passing metadata from parse to crawldb

2010-03-29 Thread Julien Nioche (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850915#action_12850915
 ] 

Julien Nioche commented on NUTCH-779:
-

Could anyone please review this issue? I would like to commit it in time for 
the 1.1 release.




[jira] Commented: (NUTCH-779) Mechanism for passing metadata from parse to crawldb

2010-03-29 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850939#action_12850939
 ] 

Andrzej Bialecki  commented on NUTCH-779:
-

In CrawlDbReducer, the cramped line {{if (metaFromParse!=null){}} needs some 
whitespace fixing.

Other than that, +1.
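
For illustration only, here is a minimal sketch of how the check might read once spaced out, together with the kind of merge the issue describes. Only the name metaFromParse comes from the comment above. The class, the method, the keysToCopy parameter and the use of plain maps in place of Nutch's metadata classes are all assumptions, not the patch itself.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ParseMetaMerge {

    /**
     * Copies the selected keys from the parse metadata into a copy of the
     * crawldb entry's metadata. Plain maps stand in for Nutch's own classes.
     */
    static Map<String, String> merge(Map<String, String> crawlDbMeta,
                                     Map<String, String> metaFromParse,
                                     List<String> keysToCopy) {
        Map<String, String> result = new HashMap<>(crawlDbMeta);
        // The condition from the review comment, with conventional spacing.
        if (metaFromParse != null) {
            for (String key : keysToCopy) {
                String value = metaFromParse.get(key);
                if (value != null) {
                    result.put(key, value);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> entry = new HashMap<>();
        entry.put("fetchTime", "2010-03-29");

        Map<String, String> parsed = new HashMap<>();
        parsed.put("lang", "en");
        parsed.put("title", "not selected, so not copied");

        System.out.println(merge(entry, parsed, Arrays.asList("lang")));
    }
}
```

Only the keys listed in keysToCopy reach the entry, and a missing parse result leaves the entry unchanged.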




Re: [jira] Commented: (NUTCH-779) Mechanism for passing metadata from parse to crawldb

2010-01-20 Thread MilleBii
I'd like to use Julien's approach because I found the scoring filter complex
to understand.

My use case is the following:
1. during scoring after parsing, I want to tag the pages that interest me, say
meta=HIT
2. in the next step (to be created) I would like to prune the segment of
NON-HIT content in order to optimize segment space (I use Nutch caching); I
typically need to ditch 90% of segment data.

I am also considering:
3. focusing recrawls on HIT pages and their outlinks

Today I don't really know if or how one can retrieve this metadata. I have
managed to avoid storing text content for NON-HIT pages, but it is a dirty trick.
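
As a rough illustration of steps 1 and 2 above, the sketch below tags pages with meta=HIT and then drops everything else. It does not use the Nutch API: the Page class, the keyword test used to decide what counts as a hit, and the in-memory list standing in for a segment are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HitPruning {

    /** Hypothetical stand-in for a parsed page; not a Nutch class. */
    static final class Page {
        final String url;
        final String text;
        final Map<String, String> meta = new HashMap<>();

        Page(String url, String text) {
            this.url = url;
            this.text = text;
        }
    }

    /** Step 1: tag a page with meta=HIT when its text contains the keyword. */
    static void tag(Page page, String keyword) {
        if (page.text.contains(keyword)) {
            page.meta.put("meta", "HIT");
        }
    }

    /** Step 2: keep only the pages that were tagged as HIT. */
    static List<Page> prune(List<Page> segment) {
        List<Page> kept = new ArrayList<>();
        for (Page page : segment) {
            if ("HIT".equals(page.meta.get("meta"))) {
                kept.add(page);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Page> segment = new ArrayList<>();
        segment.add(new Page("http://example.com/a", "apache nutch crawler"));
        segment.add(new Page("http://example.com/b", "unrelated content"));

        for (Page page : segment) {
            tag(page, "nutch");
        }
        for (Page page : prune(segment)) {
            System.out.println(page.url);
        }
    }
}
```

In a real crawl the tag would have to survive from the parse step into the crawldb for a later job to read it, which is the gap the issue is about.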


2010/1/19 Andrzej Bialecki (JIRA) j...@apache.org


[
 https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802175#action_12802175]

 Andrzej Bialecki  commented on NUTCH-779:
 -

 Personally I would use ScoringFilters because I'm familiar with the API,
 but the approach that you propose is certainly more user-friendly, especially
 for novice users.





-- 
-MilleBii-


[jira] Commented: (NUTCH-779) Mechanism for passing metadata from parse to crawldb

2010-01-19 Thread Julien Nioche (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802172#action_12802172
 ] 

Julien Nioche commented on NUTCH-779:
-

> The property needs some documentation in nutch-default.xml plus a sensible 
> default. 

Sure - just wanted the general approach to be checked before doing the tedious 
bits. Do you think it makes sense to do things the way I suggested or would you 
use the ScoringFilters instead?





[jira] Commented: (NUTCH-779) Mechanism for passing metadata from parse to crawldb

2010-01-19 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802175#action_12802175
 ] 

Andrzej Bialecki  commented on NUTCH-779:
-

Personally I would use ScoringFilters because I'm familiar with the API, but 
the approach that you propose is certainly more user-friendly, especially for 
novice users.




[jira] Commented: (NUTCH-779) Mechanism for passing metadata from parse to crawldb

2010-01-18 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801875#action_12801875
 ] 

Andrzej Bialecki  commented on NUTCH-779:
-

You can already achieve this with ScoringFilters, although it requires using 
three methods instead ... I would also rename the status to parse_meta; it's 
less cryptic this way. The property needs some documentation in 
nutch-default.xml plus a sensible default.
