[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-1024:
-

Fix Version/s: (was: 1.5)
   1.6

20120304-push-1.6

 Dynamically set fetchInterval by MIME-type
 --

 Key: NUTCH-1024
 URL: https://issues.apache.org/jira/browse/NUTCH-1024
 Project: Nutch
  Issue Type: New Feature
  Components: generator
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 1.6

 Attachments: AdaptiveFetchSchedule.patch, 
 MimeAdaptiveFetchSchedule.java, NUTCH-1024-1.5-1.patch, 
 NUTCH-1024-1.5-2.patch, NUTCH-1024-1.5-3.patch, Nutch.patch, 
 adaptive-mimetypes.txt


 Add a facility to configure default or fixed fetchInterval values per MIME-type.
 This is useful for conserving resources on files that are known to change
 frequently, never, or anything in between.
 * simple key\tvalue\n configuration file
 * only set fetchInterval for new documents
 * keep the max fetchInterval fixed by the current config
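The three requirements above can be sketched in Java (Nutch's language), assuming a configuration text with one mime<TAB>seconds pair per line. The class MimeIntervalConfig and its methods are hypothetical illustrations, not code from the attached patches.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Nutch): maps a MIME-type to a fetchInterval
// in seconds, parsed from a "mime\tseconds" configuration text.
public class MimeIntervalConfig {
  private final Map<String, Integer> intervals = new HashMap<>();
  private final int maxInterval; // upper bound taken from the existing config

  public MimeIntervalConfig(String configText, int maxInterval) {
    this.maxInterval = maxInterval;
    for (String line : configText.split("\\R")) {
      String[] parts = line.trim().split("\t");
      if (parts.length != 2) {
        continue; // skip blank or malformed lines
      }
      intervals.put(parts[0].trim().toLowerCase(Locale.ROOT),
          Integer.parseInt(parts[1].trim()));
    }
  }

  // Only new documents receive the MIME-specific value; existing ones keep theirs.
  public int intervalFor(String mime, boolean isNew, int currentInterval) {
    if (!isNew || mime == null) {
      return currentInterval;
    }
    Integer configured = intervals.get(mime.toLowerCase(Locale.ROOT));
    if (configured == null) {
      return currentInterval; // unconfigured MIME-type: keep the default
    }
    return Math.min(configured, maxInterval); // never exceed the configured max
  }
}
```

Clamping with Math.min is one way to read "keep the max fetchInterval fixed by the current config"; the patches may enforce it differently.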

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2012-03-30 Thread Markus Jelsma (Updated) (JIRA)


Markus Jelsma updated NUTCH-1024:
-

Attachment: NUTCH-1024-1.5-3.patch

New patch with proper logging and configuration files.

 Dynamically set fetchInterval by MIME-type
 --

 Key: NUTCH-1024
 URL: https://issues.apache.org/jira/browse/NUTCH-1024
 Project: Nutch
  Issue Type: New Feature
  Components: generator
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 1.5

 Attachments: AdaptiveFetchSchedule.patch, 
 MimeAdaptiveFetchSchedule.java, NUTCH-1024-1.5-1.patch, 
 NUTCH-1024-1.5-2.patch, NUTCH-1024-1.5-3.patch, Nutch.patch, 
 adaptive-mimetypes.txt






[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2012-03-30 Thread Markus Jelsma (Updated) (JIRA)


Markus Jelsma updated NUTCH-1024:
-

Attachment: NUTCH-1024-1.5-3.patch

Something went wrong here. 

 Dynamically set fetchInterval by MIME-type
 --

 Key: NUTCH-1024
 URL: https://issues.apache.org/jira/browse/NUTCH-1024
 Project: Nutch
  Issue Type: New Feature
  Components: generator
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 1.5

 Attachments: AdaptiveFetchSchedule.patch, 
 MimeAdaptiveFetchSchedule.java, NUTCH-1024-1.5-1.patch, 
 NUTCH-1024-1.5-2.patch, NUTCH-1024-1.5-3.patch, Nutch.patch, 
 adaptive-mimetypes.txt






[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2012-03-30 Thread Markus Jelsma (Updated) (JIRA)


Markus Jelsma updated NUTCH-1024:
-

Attachment: (was: NUTCH-1024-1.5-3.patch)

 Dynamically set fetchInterval by MIME-type
 --

 Key: NUTCH-1024
 URL: https://issues.apache.org/jira/browse/NUTCH-1024
 Project: Nutch
  Issue Type: New Feature
  Components: generator
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 1.5

 Attachments: AdaptiveFetchSchedule.patch, 
 MimeAdaptiveFetchSchedule.java, NUTCH-1024-1.5-1.patch, 
 NUTCH-1024-1.5-2.patch, NUTCH-1024-1.5-3.patch, Nutch.patch, 
 adaptive-mimetypes.txt






[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2012-03-29 Thread Markus Jelsma (Updated) (JIRA)


Markus Jelsma updated NUTCH-1024:
-

Attachment: NUTCH-1024-1.5-2.patch

New patch for 1.5 with modifications as per Julien's comments.

 Dynamically set fetchInterval by MIME-type
 --

 Key: NUTCH-1024
 URL: https://issues.apache.org/jira/browse/NUTCH-1024
 Project: Nutch
  Issue Type: New Feature
  Components: generator
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 1.5

 Attachments: AdaptiveFetchSchedule.patch, 
 MimeAdaptiveFetchSchedule.java, NUTCH-1024-1.5-1.patch, 
 NUTCH-1024-1.5-2.patch, Nutch.patch, adaptive-mimetypes.txt






[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2012-03-02 Thread Markus Jelsma (Updated) (JIRA)


Markus Jelsma updated NUTCH-1024:
-

Attachment: NUTCH-1024-1.5-1.patch

New patch for trunk! This also includes a change to the injector so that the injected
fetchInterval is added to the CrawlDatum metadata. In AdaptiveFetchSchedule, this
injected interval overrides anything else.
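The precedence described above (an injected interval beats the adaptive calculation) can be sketched as follows. This is a hypothetical illustration, not the patch's code: a plain Map<String, String> stands in for the CrawlDatum metadata, and the key name INJECTED_INTERVAL_KEY is invented for the example.

```java
import java.util.Map;

// Hypothetical illustration: an interval stored in the record's metadata at
// inject time takes precedence over the adaptively computed interval.
public class InjectedIntervalPrecedence {
  // Hypothetical metadata key; the key actually used by the patch may differ.
  public static final String INJECTED_INTERVAL_KEY = "injected.fetchInterval";

  public static int chooseInterval(Map<String, String> metadata, int adaptiveInterval) {
    String injected = metadata.get(INJECTED_INTERVAL_KEY);
    if (injected != null) {
      try {
        return Integer.parseInt(injected.trim()); // injected value overrides everything
      } catch (NumberFormatException e) {
        // malformed value: fall back to the adaptive interval below
      }
    }
    return adaptiveInterval;
  }
}
```

Falling back on a malformed value is a design choice for the sketch; an implementation could equally reject such records at inject time.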

 Dynamically set fetchInterval by MIME-type
 --

 Key: NUTCH-1024
 URL: https://issues.apache.org/jira/browse/NUTCH-1024
 Project: Nutch
  Issue Type: New Feature
  Components: generator
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 1.5

 Attachments: AdaptiveFetchSchedule.patch, 
 MimeAdaptiveFetchSchedule.java, NUTCH-1024-1.5-1.patch, Nutch.patch, 
 adaptive-mimetypes.txt






[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2011-10-06 Thread Markus Jelsma (Updated) (JIRA)


Markus Jelsma updated NUTCH-1024:
-

Fix Version/s: (was: 1.4)
   1.5

 Dynamically set fetchInterval by MIME-type
 --

 Key: NUTCH-1024
 URL: https://issues.apache.org/jira/browse/NUTCH-1024
 Project: Nutch
  Issue Type: New Feature
  Components: generator
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 1.5

 Attachments: AdaptiveFetchSchedule.patch, 
 MimeAdaptiveFetchSchedule.java, Nutch.patch, adaptive-mimetypes.txt






[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2011-08-22 Thread Markus Jelsma (JIRA)


Markus Jelsma updated NUTCH-1024:
-

Attachment: (was: MimeAdaptiveFetchSchedule.java)

 Dynamically set fetchInterval by MIME-type
 --

 Key: NUTCH-1024
 URL: https://issues.apache.org/jira/browse/NUTCH-1024
 Project: Nutch
  Issue Type: New Feature
  Components: generator
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 1.4

 Attachments: AdaptiveFetchSchedule.patch, 
 MimeAdaptiveFetchSchedule.java, Nutch.patch, adaptive-mimetypes.txt






[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2011-08-22 Thread Markus Jelsma (JIRA)


Markus Jelsma updated NUTCH-1024:
-

Attachment: MimeAdaptiveFetchSchedule.java

New version with proper handling of the Content-Type attribute. In my tests I didn't
include the charset, which is present in real-world data.
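The charset issue comes from Content-Type values such as "text/html; charset=UTF-8", which would not match a configured "text/html". A minimal sketch of the normalization, assuming the lookup key is the bare type/subtype; the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper: reduce a raw Content-Type value to its bare MIME-type
// so that parameters such as charset do not prevent a configuration match.
public class ContentTypes {
  public static String baseMimeType(String contentType) {
    if (contentType == null) {
      return null;
    }
    int semicolon = contentType.indexOf(';');
    String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
    // MIME type and subtype names are case-insensitive, so normalize to lower case.
    base = base.trim().toLowerCase(Locale.ROOT);
    return base.isEmpty() ? null : base;
  }
}
```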

 Dynamically set fetchInterval by MIME-type
 --

 Key: NUTCH-1024
 URL: https://issues.apache.org/jira/browse/NUTCH-1024
 Project: Nutch
  Issue Type: New Feature
  Components: generator
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 1.4

 Attachments: AdaptiveFetchSchedule.patch, 
 MimeAdaptiveFetchSchedule.java, Nutch.patch, adaptive-mimetypes.txt






[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2011-08-22 Thread Markus Jelsma (JIRA)


Markus Jelsma updated NUTCH-1024:
-

Attachment: MimeAdaptiveFetchSchedule.java
adaptive-mimetypes.txt

New version that allows separate inc and dec rate values per MIME-type. The
configuration file format is now: mime\tinc_rate\tdec_rate. The code uses an internal
struct to store the rates per MIME-type in a HashMap.
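Parsing the mime\tinc_rate\tdec_rate format into a HashMap of small rate holders could look like this. It is a sketch under the stated format; MimeRateTable and Rates are hypothetical names, not the classes in MimeAdaptiveFetchSchedule.java.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical parser for lines of the form "mime\tinc_rate\tdec_rate".
public class MimeRateTable {
  // Simple holder for one MIME-type's increment and decrement rates.
  public static final class Rates {
    public final float inc;
    public final float dec;

    Rates(float inc, float dec) {
      this.inc = inc;
      this.dec = dec;
    }
  }

  public static Map<String, Rates> parse(String configText) {
    Map<String, Rates> table = new HashMap<>();
    for (String line : configText.split("\\R")) {
      String[] f = line.trim().split("\t");
      if (f.length != 3) {
        continue; // skip blank or malformed lines
      }
      table.put(f[0].trim().toLowerCase(Locale.ROOT),
          new Rates(Float.parseFloat(f[1].trim()), Float.parseFloat(f[2].trim())));
    }
    return table;
  }
}
```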

Please comment.

 Dynamically set fetchInterval by MIME-type
 --

 Key: NUTCH-1024
 URL: https://issues.apache.org/jira/browse/NUTCH-1024
 Project: Nutch
  Issue Type: New Feature
  Components: generator
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 1.4

 Attachments: AdaptiveFetchSchedule.patch, 
 MimeAdaptiveFetchSchedule.java, Nutch.patch, adaptive-mimetypes.txt






[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2011-08-22 Thread Markus Jelsma (JIRA)


Markus Jelsma updated NUTCH-1024:
-

Attachment: (was: MimeAdaptiveFetchSchedule.java)

 Dynamically set fetchInterval by MIME-type
 --

 Key: NUTCH-1024
 URL: https://issues.apache.org/jira/browse/NUTCH-1024
 Project: Nutch
  Issue Type: New Feature
  Components: generator
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 1.4

 Attachments: AdaptiveFetchSchedule.patch, 
 MimeAdaptiveFetchSchedule.java, Nutch.patch, adaptive-mimetypes.txt






[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2011-08-22 Thread Markus Jelsma (JIRA)


Markus Jelsma updated NUTCH-1024:
-

Attachment: (was: adaptive-mimetypes.txt)

 Dynamically set fetchInterval by MIME-type
 --

 Key: NUTCH-1024
 URL: https://issues.apache.org/jira/browse/NUTCH-1024
 Project: Nutch
  Issue Type: New Feature
  Components: generator
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 1.4

 Attachments: AdaptiveFetchSchedule.patch, 
 MimeAdaptiveFetchSchedule.java, Nutch.patch, adaptive-mimetypes.txt






[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2011-08-19 Thread Markus Jelsma (JIRA)


Markus Jelsma updated NUTCH-1024:
-

Attachment: Nutch.patch
AdaptiveFetchSchedule.patch
MimeAdaptiveFetchSchedule.java
adaptive-mimetypes.txt

Here's a first WIP. It extends AdaptiveFetchSchedule and changes the INC/DEC rates
depending on the current MIME-type. It also patches AdaptiveFetchSchedule so that the INC
and DEC properties are protected and settable from the child class. I also added two
properties to metadata.Nutch for reading the Content-Type key as a Writable from the
CrawlDatum metadata. That was a bit of trickery.

It falls back to the original INC and DEC rate values for CrawlDatums that have no
Content-Type in their metadata or whose Content-Type is not configured.
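A simplified sketch of that behaviour: a page found modified has its interval multiplied by (1 - DEC), an unmodified page by (1 + INC), and the result is clamped to a min/max. Per-MIME rates are used when configured, otherwise the original defaults. This illustrates the idea only; the real AdaptiveFetchSchedule handles further details omitted here, and MimeAdaptiveStep, Rates and the parameter names are hypothetical.

```java
import java.util.Map;

// Hypothetical, simplified adaptive step with per-MIME rates and default fallback.
public class MimeAdaptiveStep {
  // Per-MIME rate pair, mirroring the "internal struct" idea.
  public static final class Rates {
    final float inc;
    final float dec;

    public Rates(float inc, float dec) {
      this.inc = inc;
      this.dec = dec;
    }
  }

  public static float nextInterval(float interval, boolean modified, String mime,
      Map<String, Rates> perMime, Rates defaults, float minInterval, float maxInterval) {
    Rates r = (mime == null) ? null : perMime.get(mime);
    if (r == null) {
      r = defaults; // no Content-Type or unconfigured type: use the original rates
    }
    // Changed pages are revisited sooner, unchanged pages later.
    float next = modified ? interval * (1.0f - r.dec) : interval * (1.0f + r.inc);
    return Math.max(minInterval, Math.min(maxInterval, next));
  }
}
```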

Please comment. There must be something wrong, as it seems to work. :)

 Dynamically set fetchInterval by MIME-type
 --

 Key: NUTCH-1024
 URL: https://issues.apache.org/jira/browse/NUTCH-1024
 Project: Nutch
  Issue Type: New Feature
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 2.0

 Attachments: AdaptiveFetchSchedule.patch, 
 MimeAdaptiveFetchSchedule.java, Nutch.patch, adaptive-mimetypes.txt






[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2011-08-19 Thread Markus Jelsma (JIRA)


Markus Jelsma updated NUTCH-1024:
-

  Component/s: generator
   Patch Info: [Patch Available]
Fix Version/s: (was: 2.0)
   1.4

 Dynamically set fetchInterval by MIME-type
 --

 Key: NUTCH-1024
 URL: https://issues.apache.org/jira/browse/NUTCH-1024
 Project: Nutch
  Issue Type: New Feature
  Components: generator
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 1.4

 Attachments: AdaptiveFetchSchedule.patch, 
 MimeAdaptiveFetchSchedule.java, Nutch.patch, adaptive-mimetypes.txt

