[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1024:
---------------------------------

    Fix Version/s: (was: 1.5)
                   1.6

20120304-push-1.6

Dynamically set fetchInterval by MIME-type
------------------------------------------

    Key: NUTCH-1024
    URL: https://issues.apache.org/jira/browse/NUTCH-1024
    Project: Nutch
    Issue Type: New Feature
    Components: generator
    Reporter: Markus Jelsma
    Assignee: Markus Jelsma
    Priority: Minor
    Fix For: 1.6
    Attachments: AdaptiveFetchSchedule.patch, MimeAdaptiveFetchSchedule.java, NUTCH-1024-1.5-1.patch, NUTCH-1024-1.5-2.patch, NUTCH-1024-1.5-3.patch, Nutch.patch, adaptive-mimetypes.txt

Add facility to configure default or fixed fetchInterval values by MIME-type. This is useful for conserving resources for files that are known to change frequently or never and everything in between.

* simple key\tvalue\n configuration file
* only set fetchInterval for new documents
* keep max fetchInterval fixed by current config

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1024:
---------------------------------

    Attachment: NUTCH-1024-1.5-3.patch

New patch with proper logging and configuration files.

Dynamically set fetchInterval by MIME-type
------------------------------------------

    Key: NUTCH-1024
    URL: https://issues.apache.org/jira/browse/NUTCH-1024
    Project: Nutch
    Issue Type: New Feature
    Components: generator
    Reporter: Markus Jelsma
    Assignee: Markus Jelsma
    Priority: Minor
    Fix For: 1.5
    Attachments: AdaptiveFetchSchedule.patch, MimeAdaptiveFetchSchedule.java, NUTCH-1024-1.5-1.patch, NUTCH-1024-1.5-2.patch, NUTCH-1024-1.5-3.patch, Nutch.patch, adaptive-mimetypes.txt
[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1024:
---------------------------------

    Attachment: NUTCH-1024-1.5-3.patch

Something went wrong here.

Dynamically set fetchInterval by MIME-type
------------------------------------------

    Key: NUTCH-1024
    URL: https://issues.apache.org/jira/browse/NUTCH-1024
    Project: Nutch
    Issue Type: New Feature
    Components: generator
    Reporter: Markus Jelsma
    Assignee: Markus Jelsma
    Priority: Minor
    Fix For: 1.5
    Attachments: AdaptiveFetchSchedule.patch, MimeAdaptiveFetchSchedule.java, NUTCH-1024-1.5-1.patch, NUTCH-1024-1.5-2.patch, NUTCH-1024-1.5-3.patch, Nutch.patch, adaptive-mimetypes.txt
[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1024:
---------------------------------

    Attachment: (was: NUTCH-1024-1.5-3.patch)

Dynamically set fetchInterval by MIME-type
------------------------------------------

    Key: NUTCH-1024
    URL: https://issues.apache.org/jira/browse/NUTCH-1024
    Project: Nutch
    Issue Type: New Feature
    Components: generator
    Reporter: Markus Jelsma
    Assignee: Markus Jelsma
    Priority: Minor
    Fix For: 1.5
    Attachments: AdaptiveFetchSchedule.patch, MimeAdaptiveFetchSchedule.java, NUTCH-1024-1.5-1.patch, NUTCH-1024-1.5-2.patch, NUTCH-1024-1.5-3.patch, Nutch.patch, adaptive-mimetypes.txt
[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1024:
---------------------------------

    Attachment: NUTCH-1024-1.5-2.patch

New patch for 1.5 with modifications as per Julien's comments.

Dynamically set fetchInterval by MIME-type
------------------------------------------

    Key: NUTCH-1024
    URL: https://issues.apache.org/jira/browse/NUTCH-1024
    Project: Nutch
    Issue Type: New Feature
    Components: generator
    Reporter: Markus Jelsma
    Assignee: Markus Jelsma
    Priority: Minor
    Fix For: 1.5
    Attachments: AdaptiveFetchSchedule.patch, MimeAdaptiveFetchSchedule.java, NUTCH-1024-1.5-1.patch, NUTCH-1024-1.5-2.patch, Nutch.patch, adaptive-mimetypes.txt
[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1024:
---------------------------------

    Attachment: NUTCH-1024-1.5-1.patch

New patch for trunk! This also includes a change to the injector: the injected fetchInterval is now added to the CrawlDatum metadata. In AdaptiveFetchSchedule this injected interval overrides anything else.

Dynamically set fetchInterval by MIME-type
------------------------------------------

    Key: NUTCH-1024
    URL: https://issues.apache.org/jira/browse/NUTCH-1024
    Project: Nutch
    Issue Type: New Feature
    Components: generator
    Reporter: Markus Jelsma
    Assignee: Markus Jelsma
    Priority: Minor
    Fix For: 1.5
    Attachments: AdaptiveFetchSchedule.patch, MimeAdaptiveFetchSchedule.java, NUTCH-1024-1.5-1.patch, Nutch.patch, adaptive-mimetypes.txt
[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1024:
---------------------------------

    Fix Version/s: (was: 1.4)
                   1.5

Dynamically set fetchInterval by MIME-type
------------------------------------------

    Key: NUTCH-1024
    URL: https://issues.apache.org/jira/browse/NUTCH-1024
    Project: Nutch
    Issue Type: New Feature
    Components: generator
    Reporter: Markus Jelsma
    Assignee: Markus Jelsma
    Priority: Minor
    Fix For: 1.5
    Attachments: AdaptiveFetchSchedule.patch, MimeAdaptiveFetchSchedule.java, Nutch.patch, adaptive-mimetypes.txt
[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1024:
---------------------------------

    Attachment: (was: MimeAdaptiveFetchSchedule.java)

Dynamically set fetchInterval by MIME-type
------------------------------------------

    Key: NUTCH-1024
    URL: https://issues.apache.org/jira/browse/NUTCH-1024
    Project: Nutch
    Issue Type: New Feature
    Components: generator
    Reporter: Markus Jelsma
    Assignee: Markus Jelsma
    Priority: Minor
    Fix For: 1.4
    Attachments: AdaptiveFetchSchedule.patch, MimeAdaptiveFetchSchedule.java, Nutch.patch, adaptive-mimetypes.txt
[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1024:
---------------------------------

    Attachment: MimeAdaptiveFetchSchedule.java

New version with proper handling of the Content-Type attribute. In my test I didn't include the charset, which is present in real tests.

Dynamically set fetchInterval by MIME-type
------------------------------------------

    Key: NUTCH-1024
    URL: https://issues.apache.org/jira/browse/NUTCH-1024
    Project: Nutch
    Issue Type: New Feature
    Components: generator
    Reporter: Markus Jelsma
    Assignee: Markus Jelsma
    Priority: Minor
    Fix For: 1.4
    Attachments: AdaptiveFetchSchedule.patch, MimeAdaptiveFetchSchedule.java, Nutch.patch, adaptive-mimetypes.txt
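The charset issue mentioned in this update can be illustrated with a minimal sketch. It assumes the stored Content-Type value may carry parameters such as "; charset=UTF-8" and that only the bare MIME type should be used as the lookup key. The class and method names are hypothetical and are not taken from the attached MimeAdaptiveFetchSchedule.java.

```java
import java.util.Locale;

/** Hypothetical helper, not part of the attached patch. */
final class ContentTypes {
    private ContentTypes() {
    }

    /**
     * Returns the bare MIME type of a Content-Type value, for example
     * "text/html" for "text/html; charset=UTF-8". A null input stays null.
     */
    static String baseType(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Everything after the first ';' is a parameter (charset, boundary, ...).
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        // MIME types are case-insensitive, so normalise before using as a map key.
        return base.trim().toLowerCase(Locale.ROOT);
    }
}
```

With this normalisation, "text/html; charset=UTF-8" and "text/html" resolve to the same configuration entry.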
[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1024:
---------------------------------

    Attachment: MimeAdaptiveFetchSchedule.java
                adaptive-mimetypes.txt

New version that allows for separate inc and dec rate values per MIME-type. The conf file format is now: mime\tinc_rate\tdec_rate. The code uses an internal struct to store the rates per MIME-type in a hashmap. Please comment.

Dynamically set fetchInterval by MIME-type
------------------------------------------

    Key: NUTCH-1024
    URL: https://issues.apache.org/jira/browse/NUTCH-1024
    Project: Nutch
    Issue Type: New Feature
    Components: generator
    Reporter: Markus Jelsma
    Assignee: Markus Jelsma
    Priority: Minor
    Fix For: 1.4
    Attachments: AdaptiveFetchSchedule.patch, MimeAdaptiveFetchSchedule.java, Nutch.patch, adaptive-mimetypes.txt
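The configuration format described in this update (mime\tinc_rate\tdec_rate, stored per MIME-type in a hashmap) could be read roughly as follows. This is only a sketch: the class name MimeRates, the handling of blank lines, "#" comments and malformed lines, and the lower-casing of keys are all assumptions, not behaviour confirmed by the attached code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical holder for the rates of one MIME type; not the class from the patch. */
final class MimeRates {
    final float incRate;
    final float decRate;

    MimeRates(float incRate, float decRate) {
        this.incRate = incRate;
        this.decRate = decRate;
    }

    /** Parses lines of the form mime<TAB>inc_rate<TAB>dec_rate into a map keyed by MIME type. */
    static Map<String, MimeRates> parse(String text) {
        Map<String, MimeRates> rates = new HashMap<>();
        for (String rawLine : text.split("\n")) {
            String line = rawLine.trim();
            // Assumption: blank lines and '#' comment lines are skipped.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] fields = line.split("\t");
            // Assumption: lines without exactly three fields are ignored.
            if (fields.length != 3) {
                continue;
            }
            try {
                float inc = Float.parseFloat(fields[1].trim());
                float dec = Float.parseFloat(fields[2].trim());
                rates.put(fields[0].trim().toLowerCase(Locale.ROOT), new MimeRates(inc, dec));
            } catch (NumberFormatException e) {
                // Assumption: lines with non-numeric rates are ignored as well.
            }
        }
        return rates;
    }
}
```

Reading the file from disk is left to the caller; the sketch takes the file content as a string so that it stays self-contained.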
[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1024:
---------------------------------

    Attachment: (was: MimeAdaptiveFetchSchedule.java)

Dynamically set fetchInterval by MIME-type
------------------------------------------

    Key: NUTCH-1024
    URL: https://issues.apache.org/jira/browse/NUTCH-1024
    Project: Nutch
    Issue Type: New Feature
    Components: generator
    Reporter: Markus Jelsma
    Assignee: Markus Jelsma
    Priority: Minor
    Fix For: 1.4
    Attachments: AdaptiveFetchSchedule.patch, MimeAdaptiveFetchSchedule.java, Nutch.patch, adaptive-mimetypes.txt
[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1024:
---------------------------------

    Attachment: (was: adaptive-mimetypes.txt)

Dynamically set fetchInterval by MIME-type
------------------------------------------

    Key: NUTCH-1024
    URL: https://issues.apache.org/jira/browse/NUTCH-1024
    Project: Nutch
    Issue Type: New Feature
    Components: generator
    Reporter: Markus Jelsma
    Assignee: Markus Jelsma
    Priority: Minor
    Fix For: 1.4
    Attachments: AdaptiveFetchSchedule.patch, MimeAdaptiveFetchSchedule.java, Nutch.patch, adaptive-mimetypes.txt
[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1024:
---------------------------------

    Attachment: Nutch.patch
                AdaptiveFetchSchedule.patch
                MimeAdaptiveFetchSchedule.java
                adaptive-mimetypes.txt

Here's a first WIP. It extends AdaptiveFetchSchedule and changes the INC/DEC rates depending on the current MIME-type. It also patches AdaptiveFetchSchedule so that the INC and DEC properties are protected and settable from the child class. I also added two properties to metadata.Nutch for reading the Content-Type key as a Writable from the CrawlDatum MetaData. That was a bit of trickery.

It uses the original INC and DEC rate values for a CrawlDatum without a Content-Type in its MetaData or with an unconfigured Content-Type.

Please comment. There must be something wrong as it seems to work. :)

Dynamically set fetchInterval by MIME-type
------------------------------------------

    Key: NUTCH-1024
    URL: https://issues.apache.org/jira/browse/NUTCH-1024
    Project: Nutch
    Issue Type: New Feature
    Reporter: Markus Jelsma
    Assignee: Markus Jelsma
    Priority: Minor
    Fix For: 2.0
    Attachments: AdaptiveFetchSchedule.patch, MimeAdaptiveFetchSchedule.java, Nutch.patch, adaptive-mimetypes.txt
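The fallback behaviour described in this update (per-MIME-type INC/DEC rates, with the original rates used when the Content-Type is missing or unconfigured) can be sketched in isolation. The sketch below does not use any Nutch classes. It assumes a multiplicative adjustment, where the interval grows by the inc rate when a page is unchanged and shrinks by the dec rate when it has changed; the class name, the float[] layout and the constructor are hypothetical, and clamping to the configured minimum and maximum interval is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-alone illustration; the real patch extends AdaptiveFetchSchedule instead. */
final class MimeAdaptiveSketch {
    // Rates used when no per-type entry applies (hypothetical field names).
    private final float defaultInc;
    private final float defaultDec;

    // Per-type rates: element 0 is the inc rate, element 1 is the dec rate.
    final Map<String, float[]> perMime = new HashMap<>();

    MimeAdaptiveSketch(float defaultInc, float defaultDec) {
        this.defaultInc = defaultInc;
        this.defaultDec = defaultDec;
    }

    /** Returns the adjusted interval for one fetch result. */
    float nextInterval(float interval, String mimeType, boolean modified) {
        // A missing or unconfigured MIME type falls back to the default rates.
        float[] configured = mimeType == null ? null : perMime.get(mimeType);
        float inc = configured != null ? configured[0] : defaultInc;
        float dec = configured != null ? configured[1] : defaultDec;
        // Assumed rule: shrink the interval for changed pages, grow it for unchanged ones.
        return modified ? interval * (1.0f - dec) : interval * (1.0f + inc);
    }
}
```

Keeping the defaults in the parent and only overriding the rates per lookup matches the intent stated above: documents of unconfigured types behave exactly as they did before the change.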
[jira] [Updated] (NUTCH-1024) Dynamically set fetchInterval by MIME-type
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1024:
---------------------------------

    Component/s: generator
    Patch Info: [Patch Available]
    Fix Version/s: (was: 2.0)
                   1.4

Dynamically set fetchInterval by MIME-type
------------------------------------------

    Key: NUTCH-1024
    URL: https://issues.apache.org/jira/browse/NUTCH-1024
    Project: Nutch
    Issue Type: New Feature
    Components: generator
    Reporter: Markus Jelsma
    Assignee: Markus Jelsma
    Priority: Minor
    Fix For: 1.4
    Attachments: AdaptiveFetchSchedule.patch, MimeAdaptiveFetchSchedule.java, Nutch.patch, adaptive-mimetypes.txt