[jira] [Updated] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-02-24 Thread Tien Nguyen Manh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tien Nguyen Manh updated NUTCH-2234: Attachment: NUTCH-2234.patch > Upgrade to elasticsearch 2.1.1 >

[jira] [Updated] (NUTCH-1687) Pick queue in Round Robin

2016-02-24 Thread Tien Nguyen Manh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tien Nguyen Manh updated NUTCH-1687: Attachment: NUTCH-1687-2.patch Here it is: I updated my initial patch for version 1.11. I

[jira] [Updated] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2222: Description: This problem happens the second time I crawl a page {code}

[jira] [Updated] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-02-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2222: Summary: re-fetch deletes all metadata except _csh_ and _rs_ (was: fetch deletes

[jira] [Commented] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-24 Thread Thamme Gowda N (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166447#comment-15166447 ] Thamme Gowda N commented on NUTCH-2144: --- Hi [~wastl-nagel], Were you able to test this plugin? I

[Nutch Wiki] Update of "SimilarityScoringFilter" by SujenShah

2016-02-24 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "SimilarityScoringFilter" page has been changed by SujenShah: https://wiki.apache.org/nutch/SimilarityScoringFilter?action=diff=3=4 1. Copy the gold-standard file into the conf

[jira] [Assigned] (NUTCH-2222) fetch deletes all metadata except _csh_ and _rs_

2016-02-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2222: --- Assignee: Lewis John McGibbney > fetch deletes all metadata except _csh_

[jira] [Resolved] (NUTCH-2231) Jexl support in generator job

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2231. -- Resolution: Fixed Committed to trunk in revision 1732177. This Jexl stuff is awesome! > Jexl

[jira] [Updated] (NUTCH-2231) Jexl support in generator job

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2231: - Attachment: NUTCH-2231.patch Updated patch that transforms hyphens in field identifiers to

[jira] [Updated] (NUTCH-1687) Pick queue in Round Robin

2016-02-24 Thread Tien Nguyen Manh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tien Nguyen Manh updated NUTCH-1687: Attachment: (was: NUTCH-1687-2.patch) > Pick queue in Round Robin >

[jira] [Issue Comment Deleted] (NUTCH-1687) Pick queue in Round Robin

2016-02-24 Thread Tien Nguyen Manh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tien Nguyen Manh updated NUTCH-1687: Comment: was deleted (was: I update my initial patch for ver 1.11. I crawl large number of

[jira] [Updated] (NUTCH-2229) Allow Jexl expressions on CrawlDatum's fixed attributes

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2229: - Description: CrawlDatum allows Jexl expressions on its metadata fields nicely, but it lacks the

[jira] [Updated] (NUTCH-2231) Jexl support in generator job

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2231: - Description: Generator should support Jexl expressions. This would make it much easier to

[jira] [Updated] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-02-24 Thread Tien Nguyen Manh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tien Nguyen Manh updated NUTCH-2234: Attachment: (was: NUTCH-2234.patch) > Upgrade to elasticsearch 2.1.1 >

[jira] [Updated] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-02-24 Thread Tien Nguyen Manh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tien Nguyen Manh updated NUTCH-2234: Attachment: NUTCH-2234.patch > Upgrade to elasticsearch 2.1.1 >

[jira] [Created] (NUTCH-2234) Upgrade to elasticsearch 2.1.1

2016-02-24 Thread Tien Nguyen Manh (JIRA)
Tien Nguyen Manh created NUTCH-2234: --- Summary: Upgrade to elasticsearch 2.1.1 Key: NUTCH-2234 URL: https://issues.apache.org/jira/browse/NUTCH-2234 Project: Nutch Issue Type: Improvement

[jira] [Commented] (NUTCH-2232) DeduplicationJob should decode URL's before length is compared

2016-02-24 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163138#comment-15163138 ] Hudson commented on NUTCH-2232: --- SUCCESS: Integrated in Nutch-trunk #3354 (See

[jira] [Updated] (NUTCH-2231) Jexl support in generator job

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2231: - Attachment: NUTCH-2231.patch Patch for trunk! It adds a JexlUtil where the expression parsing is

[jira] [Updated] (NUTCH-1687) Pick queue in Round Robin

2016-02-24 Thread Tien Nguyen Manh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tien Nguyen Manh updated NUTCH-1687: Attachment: NUTCH-1687-2.patch I updated my initial patch for ver 1.11. I crawl a large number

[jira] [Closed] (NUTCH-1179) Option to restrict generated records by metadata

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-1179. > Option to restrict generated records by metadata > >

[jira] [Closed] (NUTCH-2215) Generator to restrict crawl to mime type

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-2215. > Generator to restrict crawl to mime type > > >

[jira] [Resolved] (NUTCH-2215) Generator to restrict crawl to mime type

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2215. -- Resolution: Duplicate > Generator to restrict crawl to mime type >

[jira] [Updated] (NUTCH-2215) Generator to restrict crawl to mime type

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2215: - Affects Version/s: (was: 1.11) > Generator to restrict crawl to mime type >

[jira] [Updated] (NUTCH-2215) Generator to restrict crawl to mime type

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2215: - Fix Version/s: (was: 1.12) > Generator to restrict crawl to mime type >

[jira] [Resolved] (NUTCH-2232) DeduplicationJob should decode URL's before length is compared

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2232. -- Resolution: Fixed Assignee: Markus Jelsma Committed to trunk in revision 1732160. Thanks

[jira] [Updated] (NUTCH-2232) DeduplicationJob should decode URL's before length is compared

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2232: - Attachment: NUTCH-2232.patch Updated patch with only the following modification: * moved imports

[jira] [Commented] (NUTCH-2229) Allow Jexl expressions on CrawlDatum's fixed attributes

2016-02-24 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163025#comment-15163025 ] Hudson commented on NUTCH-2229: --- SUCCESS: Integrated in Nutch-trunk #3353 (See

[jira] [Created] (NUTCH-2233) Index-basic incorrect assignment of next fetch time when using Mongodb as storage backend

2016-02-24 Thread Pablo Torres (JIRA)
Pablo Torres created NUTCH-2233: --- Summary: Index-basic incorrect assignment of next fetch time when using Mongodb as storage backend Key: NUTCH-2233 URL: https://issues.apache.org/jira/browse/NUTCH-2233

[jira] [Updated] (NUTCH-2232) DeduplicationJob should decode URL's before length is compared

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2232: - Summary: DeduplicationJob should decode URL's before length is compared (was: DeduplicationJob:

[jira] [Commented] (NUTCH-2232) DeduplicationJob: Url is not decoded before the url length is compared.

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163003#comment-15163003 ] Markus Jelsma commented on NUTCH-2232: -- Yes, there is clearly a difference in length between

[jira] [Updated] (NUTCH-2232) DeduplicationJob: Url is not decoded before the url length is compared.

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2232: - Affects Version/s: 1.11 > DeduplicationJob: Url is not decoded before the url length is compared.

[jira] [Updated] (NUTCH-2232) DeduplicationJob: Url is not decoded before the url length is compared.

2016-02-24 Thread Ron van der Vegt (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ron van der Vegt updated NUTCH-2232: Attachment: NUTCH-2232.patch > DeduplicationJob: Url is not decoded before the url length

[jira] [Created] (NUTCH-2232) DeduplicationJob: Url is not decoded before the url length is compared.

2016-02-24 Thread Ron van der Vegt (JIRA)
Ron van der Vegt created NUTCH-2232: --- Summary: DeduplicationJob: Url is not decoded before the url length is compared. Key: NUTCH-2232 URL: https://issues.apache.org/jira/browse/NUTCH-2232 Project:

[jira] [Resolved] (NUTCH-2229) Allow Jexl expressions on CrawlDatum's fixed attributes

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2229. -- Resolution: Fixed Committed to trunk in revision 1732140. > Allow Jexl expressions on

[jira] [Commented] (NUTCH-2229) Allow Jexl expressions on CrawlDatum's fixed attributes

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15162945#comment-15162945 ] Markus Jelsma commented on NUTCH-2229: -- Ah, this works very nicely! I'll commit shortly! > Allow

[jira] [Created] (NUTCH-2231) Jexl support in generator job

2016-02-24 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2231: Summary: Jexl support in generator job Key: NUTCH-2231 URL: https://issues.apache.org/jira/browse/NUTCH-2231 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-2229) Allow Jexl expressions on CrawlDatum's fixed attributes

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2229: - Attachment: NUTCH-2229.patch Patch for trunk! > Allow Jexl expressions on CrawlDatum's fixed

[jira] [Updated] (NUTCH-2229) Allow Jexl expressions on CrawlDatum's fixed attributes

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2229: - Patch Info: Patch Available Description: CrawlDatum allows Jexl expressions on its metadata

I have one small question that always intrigues me

2016-02-24 Thread Zara Parst
Hi everyone, I really need your help; please read below. If we have to run Solr in cloud mode, we are going to use ZooKeeper. Any ZooKeeper client can connect to the ZooKeeper server. ZooKeeper has a facility to protect znodes; however, anyone can see a znode's ACL, although the password could be