[
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-2234:
Attachment: NUTCH-2234.patch
> Upgrade to elasticsearch 2.1.1
>
[
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1687:
Attachment: NUTCH-1687-2.patch
Here it is:
I updated my initial patch for version 1.11.
I
[
https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-:
Description:
This problem happens the second time I crawl a page
{code}
[
https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-:
Summary: re-fetch deletes all metadata except _csh_ and _rs_ (was: fetch
deletes
[
https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166447#comment-15166447
]
Thamme Gowda N commented on NUTCH-2144:
---
Hi [~wastl-nagel],
Were you able to test this plugin?
I
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change
notification.
The "SimilarityScoringFilter" page has been changed by SujenShah:
https://wiki.apache.org/nutch/SimilarityScoringFilter?action=diff=3=4
1. Copy the gold-standard file into the conf
[
https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney reassigned NUTCH-:
---
Assignee: Lewis John McGibbney
> fetch deletes all metadata except _csh_
[
https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2231.
--
Resolution: Fixed
Committed to trunk in revision 1732177. This Jexl stuff is awesome!
> Jexl
[
https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2231:
-
Attachment: NUTCH-2231.patch
Updated patch that transforms hyphens in field identifiers to
[
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1687:
Attachment: (was: NUTCH-1687-2.patch)
> Pick queue in Round Robin
>
[
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1687:
Comment: was deleted
(was: I updated my initial patch for ver 1.11.
I crawl large number of
[
https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2229:
-
Description:
CrawlDatum allows Jexl expressions on its metadata fields nicely, but it lacks
the
[
https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2231:
-
Description:
Generator should support Jexl expressions. This would make it much easier to
[
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-2234:
Attachment: (was: NUTCH-2234.patch)
> Upgrade to elasticsearch 2.1.1
>
[
https://issues.apache.org/jira/browse/NUTCH-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-2234:
Attachment: NUTCH-2234.patch
> Upgrade to elasticsearch 2.1.1
>
Tien Nguyen Manh created NUTCH-2234:
---
Summary: Upgrade to elasticsearch 2.1.1
Key: NUTCH-2234
URL: https://issues.apache.org/jira/browse/NUTCH-2234
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163138#comment-15163138
]
Hudson commented on NUTCH-2232:
---
SUCCESS: Integrated in Nutch-trunk #3354 (See
[
https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2231:
-
Attachment: NUTCH-2231.patch
Patch for trunk! It adds a JexlUtil where the expression parsing is
[
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tien Nguyen Manh updated NUTCH-1687:
Attachment: NUTCH-1687-2.patch
I updated my initial patch for ver 1.11.
I crawl large number
[
https://issues.apache.org/jira/browse/NUTCH-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed NUTCH-1179.
> Option to restrict generated records by metadata
>
>
[
https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed NUTCH-2215.
> Generator to restrict crawl to mime type
>
>
>
[
https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2215.
--
Resolution: Duplicate
> Generator to restrict crawl to mime type
>
[
https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2215:
-
Affects Version/s: (was: 1.11)
> Generator to restrict crawl to mime type
>
[
https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2215:
-
Fix Version/s: (was: 1.12)
> Generator to restrict crawl to mime type
>
[
https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2232.
--
Resolution: Fixed
Assignee: Markus Jelsma
Committed to trunk in revision 1732160. Thanks
[
https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2232:
-
Attachment: NUTCH-2232.patch
Updated patch with only the following modification:
* moved imports
[
https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163025#comment-15163025
]
Hudson commented on NUTCH-2229:
---
SUCCESS: Integrated in Nutch-trunk #3353 (See
Pablo Torres created NUTCH-2233:
---
Summary: Index-basic incorrect assignment of next fetch time when
using Mongodb as storage backend
Key: NUTCH-2233
URL: https://issues.apache.org/jira/browse/NUTCH-2233
[
https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2232:
-
Summary: DeduplicationJob should decode URL's before length is compared
(was: DeduplicationJob:
[
https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163003#comment-15163003
]
Markus Jelsma commented on NUTCH-2232:
--
Yes, there is clearly a difference in length between
[
https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2232:
-
Affects Version/s: 1.11
> DeduplicationJob: Url is not decoded before the url length is compared.
[
https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ron van der Vegt updated NUTCH-2232:
Attachment: NUTCH-2232.patch
> DeduplicationJob: Url is not decoded before the url length
Ron van der Vegt created NUTCH-2232:
---
Summary: DeduplicationJob: Url is not decoded before the url
length is compared.
Key: NUTCH-2232
URL: https://issues.apache.org/jira/browse/NUTCH-2232
Project:
[
https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2229.
--
Resolution: Fixed
Committed to trunk in revision 1732140.
> Allow Jexl expressions on
[
https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15162945#comment-15162945
]
Markus Jelsma commented on NUTCH-2229:
--
Ah, this works very nicely! I'll commit shortly!
> Allow
Markus Jelsma created NUTCH-2231:
Summary: Jexl support in generator job
Key: NUTCH-2231
URL: https://issues.apache.org/jira/browse/NUTCH-2231
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2229:
-
Attachment: NUTCH-2229.patch
Patch for trunk!
> Allow Jexl expressions on CrawlDatum's fixed
[
https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2229:
-
Patch Info: Patch Available
Description:
CrawlDatum allows Jexl expressions on its metadata
Hi everyone,
I really need your help; please read below.
If we have to run Solr in cloud mode, we are going to use ZooKeeper. Now
any ZooKeeper client can connect to the ZooKeeper server. ZooKeeper has a
facility to protect znodes; however, anyone can see a znode's ACL; however, the
password could be