[
https://issues.apache.org/jira/browse/NUTCH-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1780:
Fix Version/s: 2.3
> ttl and gc_grace_seconds attributes are missing from
> gora-c
[
https://issues.apache.org/jira/browse/NUTCH-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1776:
Fix Version/s: 2.3
> Log incorrect plugin.folder file path
> --
[
https://issues.apache.org/jira/browse/NUTCH-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney resolved NUTCH-1780.
-
Resolution: Fixed
Committed @revision 1595398 in 2.X HEAD
Thank you [~kaveh] very
[
https://issues.apache.org/jira/browse/NUTCH-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1774:
Fix Version/s: (was: 2.4)
2.3
> Crawling from REST API givin
Julien Nioche created NUTCH-1779:
Summary: Apply formatting to the code
Key: NUTCH-1779
URL: https://issues.apache.org/jira/browse/NUTCH-1779
Project: Nutch
Issue Type: Task
Affects Versi
[
https://issues.apache.org/jira/browse/NUTCH-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1779:
Fix Version/s: 2.3
> Apply formatting to the code
>
>
Lewis John McGibbney created NUTCH-1781:
---
Summary: Update gora-*-mapping.xml and gora.properties to reflect
Gora 0.4
Key: NUTCH-1781
URL: https://issues.apache.org/jira/browse/NUTCH-1781
Project
[
https://issues.apache.org/jira/browse/NUTCH-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1780:
Description: After upgrading to Gora 0.4 (NUTCH-1714) we need extra
properties in
Julien Nioche created NUTCH-1777:
Summary: Fetcher not getting all the entries in input
Key: NUTCH-1777
URL: https://issues.apache.org/jira/browse/NUTCH-1777
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994098#comment-13994098
]
Lewis John McGibbney edited comment on NUTCH-1714 at 5/10/14 1:19 AM:
--
[
https://issues.apache.org/jira/browse/NUTCH-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney closed NUTCH-1780.
---
> ttl and gc_grace_seconds attributes are missing from
> gora-cassandra-mapping.xml file
[
https://issues.apache.org/jira/browse/NUTCH-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000569#comment-14000569
]
Lewis John McGibbney commented on NUTCH-1774:
-
[~sreemanth] Crawler class no l
[
https://issues.apache.org/jira/browse/NUTCH-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000571#comment-14000571
]
Lewis John McGibbney commented on NUTCH-1779:
-
final patch to be committed to
[
https://issues.apache.org/jira/browse/NUTCH-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-1674.
--
Resolution: Fixed
Committed revision 1594813.
Thanks everyone!
> Use batchId filter to enable
[
https://issues.apache.org/jira/browse/NUTCH-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Diaa updated NUTCH-1776:
Attachment: Logging file path error.patch
@ [~wastl-nagel]
changed level to warn and removed new logger.
Do you thi
Hi,
In some cases when you crawl a webpage you already know many page urls that
have a similar structure.
For example in imdb entertainment artists have the following link structure:
http://www.imdb.com/name/nm1/
http://www.imdb.com/name/nm2/
http://www.imdb.com/name/nm6499112/
How about allowing
[
https://issues.apache.org/jira/browse/NUTCH-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kaveh minooie updated NUTCH-1780:
-
Attachment: NUTCH-1780.patch
> ttl and gc_grace_seconds attributes are missing from
> gora-cassa
Julien Nioche created NUTCH-1778:
Summary: Generator not logging number of URLs in batch correctly
Key: NUTCH-1778
URL: https://issues.apache.org/jira/browse/NUTCH-1778
Project: Nutch
Issue T
[
https://issues.apache.org/jira/browse/NUTCH-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998598#comment-13998598
]
Sebastian Nagel commented on NUTCH-1772:
?? What about committing this one as a te
[
https://issues.apache.org/jira/browse/NUTCH-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1774:
Fix Version/s: 2.4
> Crawling from REST API giving NullPointerException
> -
[
https://issues.apache.org/jira/browse/NUTCH-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994218#comment-13994218
]
Julien Nioche commented on NUTCH-1622:
--
Lewis - this has already been committed in tr
[
https://issues.apache.org/jira/browse/NUTCH-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Diaa updated NUTCH-1783:
Attachment: cleanup temp folders.patch
> Cleanup temp folders in case of failures
> ---
[
https://issues.apache.org/jira/browse/NUTCH-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999434#comment-13999434
]
Lewis John McGibbney commented on NUTCH-1709:
-
Yep... I'll update this to refl
[
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993574#comment-13993574
]
Julien Nioche commented on NUTCH-1714:
--
Ralf - your question is not directly related
[
https://issues.apache.org/jira/browse/NUTCH-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998527#comment-13998527
]
Sebastian Nagel commented on NUTCH-1776:
+1 (looks ok)
Is there a reason why a new
[
https://issues.apache.org/jira/browse/NUTCH-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998578#comment-13998578
]
Sebastian Nagel commented on NUTCH-1605:
Tested with a few dozen documents
[
https://issues.apache.org/jira/browse/NUTCH-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-1770:
-
Fix Version/s: (was: 2.3)
> Nutch is failing to parse all PDFs
>
[
https://issues.apache.org/jira/browse/NUTCH-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999673#comment-13999673
]
Julien Nioche commented on NUTCH-207:
-
Fix log level in revision 1595135.
> Bandwidth
[
https://issues.apache.org/jira/browse/NUTCH-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kaveh minooie updated NUTCH-1780:
-
Attachment: (was: NUTCH-1780.patch)
> ttl and gc_grace_seconds attributes are missing from
>
[
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-1714.
--
Resolution: Fixed
Committed revision 1594812.
Thanks to Alparslan and everyone involved!
> Nu
[
https://issues.apache.org/jira/browse/NUTCH-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1718:
---
Summary: redefine http.robots.agent as "additional agent names" (was:
update description of
[
https://issues.apache.org/jira/browse/NUTCH-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999102#comment-13999102
]
Julien Nioche commented on NUTCH-207:
-
Hi Sebastian,
Not really. I will revert it back
Markus Jelsma created NUTCH-1782:
Summary: NodeWalker to return current node
Key: NUTCH-1782
URL: https://issues.apache.org/jira/browse/NUTCH-1782
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Diaa updated NUTCH-1776:
Attachment: (was: PluginManifestParser.java.patch)
> Log incorrect plugin.folder file path
> --
[
https://issues.apache.org/jira/browse/NUTCH-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1782:
-
Attachment: NUTCH-1782-trunk.patch
Patch!
> NodeWalker to return current node
>
Diaa created NUTCH-1783:
---
Summary: Cleanup temp folders in case of failures
Key: NUTCH-1783
URL: https://issues.apache.org/jira/browse/NUTCH-1783
Project: Nutch
Issue Type: Bug
Affects Versions: 1.
Thanks!
Created a JIRA issue with the patch
https://issues.apache.org/jira/browse/NUTCH-1783
On Tue, May 13, 2014 at 12:19 AM, Markus Jelsma wrote:
> Hi Diaa,
>
> Yes, you can open an issue for these fixes and attach patches if you can.
>
> Cheers,
> Markus
>
>
>
> Diaa Abdallah wrote:
>
>
[
https://issues.apache.org/jira/browse/NUTCH-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998575#comment-13998575
]
Julien Nioche commented on NUTCH-1709:
--
[NUTCH-1714] has been committed without Lewis
[
https://issues.apache.org/jira/browse/NUTCH-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999415#comment-13999415
]
Lewis John McGibbney commented on NUTCH-1774:
-
I have no issue with committing
[
https://issues.apache.org/jira/browse/NUTCH-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998632#comment-13998632
]
Sebastian Nagel commented on NUTCH-1613:
For cookie support there exists already N
[
https://issues.apache.org/jira/browse/NUTCH-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1718:
---
Attachment: NUTCH-1718-trunk.v2.patch
Updated patch:
* for backward compatibility: take care
[
https://issues.apache.org/jira/browse/NUTCH-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-1676.
--
Resolution: Fixed
trunk => Committed revision 1595193
2.x => Committed revision 1595196.
Thank
[
https://issues.apache.org/jira/browse/NUTCH-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998581#comment-13998581
]
Julien Nioche commented on NUTCH-1768:
--
Any more testers for this one?
> port NUTCH-
[
https://issues.apache.org/jira/browse/NUTCH-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998586#comment-13998586
]
Julien Nioche commented on NUTCH-1772:
--
Thanks Diaa. I will have a look at it a bit l
[
https://issues.apache.org/jira/browse/NUTCH-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-1772.
--
Resolution: Fixed
Fix Version/s: 1.9
Committed revision 1595137.
thanks for the reviews
[
https://issues.apache.org/jira/browse/NUTCH-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999894#comment-13999894
]
Markus Jelsma commented on NUTCH-1676:
--
thanks jul for taking over!
> Add rudimentar
kaveh minooie created NUTCH-1780:
Summary: ttl and gc_grace_seconds attributes are missing from
gora-cassandra-mapping.xml file
Key: NUTCH-1780
URL: https://issues.apache.org/jira/browse/NUTCH-1780
Pr
[
https://issues.apache.org/jira/browse/NUTCH-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998593#comment-13998593
]
Sebastian Nagel commented on NUTCH-207:
---
Hi Julien, I've just observed that the log l
[
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998563#comment-13998563
]
Julien Nioche commented on NUTCH-1714:
--
Hi [~kaveh],
Please open a separate issue f
[
https://issues.apache.org/jira/browse/NUTCH-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kaveh minooie updated NUTCH-1780:
-
Attachment: NUTCH-1780.patch
there is really no good default value for gc_grace_seconds. we can u
[
https://issues.apache.org/jira/browse/NUTCH-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-1770:
-
Affects Version/s: (was: 2.3)
> Nutch is failing to parse all PDFs
>
[
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999417#comment-13999417
]
Lewis John McGibbney commented on NUTCH-1714:
-
Excellent @jnioche and @alparsl
[
https://issues.apache.org/jira/browse/NUTCH-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-926:
--
Attachment: NUTCH-926-trunk.patch
Patch for current trunk:
* meta refresh redirects are filtered
[
https://issues.apache.org/jira/browse/NUTCH-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998333#comment-13998333
]
Lewis John McGibbney commented on NUTCH-1773:
-
bq. hduser@bl4ck1c3:~/nutch-2.3