[
https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445841#comment-13445841
]
Matt MacDonald commented on NUTCH-1445:
---
Hi,
I'm attempting to use the
[
https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445849#comment-13445849
]
Ferdy Galema commented on NUTCH-1445:
-
Hi Matt,
Sure we can resolve your issue here.
[
https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445850#comment-13445850
]
Ferdy Galema commented on NUTCH-1445:
-
(feature requests should be future requests
[
https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445860#comment-13445860
]
Matt MacDonald commented on NUTCH-1445:
---
Ferdy,
Thanks for the help. I'll
Hi all,
I am new to dev... I am working on NUTCH-1150...
I would like to get some directions before I can start... Right now I am
going through the Fetcher.java code...
I have tried running nutch with a sample site with two different urls
redirecting to a common resource.
I could not find any
Here is the link to the issue -
https://issues.apache.org/jira/browse/NUTCH-1150
On Fri, Aug 31, 2012 at 5:37 PM, Vijith vijithkv...@gmail.com wrote:
Hi all,
I am new to dev... I am working on NUTCH-1150...
I would like to get some directions before I can start... Right now I am
going
Hi all,
(Please ignore my previous mail, if any)
I am new to dev... I am working on NUTCH-1150...
https://issues.apache.org/jira/browse/NUTCH-1150
I would like to get some directions before I can start... Right now I am
going through the Fetcher.java code...
I have tried running nutch with a
[
https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445871#comment-13445871
]
Ferdy Galema commented on NUTCH-1445:
-
Ah I got it now.
It's definitely a bug. When
Ferdy Galema created NUTCH-1462:
---
Summary: Elasticsearch not indexing when type==null in
NutchDocument metadata
Key: NUTCH-1462
URL: https://issues.apache.org/jira/browse/NUTCH-1462
Project: Nutch
I apologize. I was sending to the mailing list without subscribing to it. I
found the reply from Lewis (in the archive). I will comment directly on the
issue. Thanks.
On Fri, Aug 31, 2012 at 5:59 PM, Vijith vijithkv...@gmail.com wrote:
Hi all,
(Please ignore my previous mail, if any)
I am new
[
https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445872#comment-13445872
]
Matt MacDonald commented on NUTCH-1445:
---
Great! I was just looking in
[
https://issues.apache.org/jira/browse/NUTCH-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ferdy Galema updated NUTCH-1462:
Attachment: nutch-1462.patch
Elasticsearch not indexing when type==null in NutchDocument
No hassle Vijith
Thank you
Lewis
On Fri, Aug 31, 2012 at 1:37 PM, Vijith vijithkv...@gmail.com wrote:
I apologize. I was sending to the mailing list without subscribing to it. I
found the reply from Lewis (in the archive). I will comment directly on the
issue. Thanks.
On Fri, Aug 31, 2012 at
[
https://issues.apache.org/jira/browse/NUTCH-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445874#comment-13445874
]
Vijith Kumar V commented on NUTCH-1150:
---
I have tried running nutch with a sample
[
https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445878#comment-13445878
]
Ferdy Galema commented on NUTCH-1445:
-
Created NUTCH-1462 for a fix. For a quick-fix
[
https://issues.apache.org/jira/browse/NUTCH-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ferdy Galema closed NUTCH-1462.
---
Resolution: Fixed
committed
Elasticsearch not indexing when type==null in
Ferdy Galema created NUTCH-1463:
---
Summary: Elasticsearch indexer should wait and check response for
last flush
Key: NUTCH-1463
URL: https://issues.apache.org/jira/browse/NUTCH-1463
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ferdy Galema updated NUTCH-1463:
Attachment: nutch-1463.patch
Elasticsearch indexer should wait and check response for last
[
https://issues.apache.org/jira/browse/NUTCH-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ferdy Galema closed NUTCH-1463.
---
Resolution: Fixed
committed.
Elasticsearch indexer should wait and check response
[
https://issues.apache.org/jira/browse/NUTCH-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ferdy Galema closed NUTCH-1448.
---
Resolution: Fixed
Committed.
Redirected urls should be handled more cleanly (more
[
https://issues.apache.org/jira/browse/NUTCH-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445903#comment-13445903
]
Vijith V commented on NUTCH-1150:
-
Here is my setup. Page1 (only seed) has links to Page2
[
https://issues.apache.org/jira/browse/NUTCH-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445907#comment-13445907
]
Markus Jelsma commented on NUTCH-1150:
--
Ah, I assume you're doing the parse step
[
https://issues.apache.org/jira/browse/NUTCH-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445918#comment-13445918
]
Vijith V commented on NUTCH-1150:
-
Yes I was doing so. Thanks. I tried with fetcher.parse.
[
https://issues.apache.org/jira/browse/NUTCH-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445918#comment-13445918
]
Vijith V edited comment on NUTCH-1150 at 9/1/12 12:33 AM:
--
Yes I
I have tried running nutch with a sample site with two different urls
redirecting to a common resource.
I could not find any clues, from hadoop.log, where the common resource is
parsed multiple times.
Could someone please explain the exact scenario that creates this bug?
And how does this bug
[
https://issues.apache.org/jira/browse/NUTCH-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445930#comment-13445930
]
Luca Cavanna commented on NUTCH-1100:
-
I agree, it would make even more sense to
-Original message-
From:Vijith vijithkv...@gmail.com
Sent: Fri 31-Aug-2012 15:44
To: dev@nutch.apache.org
Subject: Re: Need some directions
I have tried running nutch with a sample site with two different urls
redirecting to a common resource.
I could not find any clues, from
Luca Cavanna created NUTCH-1464:
---
Summary: index-static plugin doesn't allow the colon within the
field value
Key: NUTCH-1464
URL: https://issues.apache.org/jira/browse/NUTCH-1464
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luca Cavanna updated NUTCH-1464:
Description: If I want to configure a static field with a value containing
a colon, the
[
https://issues.apache.org/jira/browse/NUTCH-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445937#comment-13445937
]
Lewis John McGibbney commented on NUTCH-1464:
-
Nice catch Luca. Do you have a
[
https://issues.apache.org/jira/browse/NUTCH-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luca Cavanna updated NUTCH-1464:
Attachment: NUTCH-1464.patch
I do have a patch, but it's against 1.5 branch. Anyway it's really
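
The colon problem reported in NUTCH-1464 can be illustrated with a small sketch. It assumes each static field is configured as a `name:value` entry and that the value may itself contain colons (a URL, for example). The class `StaticFieldSplit` and the method `parseField` are hypothetical names used only for this illustration; this is not the attached patch.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class StaticFieldSplit {

    // Hypothetical helper: split a "name:value" entry on the FIRST colon only,
    // so that any further colons stay inside the value.
    static Map.Entry<String, String> parseField(String entry) {
        // A limit of 2 yields at most two parts: the name and the rest.
        String[] parts = entry.split(":", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected name:value but got: " + entry);
        }
        return new SimpleEntry<>(parts[0].trim(), parts[1].trim());
    }

    public static void main(String[] args) {
        Map.Entry<String, String> field = parseField("source:http://example.org/feed");
        System.out.println(field.getKey() + " -> " + field.getValue());
    }
}
```

Splitting without a limit would cut the value at every colon; limiting the split to two parts is one simple way to keep the value intact.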
Hi Folks,
There is an issue with the protocol-file plugin while fetching files that
contain CJK characters in the file name (JIRA NUTCH-968).
After checking the code, I discovered that the problem is due to the encoding
of the file name while fetching the directory. After changing a couple of
lines as
[
https://issues.apache.org/jira/browse/NUTCH-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446002#comment-13446002
]
Luca Cavanna commented on NUTCH-1100:
-
The problem with the approach I mentioned
Hi Ye,
Please feel free to comment fully on any issue you find on the Nutch Jira.
If you find other/additional bugs or improvements which are not already
opened on the Jira instance then please feel free to open new ones once
you are sure they are not duplicates and/or can be resolved via the
user@
Thanks for the welcome,
The issue is due to the encoding in the file name. To fix it, I needed to
make two changes in FileResponse.java in protocol-file plugin.
The fixes were a temporary solution, so I hard-coded the encoding to UTF-8.
It would be a better idea to read the encoding from the
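
As a rough illustration of the kind of change described above (not the actual edit to FileResponse.java, which is not shown in this thread), the sketch below encodes and decodes a file name containing CJK characters with an explicit UTF-8 charset. The class `FileNameEncoding` and its methods are hypothetical names, and the hard-coded charset mirrors the temporary approach mentioned in the message.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class FileNameEncoding {

    // Assumption: the charset is hard-coded, as in the temporary fix described above.
    private static final String CHARSET = "UTF-8";

    // Percent-encode a file name so non-ASCII characters survive inside a URL.
    static String encodeName(String name) throws UnsupportedEncodingException {
        return URLEncoder.encode(name, CHARSET);
    }

    // Decode a percent-encoded name back to the original characters.
    static String decodeName(String encoded) throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, CHARSET);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String name = "\u6587\u4ef6.txt"; // a file name with two CJK characters
        String encoded = encodeName(name);
        System.out.println(encoded);
        System.out.println(decodeName(encoded).equals(name));
    }
}
```

Reading the charset from configuration instead of the constant would replace `CHARSET` with a value looked up at runtime.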
[
https://issues.apache.org/jira/browse/NUTCH-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ferdy Galema closed NUTCH-1431.
---
Resolution: Fixed
committed
Introduce link 'distance' and add configurable max
[
https://issues.apache.org/jira/browse/NUTCH-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446380#comment-13446380
]
Christian Johnsson commented on NUTCH-1448:
---
Will this affect the outlink and
[
https://issues.apache.org/jira/browse/NUTCH-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446396#comment-13446396
]
Lewis John McGibbney commented on NUTCH-1461:
-
Hi Christian, you make some
[
https://issues.apache.org/jira/browse/NUTCH-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446487#comment-13446487
]
Christian Johnsson commented on NUTCH-1461:
---
Sure, this one should do the trick.
[
https://issues.apache.org/jira/browse/NUTCH-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Christian Johnsson updated NUTCH-1461:
--
Attachment: TabelUtil_Fix.patch
Quick fix in case there are some invalid domains in
[
https://issues.apache.org/jira/browse/NUTCH-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446487#comment-13446487
]
Christian Johnsson edited comment on NUTCH-1461 at 9/1/12 10:27 AM:
[
https://issues.apache.org/jira/browse/NUTCH-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446487#comment-13446487
]
Christian Johnsson edited comment on NUTCH-1461 at 9/1/12 10:43 AM:
[
https://issues.apache.org/jira/browse/NUTCH-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446511#comment-13446511
]
Ferdy Galema commented on NUTCH-872:
Yes that is correct.
Change the
[
https://issues.apache.org/jira/browse/NUTCH-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446515#comment-13446515
]
Ferdy Galema commented on NUTCH-1448:
-
Yes it does show up as an outlink.
About your
[
https://issues.apache.org/jira/browse/NUTCH-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446518#comment-13446518
]
Ferdy Galema commented on NUTCH-1461:
-
Added comment in NUTCH-1448.
[
https://issues.apache.org/jira/browse/NUTCH-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446568#comment-13446568
]
Christian Johnsson commented on NUTCH-872:
--
I applied the patch and did a test run
[
https://issues.apache.org/jira/browse/NUTCH-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446570#comment-13446570
]
Christian Johnsson commented on NUTCH-1448:
---
Thank you for the information.
Yes
See https://builds.apache.org/job/Nutch-nutchgora/334/changes
Changes:
[ferdy] NUTCH-1431 Introduce link 'distance' and add configurable max distance
in the generator
[ferdy] NUTCH-1448 Redirected urls should be handled more cleanly (more like an
outlink url)
[ferdy] NUTCH-1463 Elasticsearch
[
https://issues.apache.org/jira/browse/NUTCH-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446589#comment-13446589
]
Hudson commented on NUTCH-1448:
---
Integrated in Nutch-nutchgora #334 (See
[
https://issues.apache.org/jira/browse/NUTCH-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446591#comment-13446591
]
Hudson commented on NUTCH-1462:
---
Integrated in Nutch-nutchgora #334 (See