[
https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13272150#comment-13272150
]
Ferdy Galema commented on NUTCH-1363:
-
I'm not sure I follow. What makes this property
Ferdy Galema created NUTCH-1365:
---
Summary: Fix crawlId functionality by making use of new gora
configuration
Key: NUTCH-1365
URL: https://issues.apache.org/jira/browse/NUTCH-1365
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ferdy Galema updated NUTCH-1365:
Attachment: NUTCH-1365.patch
Fix crawlId functionality by making use of new gora
Hi Markus,
thanks for your reply, but that is not what I want.
Why store data in Solr that I do not need? I do not want to use Solr. My
goal is to crawl terabytes of data, store the data in HBase or another
store and do some processing on it, so this unneeded data causes pain. I
have to filter the
Hi
What do you mean by `the function of learning to rank`?
Cheers,
On Thu, 10 May 2012 16:37:00 +0800, 柳胜兵 colin.liu1...@gmail.com
wrote:
Hello, all.
I want to know whether the Nutch project plans to implement the
function of learning to rank.
That is to say, could we use document-relevance data for queries,
generated by experts or taken from users' click-through logs, to get a
more complex but better ranking model through machine learning?
2012/5/10 Markus Jelsma markus.jel...@openindex.io
Ah, I see. Well, no. Nutch is not a search engine anymore. You can do
this with Solr and a custom script that parses its logs and emits
external file fields, but not with Nutch.
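The Solr-side approach described above might look roughly like the following. This is a minimal Python sketch under stated assumptions: the click-log line format (`click doc=<id>`) and the function names are hypothetical and not part of Solr. Only the output follows Solr's ExternalFileField file format, one `key=value` line per document, where the key is the unique key field value and the value is a float. It computes each document's share of clicks, which is a simple click-based boost rather than a trained learning-to-rank model.

```python
from collections import Counter
import re

# Hypothetical click-log line format (an assumption, not a real Solr log format):
#   "2012-05-10 17:02:12 click doc=<unique key>"
CLICK_RE = re.compile(r"\bclick doc=(\S+)")


def click_counts(lines):
    """Count clicks per document key in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = CLICK_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def external_file_lines(counts):
    """Render each document's share of all clicks as a key=value line."""
    total = sum(counts.values())
    return [f"{key}={n / total:.4f}" for key, n in sorted(counts.items())]


def write_external_file(counts, path):
    """Write the lines to a file such as <dataDir>/external_clicks.

    The field name "clicks" is hypothetical; Solr expects the file to be
    named external_<fieldname> and placed in the index data directory.
    """
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(external_file_lines(counts)) + "\n")
```

The field would still have to be declared with an ExternalFileField type in the Solr schema and referenced from a function query before it affects ranking; that configuration is outside this sketch.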
On Thu, 10 May 2012 17:02:12 +0800, 柳胜兵 colin.liu1...@gmail.com
wrote:
Well, I see. Thanks.
[
https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13272231#comment-13272231
]
Ferdy Galema commented on NUTCH-1306:
-
Lewis,
Do you suggest adding the commit as
[
https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1360:
Attachment: NUTCH-1360-nutchgora.patch
This is a real WIP for nutchgora. It would
[
https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13272234#comment-13272234
]
Lewis John McGibbney commented on NUTCH-1360:
-
As all protocol plugins try to
[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1325:
-
Attachment: NUTCH-1325-1.6-1.patch
Initial patch. This introduces a HostDB that keeps track of
[
https://issues.apache.org/jira/browse/NUTCH-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ferdy Galema closed NUTCH-1026.
---
Resolution: Fixed
Fix Version/s: nutchgora
(was: 2.1)
When indexing a
[
https://issues.apache.org/jira/browse/NUTCH-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13272328#comment-13272328
]
Markus Jelsma commented on NUTCH-1026:
--
Great!
Strip UTF-8
[
https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13272341#comment-13272341
]
Lewis John McGibbney commented on NUTCH-1306:
-
This is exactly the viewpoint I
Hi all,
I found a solution for storing metadata on outlinks.
The metadata is attached to the CrawlDatum, so the fetcher can read the
information stored there.
The solution is to implement a custom scoring filter and its
distributeScoreToOutlinks method.
In this method it is possible to do something like this,
[
https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ferdy Galema updated NUTCH-1306:
Attachment: NUTCH-1306-v2.patch
NUTCH-1306-trunk.patch
Agree with trying to make
[
https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13272599#comment-13272599
]
Lewis John McGibbney commented on NUTCH-1306:
-
I've just stumbled across
[
https://issues.apache.org/jira/browse/NUTCH-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1077:
Fix Version/s: 2.1
Nutch 2 DbUpdateMapper throws ArrayOutOfBoundsException
[
https://issues.apache.org/jira/browse/NUTCH-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1357:
Affects Version/s: nutchgora
Fix Version/s: 2.1
All gora mapreduce
[
https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13272622#comment-13272622
]
Lewis John McGibbney commented on NUTCH-1363:
-
So just to summarize here... we
[
https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13272633#comment-13272633
]
Markus Jelsma commented on NUTCH-1363:
--
I'm fine with not having a -parse switch for
[
https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney resolved NUTCH-1363.
-
Resolution: Not A Problem
Yeah, you guys win :0)
Closing as this is not an
[
https://issues.apache.org/jira/browse/NUTCH-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13272732#comment-13272732
]
Markus Jelsma commented on NUTCH-1363:
--
Good work anyway :) I had the same confusing
[
https://issues.apache.org/jira/browse/NUTCH-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13273026#comment-13273026
]
Hudson commented on NUTCH-1358:
---
Integrated in Nutch-nutchgora #249 (See
[
https://issues.apache.org/jira/browse/NUTCH-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13273027#comment-13273027
]
Hudson commented on NUTCH-1026:
---
Integrated in Nutch-nutchgora #249 (See