--
Best regards,
Andrzej Bialecki
___. ___ ___ ___ _ _ __
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
shine some light on what happened to Fetcher2.java that
Dogacan refers to? I was only ever accustomed to OldFetcher and Fetcher :0)
Fetcher2 is the current Fetcher. The original Fetcher was temporarily
renamed OldFetcher and then removed.
--
Best regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187927#comment-13187927
]
Andrzej Bialecki commented on NUTCH-1201:
--
I agree that there are situations
[
https://issues.apache.org/jira/browse/NUTCH-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186212#comment-13186212
]
Andrzej Bialecki commented on NUTCH-1247:
--
Indeed, line 264 increases the retry
[
https://issues.apache.org/jira/browse/NUTCH-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185908#comment-13185908
]
Andrzej Bialecki commented on NUTCH-1247:
--
Originally the reason for a byte
/
+ <dependency org="junit" name="junit" rev="3.8.1" conf="*->default" />
<dependency org="org.apache.hadoop" name="hadoop-test" rev="0.20.205.0" conf="test->default" />
--
Best regards,
Andrzej Bialecki
on the classpath and masks the issue? This happened to me once or
twice...
--
Best regards,
Andrzej Bialecki
the other ones are easy to convert, too...
I'm bogged with other work now, but I'll see if I can prepare an example
later today...
--
Best regards,
Andrzej Bialecki
is in
org.apache.hadoop.mapreduce.lib.output.
--
Best regards,
Andrzej Bialecki
org.apache.hadoop.mapred.MapFileOutputFormat still uses the
old API, and it's not deprecated yet.
--
Best regards,
Andrzej Bialecki
drop in the 0.22 jars and see if it
compiles / tests are passing.
--
Best regards,
Andrzej Bialecki
regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-1213.
--
Resolution: Fixed
Committed in rev. 1207217, thanks for the review
[
https://issues.apache.org/jira/browse/NUTCH-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-1213:
-
Attachment: NUTCH-1213.diff
Patch that implements this functionality. SolrParams can
[
https://issues.apache.org/jira/browse/NUTCH-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13157077#comment-13157077
]
Andrzej Bialecki edited comment on NUTCH-1213 at 11/25/11 10:26 AM
On 23/11/2011 01:02, Andrzej Bialecki wrote:
On 22/11/2011 19:47, PJ Herring wrote:
Hey Chris,
Thanks for the response. I looked at the documents you sent me, and I
really do think incorporating some kind of DI Framework could be a great
addition to Nutch.
I have a general plan of attack
supposed to run ...
so at that time we didn't think this complication was justified.
If we can figure out something between full-blown OSGI and the current
system then that would be great.
--
Best regards,
Andrzej Bialecki
in CrawlDbReducer... Do you notice
any pattern to these pages? What's their origin?
--
Best regards,
Andrzej Bialecki
porting of a pure ant
build to an ant+ivy build. We should determine what deps are really
needed by these plugins, and sanitize the ivy.xml files so that they
make sense - if the existing files can't be untangled we can ditch them
and come up with new, clean ones.
--
Best regards,
Andrzej
[
https://issues.apache.org/jira/browse/NUTCH-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147722#comment-13147722
]
Andrzej Bialecki commented on NUTCH-1139:
--
I suggest renaming the option
[
https://issues.apache.org/jira/browse/NUTCH-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147723#comment-13147723
]
Andrzej Bialecki commented on NUTCH-1061:
--
+1.
Migrate
,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13144226#comment-13144226
]
Andrzej Bialecki commented on NUTCH-1196:
--
Very nicely done and useful patch
[
https://issues.apache.org/jira/browse/NUTCH-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-1195.
--
Resolution: Fixed
Committed in rev. 1197319.
Add Solr 4x (trunk
Components: indexer
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki
Fix For: 1.4
In some cases it's useful to be able to add to every document sent to Solr a
set of predefined fields with static values. This could be implemented on the
Solr
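The idea of attaching a fixed set of fields to every outgoing document can be sketched in plain Java. This is only an illustration, not the code from the issue: the class name `StaticFields`, the `name=value,name=value` spec format, and the choice not to overwrite existing fields are all assumptions, and a `Map` stands in for the real Solr document type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StaticFields {
    // Parses a hypothetical "name=value,name=value" spec into an ordered map.
    // The spec format is an assumption made for this sketch.
    public static Map<String, String> parse(String spec) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return fields;
        }
        for (String pair : spec.split(",")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip entries without a field name or '='
            }
            fields.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return fields;
    }

    // Copies the static fields into a document (a Map stands in for the
    // Solr document here); fields already present are left untouched.
    public static void apply(Map<String, Object> doc, Map<String, String> staticFields) {
        for (Map.Entry<String, String> e : staticFields.entrySet()) {
            doc.putIfAbsent(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("url", "http://example.com/");
        apply(doc, parse("collection=news, source=crawl-a"));
        System.out.println(doc);
    }
}
```

Leaving existing fields untouched is a design choice of this sketch; an implementation could equally let the static values win.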
[
https://issues.apache.org/jira/browse/NUTCH-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-1197:
-
Attachment: NUTCH-1197.patch
Patch with the implementation. I added some javadocs
Bialecki
Assignee: Andrzej Bialecki
Fix For: 1.4
The conf/schema.xml that we ship works ok for Solr 3.x, but in Solr trunk some
of the class names have been changed, and some field types have been redefined,
so if you simply drop this schema into Solr it will cause
[
https://issues.apache.org/jira/browse/NUTCH-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-1195:
-
Attachment: schema-solr4.xml
Add Solr 4x (trunk) example schema
[
https://issues.apache.org/jira/browse/NUTCH-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127427#comment-13127427
]
Andrzej Bialecki commented on NUTCH-1135:
--
A few comments from the author
[
https://issues.apache.org/jira/browse/NUTCH-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127470#comment-13127470
]
Andrzej Bialecki commented on NUTCH-1135:
--
bq. if you prefer to keep the old
[
https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13125712#comment-13125712
]
Andrzej Bialecki commented on NUTCH-797:
-
That's unexpected :) I checked the patch
,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13125916#comment-13125916
]
Andrzej Bialecki commented on NUTCH-1097:
--
+1, the latest patch looks good
[
https://issues.apache.org/jira/browse/NUTCH-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13125931#comment-13125931
]
Andrzej Bialecki commented on NUTCH-1142:
--
+1, the patch looks good
[
https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13124737#comment-13124737
]
Andrzej Bialecki commented on NUTCH-797:
-
The fixup code in Tika is still
[
https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13125016#comment-13125016
]
Andrzej Bialecki commented on NUTCH-797:
-
Uhh, sorry - I'll fix this in a moment
[
https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13125077#comment-13125077
]
Andrzej Bialecki commented on NUTCH-797:
-
I'm puzzled by the algorithm
[
https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-797:
Attachment: NUTCH-797.patch
Tentative patch, which changes the meaning of fixEmbeddedParams
[
https://issues.apache.org/jira/browse/NUTCH-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13125414#comment-13125414
]
Andrzej Bialecki commented on NUTCH-1097:
--
+1 the idea makes sense. Patch looks
[
https://issues.apache.org/jira/browse/NUTCH-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13124428#comment-13124428
]
Andrzej Bialecki commented on NUTCH-1154:
--
TIKA-748 has been fixed
Upgrade to SolrJ 3.4.0
--
Key: NUTCH-1152
URL: https://issues.apache.org/jira/browse/NUTCH-1152
Project: Nutch
Issue Type: Improvement
Reporter: Andrzej Bialecki
Fix For: 1.4
Current release
[
https://issues.apache.org/jira/browse/NUTCH-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-1152.
--
Resolution: Fixed
Assignee: Andrzej Bialecki
Committed in rev. 1180087
Upgrade to Tika 0.10
Key: NUTCH-1154
URL: https://issues.apache.org/jira/browse/NUTCH-1154
Project: Nutch
Issue Type: Improvement
Components: parser
Affects Versions: 1.4
Reporter: Andrzej
[
https://issues.apache.org/jira/browse/NUTCH-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-1154:
-
Attachment: NUTCH-1154.diff
Patch to upgrade to Tika 0.10. Unfortunately, TestRTFParser
[
https://issues.apache.org/jira/browse/NUTCH-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13120982#comment-13120982
]
Andrzej Bialecki commented on NUTCH-1124:
--
Our implementation is most definitely
for a usable platform, and continue
redesign from that codebase.
--
Best regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089405#comment-13089405
]
Andrzej Bialecki commented on NUTCH-1087:
--
IIRC we had this discussion
[
https://issues.apache.org/jira/browse/NUTCH-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067972#comment-13067972
]
Andrzej Bialecki commented on NUTCH-1014:
--
java.util.regex has the advantage
[
https://issues.apache.org/jira/browse/NUTCH-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034724#comment-13034724
]
Andrzej Bialecki commented on NUTCH-985:
-
We should use Solr's DateUtil in all
to specify what fields from
WebPage you are interested in (and only these fields will be pulled in
from the storage). This is all handled by StorageUtils methods.
--
Best regards,
Andrzej Bialecki
improvements from 2.0 have been backported into 1.3 now
The only remaining issue to address before rolling out a 1.3 release is
NUTCH-914 Implement Apache Project Branding Requirements (and subtasks...)
--
Best regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-951.
-
Resolution: Fixed
Backport changes from 2.0 into 1.3
[
https://issues.apache.org/jira/browse/NUTCH-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13004488#comment-13004488
]
Andrzej Bialecki commented on NUTCH-951:
-
* Ported NUTCH-872 in rev. 1079746
[
https://issues.apache.org/jira/browse/NUTCH-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-962.
-
Resolution: Fixed
Fix Version/s: 2.0
1.3
Assignee
[
https://issues.apache.org/jira/browse/NUTCH-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-955.
-
Resolution: Fixed
Fix Version/s: 2.0
Assignee: Andrzej Bialecki
Ivy
how to deploy Gora backend
implementations so that they work with Nutch and with a generic
unmodified Hadoop cluster.
--
Best regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-939.
-
Resolution: Fixed
Assignee: Andrzej Bialecki
I modified the patch slightly
Remove Lucene dependencies
--
Key: NUTCH-948
URL: https://issues.apache.org/jira/browse/NUTCH-948
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.3
Reporter: Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-948.
-
Resolution: Fixed
Committed in rev. 1051509.
Remove Lucene dependencies
[
https://issues.apache.org/jira/browse/NUTCH-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973915#action_12973915
]
Andrzej Bialecki commented on NUTCH-939:
-
1.2 release is out, and branch-1.2
bug.
- wait for Hadoop job completion in the Fetcher job
I missed your previous email... I'll fix this shortly - thanks for
spotting it.
--
Best regards,
Andrzej Bialecki
that the field type declared in your schema.xml is not
multiValued. What was the exception?
--
Best regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12936047#action_12936047
]
Andrzej Bialecki commented on NUTCH-939:
-
Please note that trunk uses a very
[
https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-932:
Attachment: NUTCH-932-4.patch
Final version of the patch.
Bulk REST API to retrieve crawl
[
https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-932.
-
Resolution: Fixed
Fix Version/s: 2.0
Committed in rev. 1039014.
Bulk REST API
[
https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-932:
Attachment: NUTCH-932-3.patch
NutchTool is an abstract class in this patch. This actually
[
https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928909#action_12928909
]
Andrzej Bialecki commented on NUTCH-880:
-
Thanks - this issue is already fixed
[
https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-932:
Attachment: NUTCH-932.patch
This patch adds bulk retrieval of crawl results. This is still
[
https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-932:
Attachment: db.formatted.gz
Example DB content (this was passed through a JSON pretty
[
https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928355#action_12928355
]
Andrzej Bialecki commented on NUTCH-932:
-
Examples (with the db equivalent
[
https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-932:
Attachment: NUTCH-932.patch
Updated patch - this now recognizes URL parameters
[
https://issues.apache.org/jira/browse/NUTCH-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-931.
-
Resolution: Fixed
Committed in rev. 1028736 with some changes.
Simple admin API
[
https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-880:
Summary: REST API for Nutch (was: REST API (and webapp) for Nutch)
The webapp part
[
https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-880.
-
Resolution: Fixed
Fix Version/s: 2.0
Committed in rev. 1028235. The webapp part
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki
Nutch doesn't use the Lucene API anymore; all indexing happens via the
Lucene-agnostic SolrJ API. The only place where we still use a minor part of
Lucene is in index-basic, and that use (DateTools) can be easily replaced.
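As a rough illustration of how small such a replacement could be, the sketch below formats a timestamp using only JDK classes. The class name `TimestampFormat` is hypothetical, and the pattern `yyyyMMddHHmmssSSS` in UTC is an assumption chosen to resemble Lucene's millisecond-resolution DateTools output; the change actually committed for this issue may use a different format or helper.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class TimestampFormat {
    // Assumed pattern: a sortable, millisecond-resolution timestamp string.
    private static final String PATTERN = "yyyyMMddHHmmssSSS";

    // Formats epoch milliseconds as a UTC timestamp string.
    public static String timeToString(long epochMillis) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(timeToString(System.currentTimeMillis()));
    }
}
```

Fixing the time zone and locale keeps the output independent of the machine running the indexing job.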
[
https://issues.apache.org/jira/browse/NUTCH-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-930:
Attachment: NUTCH-930.patch
Patch to fix the issue. I'll commit this shortly.
Remove
[
https://issues.apache.org/jira/browse/NUTCH-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-930.
-
Resolution: Fixed
Fix Version/s: 2.0
Committed in rev. 1028474.
Remove remaining
: REST_api
Affects Versions: 2.0
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki
Fix For: 2.0
The REST API needs a simple info/stats service and the ability to shut down
the server.
[
https://issues.apache.org/jira/browse/NUTCH-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925543#action_12925543
]
Andrzej Bialecki commented on NUTCH-926:
-
bq. Nutch continues to crawl the WRONG
... but what's the point of using the
tool in our JIRA-based workflow? It looks to me like it duplicates at
least part of JIRA's functionality, and the remaining part is what we do
also in JIRA by convention...
--
Best regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924659#action_12924659
]
Andrzej Bialecki commented on NUTCH-913:
-
+1, let's commit it - I want to start
[
https://issues.apache.org/jira/browse/NUTCH-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924154#action_12924154
]
Andrzej Bialecki commented on NUTCH-923:
-
This doesn't solve the problem
[
https://issues.apache.org/jira/browse/NUTCH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923845#action_12923845
]
Andrzej Bialecki commented on NUTCH-924:
-
The functionality is useful, +1
[
https://issues.apache.org/jira/browse/NUTCH-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923896#action_12923896
]
Andrzej Bialecki commented on NUTCH-923:
-
This sounds useful, though
regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-921:
Attachment: NUTCH-921.patch
Patch that implements reading config parameters from
[
https://issues.apache.org/jira/browse/NUTCH-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920610#action_12920610
]
Andrzej Bialecki commented on NUTCH-913:
-
There are formatting issues
[
https://issues.apache.org/jira/browse/NUTCH-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916870#action_12916870
]
Andrzej Bialecki commented on NUTCH-907:
-
Hi Sertan,
Thanks for the patch
[
https://issues.apache.org/jira/browse/NUTCH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916874#action_12916874
]
Andrzej Bialecki commented on NUTCH-882:
-
Doğacan, I missed your previous comment
[
https://issues.apache.org/jira/browse/NUTCH-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916912#action_12916912
]
Andrzej Bialecki commented on NUTCH-864:
-
I think the difficulty comes from
tomorrow.
--
Best regards,
Andrzej Bialecki
+ indexing to Solr went just fine.
--
Best regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913118#action_12913118
]
Andrzej Bialecki commented on NUTCH-880:
-
bq. I think we can combine the approach
[
https://issues.apache.org/jira/browse/NUTCH-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912474#action_12912474
]
Andrzej Bialecki commented on NUTCH-909:
-
bq. It might be better to see the message
[
https://issues.apache.org/jira/browse/NUTCH-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki reassigned NUTCH-862:
---
Assignee: Andrzej Bialecki
HttpClient null pointer exception
[
https://issues.apache.org/jira/browse/NUTCH-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-906.
-
Fix Version/s: 1.2
Resolution: Fixed
Fixed in rev. 998261. Thanks!
Nutch
[
https://issues.apache.org/jira/browse/NUTCH-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910109#action_12910109
]
Andrzej Bialecki commented on NUTCH-907:
-
That's very good news - in that case I'm
[
https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-880:
Attachment: API.patch
Initial patch for discussion. This is a work in progress, so only
Issue Type: Bug
Reporter: Andrzej Bialecki
Fix For: 2.0
In Nutch 1.x it was possible to easily select a set of crawl data (crawldb,
page data, linkdb, etc) by specifying a path where the data was stored. This
enabled users to run several disjoint crawls
[
https://issues.apache.org/jira/browse/NUTCH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909757#action_12909757
]
Andrzej Bialecki commented on NUTCH-882:
-
+1 to NutchContext. See also NUTCH-907