Doug Cutting wrote:
Andrzej Bialecki wrote:
Using the original index, it was possible for pages with high tf/idf
of a term, but with a low "boost" value (the OPIC score), to outrank
pages with high "boost" but lower tf/idf of a term. This phenomenon
leads quite often
LuceneQueryOptimizer.LimitedCollector constructor, instead of
super(maxHits) it should be super(numHits) - this was actually the bug,
which was causing that mysterious slowdown for higher values of MAX_HITS.
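
A minimal sketch of this kind of bug, for illustration only. TopNCollector and LimitedCollectorSketch are hypothetical stand-ins, not the actual Lucene or Nutch classes. The assumption is that the base-class constructor argument sizes the result queue, so passing maxHits there would make the queue (and the work per collected hit) grow with MAX_HITS.

```java
import java.util.PriorityQueue;

// Hypothetical stand-in for a top-N collector base class (not Lucene's API).
class TopNCollector {
    private final int capacity;
    private final PriorityQueue<Float> queue = new PriorityQueue<>();

    TopNCollector(int capacity) {
        this.capacity = capacity;
    }

    void collect(float score) {
        queue.add(score);
        if (queue.size() > capacity) {
            queue.poll(); // drop the lowest score to stay within capacity
        }
    }

    int capacity() { return capacity; }
    int size() { return queue.size(); }
}

// Hypothetical subclass: keeps numHits results, stops looking after maxHits.
class LimitedCollectorSketch extends TopNCollector {
    private final int maxHits;
    private int seen;

    LimitedCollectorSketch(int numHits, int maxHits) {
        super(numHits); // the fix: size the queue by numHits, not maxHits
        this.maxHits = maxHits;
    }

    @Override
    void collect(float score) {
        if (seen >= maxHits) {
            return; // ignore everything past the maxHits limit
        }
        seen++;
        super.collect(score);
    }

    int seen() { return seen; }
}
```

With super(numHits) the queue size stays fixed no matter how large maxHits is set.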
--
Best regards,
Andrzej Bialecki
Rod Taylor wrote:
During a fetch I have recently started getting these (pretty
consistently).
Fixed. Thanks!
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __
[__ || __|__/|__||\/| Information Retrieval, Sem
should be fixed :) in the revision r365576. Please
report if it doesn't fix it for you.
--
Best regards,
Andrzej Bialecki <><
new version invokes Float.parseFloat() on line 88.
--
Best regards,
Andrzej Bialecki <><
se-cases?
I would love to do this job, can I get a go from the other developers?
+1 from me.
--
Best regards,
Andrzej Bialecki <><
Jérôme Charron wrote:
Excuse me in advance, I probably missed something, but what are the use
cases for having many NutchConf instances with different values?
Running many different tasks in parallel, each using different config,
inside the same JVM.
--
Best regards,
Andrzej Bialecki
e already sort of use with
CachingFilters, only they propose to store them on-disk instead of
limiting the cache to relatively small number of filters kept in RAM...
--
Best regards,
Andrzej Bialecki <><
and used locally by
tasktrackers to instantiate local tasks using copies of the original
NutchConf instance.
--
Best regards,
Andrzej Bialecki <><
didn't see any problems, I think you can go ahead.
--
Best regards,
Andrzej Bialecki <><
?
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
g the content).
Is it easy to reproduce this if I knew the seed urls? If that's the
case, please send me the seed urls (contact me off the list, if it's
sensitive).
--
Best regards,
Andrzej Bialecki <><
adding a CrawlDatum.policyId field would suffice, assuming we have a
means to store and retrieve these policies by ID; we would then
instantiate the policy and call the appropriate methods wherever we use
the URLFilters and do the score calculations today.
Any comments?
--
Best regards,
Andrzej Bialecki
the performance somehow, since we do not need to scan the plugin
folder and time.
Yes, I agree on both accounts. :-)
--
Best regards,
Andrzej Bialecki <><
ies too...
--
Best regards,
Andrzej Bialecki <><
operation... OTOH,
perhaps it's a premature micro-optimization. We can move it to metadata
for now, but I see it as a strong candidate to be moved back...
--
Best regards,
Andrzej Bialecki <><
r https, cookies and authentication.
A related issue is that these two plugins replicate a lot of code. At
some point we should try to fix that. See:
http://www.nabble.com/protocol-http-versus-protocol-httpclient-t521282.html
Yes.
--
Best regar
?
Please do go on!
--
Best regards,
Andrzej Bialecki <><
:
java.lang.ClassCastException: java.util.ArrayList
-Matt Zytaruk
Could you please add a call to printStackTrace() in that catch{}
statement, so that we know where the exception is thrown?
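
For illustration, a minimal sketch of what is being asked for. StackTraceSketch and describe() are hypothetical names, not part of Nutch; the bad cast only stands in for whatever code raises the ClassCastException in the real catch block.

```java
import java.util.ArrayList;

class StackTraceSketch {
    // Returns the value as a String, or null if it is some other type.
    static String describe(Object value) {
        try {
            return (String) value; // fails when value is e.g. an ArrayList
        } catch (ClassCastException e) {
            // Print the full trace to stderr so the throwing frame is visible,
            // rather than reporting only the exception message.
            e.printStackTrace();
            return null;
        }
    }

    public static void main(String[] args) {
        describe(new ArrayList<String>());
    }
}
```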
--
Best regards,
Andrzej Bialecki
bump the ParseData.VERSION, and leave
this code to handle older versions...
--
Best regards,
Andrzej Bialecki <><
processed differently if needs be.
--
Best regards,
Andrzej Bialecki <><
Hi,
I attached the patch. Please test.
--
Best regards,
Andrzej Bialecki <><
old segments.
--
Best regards,
Andrzej Bialecki <><
ave a
self-contained deployment package that you can simply copy around.
However, this does NOT by any means solve the problem of static
NutchConf, that problem is on the level of API usage and not the fi
.
Stacktrace?
--
Best regards,
Andrzej Bialecki <><
Gal Nitzan wrote:
Hi Andrzej,
The value cannot be null is my message :)
:)
I'm guessing that you are using Fetcher in non-parsing mode, and then
you run ParseSegment as a separate step, right?
Please try the attached patch.
--
Best regards,
Andrzej Bia
ns no segment name nor score in parseData.metadata.
Please test and report if it helps.
--
Best regards,
Andrzej Bialecki <><
arser will be added today or tomorrow.
--
Best regards,
Andrzej Bialecki <><
to parseData.metadata.
I was waiting for someone to test it... but this could as well be you ;-)
Anyway to recover the crawl/finish the reduce job from
where it failed?
I don't think so... although it would be a nice feature.
--
Best regards,
Andrzej Bia
Mike Alulin wrote:
Is it possible to merge segments in the map reduce version of Nutch?
Not yet.
--
Best regards,
Andrzej Bialecki <><
not a good option as I have millions of documents and I DO know which of them were updated without requesting them.
This is a development version, nobody said it's feature complete.
Patience, my friend... or spend some effort to improve it. ;-)
--
Best regards,
Andrze
is a cost to modify the CrawlDB, but there is also a
cost to not be able to generate multiple different fetchlists and fetch
them in parallel...
--
Best regards,
Andrzej Bialecki <><
to process it... but overall these operations
scale much better in 0.8 than before.
--
Best regards,
Andrzej Bialecki <><
John X wrote:
Hi, Otis,
On Fri, Jan 20, 2006 at 09:31:16PM -0800, [EMAIL PROTECTED] wrote:
Hi John,
NDFS + MapReduce will soon become a separate Lucene sub-project.
In one sub-project or two separately?
In one. They are closely related anyway.
--
Best regards,
Andrzej Bialecki
ncountered some identification problems with some specific sites (with
blogger for instance), and I plan to investigate on this point.
* Another pending task : the analysis (and coding) of multilingual querying
support.
--
Best regards,
Andrze
[EMAIL PROTECTED] wrote:
I would like to decouple Lang Id from Nutch and move it in Lucene contrib/ in
the near future.
Does that sound ok?
+1 from me.
--
Best regards,
Andrzej Bialecki <><
it is considered as a spam).
How can I send the source code ?
Best regards.
Please use JIRA (http://issues.apache.org/jira/browse/NUTCH) - create a
new issue and attach the file.
--
Best regards,
Andrzej Bia
d always try to guess the language if we have enough text,
unless we can be sure that we deal with properly marked documents (not
such an uncommon case in intranets).
--
Best regards,
Andrzej Bialecki <><
ty and general usefulness that this should be
coordinated with the existing efforts, and discussed on the mailing lists.
--
Best regards,
Andrzej Bialecki <><
meta data is found, then checks that it is the correct value regarding
the score of this language (statistical analysis).
If the score is too low or no meta data is found, then we perform a full
statistical analysis.
No?
Yes :-)
--
Best regards,
Andrze
.
Either way is fine with me. Perhaps splitting this into two commits
would make it easier to fix potential breakage...
--
Best regards,
Andrzej Bialecki <><
's better to avoid bash-isms,
if we easily can. Not all the world looks like Linux. ;-)
--
Best regards,
Andrzej Bialecki <><
Doug Cutting wrote:
Andrzej Bialecki wrote:
Namely? I didn't notice any ... I think it's better to avoid
bash-isms, if we easily can. Not all the world looks like Linux. ;-)
IFS, at least. I tried running this on Solaris, where /bin/sh is not
bash, and it didn't work. It c
't matter where
it's installed.
--
Best regards,
Andrzej Bialecki <><
Sami Siren wrote:
should there be a
conf.setObject(clazz,impl);
inside that try ?
Yes, of course, thanks for catching it!
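
A rough sketch of the pattern in question, under stated assumptions: ConfSketch and FactorySketch are hypothetical stand-ins (not NutchConf or any Nutch factory), and the only thing taken from the thread is the idea of calling conf.setObject(clazz, impl) inside the try, right after the implementation has been instantiated, so that later lookups reuse the cached instance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration object that can cache instances.
class ConfSketch {
    private final Map<String, Object> objects = new HashMap<>();

    Object getObject(String name) { return objects.get(name); }
    void setObject(String name, Object value) { objects.put(name, value); }
}

class FactorySketch {
    // Returns the cached implementation for clazz, creating it on first use.
    static Object getImpl(ConfSketch conf, String clazz) {
        Object impl = conf.getObject(clazz);
        if (impl == null) {
            try {
                impl = Class.forName(clazz).getDeclaredConstructor().newInstance();
                // Without this call every lookup would instantiate a new object.
                conf.setObject(clazz, impl);
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
        }
        return impl;
    }
}
```

Caching only after a successful instantiation keeps a failed attempt from leaving a null or half-built entry in the configuration.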
--
Best regards,
Andrzej Bialecki <><
just that.
--
Best regards,
Andrzej Bialecki <><
have good
differentiation of values across page scores. Performance gains are
significant, in certain situations dramatic (e.g. 10x faster).
--
Best regards,
Andrzej Bialecki <><
Sami Siren wrote:
Andrzej Bialecki (JIRA) wrote:
[
http://issues.apache.org/jira/browse/NUTCH-169?page=comments#action_12364544
]
Andrzej Bialecki commented on NUTCH-169:
-
This patch looks good! If there are no further objections, I'll tes
4-byte ints for the size of list, e.g. ParseData.outlinks
Overall I think the size savings could be considerable, at the cost of
some CPU.
--
Best regards,
Andrzej Bialecki <><
useful, I can add this
to PluginRepository.
--
Best regards,
Andrzej Bialecki <><
just the ones declared as
extensions in plugin.xml.
--
Best regards,
Andrzej Bialecki <><
Andrzej Bialecki wrote:
It works rather nicely. If other people find it useful, I can add this
to PluginRepository.
Committed.
--
Best regards,
Andrzej Bialecki <><
maintenance line or both?
Most efforts go to the mapred version (in trunk/ now). If it's not much
work, or if there are compelling reasons, we try to update the
maintenance branches, but they are diverging more and more from the
[ http://issues.apache.org/jira/browse/NUTCH-198?page=all ]
Andrzej Bialecki closed NUTCH-198:
---
Resolution: Fixed
Added.
> SWF parser
> --
>
> Key: NUTCH-198
> URL: http://issues.apache.org/jira/b
[
http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12365413 ]
Andrzej Bialecki commented on NUTCH-192:
-
I have a different opinion on this (I think MapWritable is a sufficiently
general-purpose data structure that would be
[
http://issues.apache.org/jira/browse/NUTCH-205?page=comments#action_12365434 ]
Andrzej Bialecki commented on NUTCH-205:
-
This is a design choice, not a bug. The errors you see are due to improper
configuration - some threads cannot access the
[
http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12365536 ]
Andrzej Bialecki commented on NUTCH-192:
-
Yes, that's an issue - due to the way WritableName is initialized it's
difficult to add more mappings l
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365623 ]
Andrzej Bialecki commented on NUTCH-139:
-
I like this patch, the split of Metadata names into interfaces looks right. +1.
> Standard metadata property names in
[
http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12365648 ]
Andrzej Bialecki commented on NUTCH-192:
-
Looks good to me, too. If there are no further objections, I can commit this
latest patch, modulo some minor whitespace
[
http://issues.apache.org/jira/browse/NUTCH-209?page=comments#action_12365782 ]
Andrzej Bialecki commented on NUTCH-209:
-
All Nutch classes + plugins weigh about 16MB. It feels a bit heavy to
distribute this to every node on every task request
[
http://issues.apache.org/jira/browse/NUTCH-209?page=comments#action_12365800 ]
Andrzej Bialecki commented on NUTCH-209:
-
No problem.
Re: plugin loading: well, when we are done building the binary distribution we
already know for sure what
[ http://issues.apache.org/jira/browse/NUTCH-192?page=all ]
Andrzej Bialecki closed NUTCH-192:
---
Resolution: Fixed
Applied. Thank you!
> meta data support for CrawlDatum
>
>
> Ke
[
http://issues.apache.org/jira/browse/NUTCH-198?page=comments#action_12366068 ]
Andrzej Bialecki commented on NUTCH-198:
-
This parser is already added in 0.8. You should be able to add it to 0.7.x with
few changes.
> SWF par
[ http://issues.apache.org/jira/browse/NUTCH-61?page=all ]
Andrzej Bialecki updated NUTCH-61:
---
Attachment: 20060227.txt
This patch is updated to the current trunk/ . The default configuration works
as before, and uses DefaultFetchSchedule.
If there
[
http://issues.apache.org/jira/browse/NUTCH-61?page=comments#action_12368051 ]
Andrzej Bialecki commented on NUTCH-61:
I contemplated this for a while, and then decided against it.
The main reason was that currently most of the "plug
[
http://issues.apache.org/jira/browse/NUTCH-227?page=comments#action_12369660 ]
Andrzej Bialecki commented on NUTCH-227:
-
Isn't it so that QueryFilter (which is an interface) already extends
Configurable? What seems to be missi
[ http://issues.apache.org/jira/browse/NUTCH-229?page=all ]
Andrzej Bialecki closed NUTCH-229:
---
Resolution: Fixed
Applied. Thanks!
> improved handling of plugin folder configurat
[ http://issues.apache.org/jira/browse/NUTCH-206?page=all ]
Andrzej Bialecki closed NUTCH-206:
---
Fix Version: 0.8-dev
Resolution: Fixed
Fixed in r 384011.
> search server throws InstantiationExcept
[ http://issues.apache.org/jira/browse/NUTCH-203?page=all ]
Andrzej Bialecki closed NUTCH-203:
---
Fix Version: 0.8-dev
Resolution: Fixed
Fixed in r 376315. Thank you!
> ParseSegment throws InstantiationExcept
[ http://issues.apache.org/jira/browse/NUTCH-218?page=all ]
Andrzej Bialecki closed NUTCH-218:
---
Resolution: Fixed
Applied by Doug.
> need DOAP file for Nutch
>
>
> Key: NUTCH-218
>
[ http://issues.apache.org/jira/browse/NUTCH-3?page=all ]
Andrzej Bialecki closed NUTCH-3:
-
Resolution: Fixed
Fixed in r 376089.
> multi values of header discarded
>
>
> Key: NUTCH-3
>
[
http://issues.apache.org/jira/browse/NUTCH-230?page=comments#action_12370356 ]
Andrzej Bialecki commented on NUTCH-230:
-
Hmmm, this is a deeply philosophical question... Should you spread out the OPIC
score to all links that a page sports, or
[
http://issues.apache.org/jira/browse/NUTCH-230?page=comments#action_12370426 ]
Andrzej Bialecki commented on NUTCH-230:
-
Yes, these are good examples - I'll prepare a patch to make this a boolean
setting; if false (default) the calculation
[ http://issues.apache.org/jira/browse/NUTCH-230?page=all ]
Andrzej Bialecki updated NUTCH-230:
Attachment: patch.txt
Please review this patch, if it's ok I'll commit it.
> OPIC score for outlinks should be based on # of valid lin
Duplicate Inlink values
---
Key: NUTCH-235
URL: http://issues.apache.org/jira/browse/NUTCH-235
Project: Nutch
Type: Bug
Versions: 0.8-dev
Reporter: Andrzej Bialecki
Assigned to: Andrzej Bialecki
Reading the code for
[ http://issues.apache.org/jira/browse/NUTCH-235?page=all ]
Andrzej Bialecki updated NUTCH-235:
Attachment: patch.txt
Proposed fix for this issue. If there are no objections I'll commit this
shortly.
> Duplicate Inlin
[
http://issues.apache.org/jira/browse/NUTCH-235?page=comments#action_12371141 ]
Andrzej Bialecki commented on NUTCH-235:
-
No problem, I can change this. However, going through every link will then
require creation of an Iterator. We do this when
[ http://issues.apache.org/jira/browse/NUTCH-235?page=all ]
Andrzej Bialecki updated NUTCH-235:
Attachment: set-patch.txt
Same functionality, but using a HashSet.
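
As an illustration of the HashSet approach (not the contents of set-patch.txt): InlinkSketch and InlinksSketch are hypothetical stand-ins for the Nutch classes, and the choice of fromUrl plus anchor as the identity of a link is an assumption made for this sketch.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for an inlink: source URL plus anchor text.
final class InlinkSketch {
    final String fromUrl;
    final String anchor;

    InlinkSketch(String fromUrl, String anchor) {
        this.fromUrl = fromUrl;
        this.anchor = anchor;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof InlinkSketch)) return false;
        InlinkSketch other = (InlinkSketch) o;
        return fromUrl.equals(other.fromUrl) && anchor.equals(other.anchor);
    }

    @Override
    public int hashCode() {
        return Objects.hash(fromUrl, anchor);
    }
}

// Hypothetical container: a HashSet silently drops duplicate inlinks.
class InlinksSketch {
    private final Set<InlinkSketch> links = new HashSet<>();

    // Returns false when an equal inlink was already present.
    boolean add(InlinkSketch link) { return links.add(link); }

    int size() { return links.size(); }
}
```

The set relies on equals() and hashCode() being consistent; walking the links then goes through the set's Iterator rather than an index.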
> Duplicate Inlink values
> ---
>
> Ke
[ http://issues.apache.org/jira/browse/NUTCH-235?page=all ]
Andrzej Bialecki closed NUTCH-235:
---
Fix Version: 0.8-dev
Resolution: Fixed
HashSet-based version of the patch applied.
> Duplicate Inlink val
[ http://issues.apache.org/jira/browse/NUTCH-234?page=all ]
Andrzej Bialecki closed NUTCH-234:
---
Fix Version: 0.8-dev
Resolution: Fixed
Applied. Thanks!
> Clustering extension code cleanups and a real JUnit test case for the curr
[
http://issues.apache.org/jira/browse/NUTCH-237?page=comments#action_12371606 ]
Andrzej Bialecki commented on NUTCH-237:
-
Hmm, I'm not sure I like this patch. It removes support for other languages
than English. While I can agree wit
Bialecki
Assigned to: Andrzej Bialecki
Attachments: NDFSck.java
This is a utility to check health status of NDFS. NOTE: this is compatible ONLY
with pre-Hadoop Nutch versions! (Another version has been submitted for Hadoop
volumes).
--
This message is automatically generated by JIRA.
-
If you
[ http://issues.apache.org/jira/browse/NUTCH-238?page=all ]
Andrzej Bialecki updated NUTCH-238:
Attachment: NDFSck.java
> NDFSck - fsck utility for NDFS (pre-Hadoop)
> ---
>
> Ke
Reporter: Andrzej Bialecki
This patch refactors all places where Nutch manipulates page scores, into a
plugin-based API. Using this API it's possible to implement different scoring
algorithms. It is also much easier to understand how scoring works.
Multiple scoring plugins can be run in sequenc
[ http://issues.apache.org/jira/browse/NUTCH-240?page=all ]
Andrzej Bialecki updated NUTCH-240:
Attachment: patch.txt
> Scoring API: extension point, scoring filters and an OPIC plu
[
http://issues.apache.org/jira/browse/NUTCH-240?page=comments#action_12372379 ]
Andrzej Bialecki commented on NUTCH-240:
-
Yes, one of the reasons I wanted to discuss these patches is that they
uncovered some of the underlying ugliness... ;)
The
[
http://issues.apache.org/jira/browse/NUTCH-240?page=comments#action_12372580 ]
Andrzej Bialecki commented on NUTCH-240:
-
> First, I hope my critical remarks were not taken personally. I am thankful
> for this and all of your contributions.
[ http://issues.apache.org/jira/browse/NUTCH-240?page=all ]
Andrzej Bialecki updated NUTCH-240:
Attachment: Generator.patch.txt
This patch is an intermediate step towards the simplification of the scoring
API. It changes Generator to use an
[ http://issues.apache.org/jira/browse/NUTCH-238?page=all ]
Andrzej Bialecki closed NUTCH-238:
---
Resolution: Fixed
I'm closing this issue - DFSck has been committed to Hadoop, and anyone wishing
to use this version can get it here.
>
[ http://issues.apache.org/jira/browse/NUTCH-230?page=all ]
Andrzej Bialecki closed NUTCH-230:
---
Resolution: Fixed
Patch applied.
> OPIC score for outlinks should be based on # of valid links, not total # of
>
[ http://issues.apache.org/jira/browse/NUTCH-240?page=all ]
Andrzej Bialecki reassigned NUTCH-240:
---
Assign To: Andrzej Bialecki
> Scoring API: extension point, scoring filters and an OPIC plu
[ http://issues.apache.org/jira/browse/NUTCH-240?page=all ]
Andrzej Bialecki updated NUTCH-240:
Attachment: patch1.txt
Updated patch, includes the Generator.patch.txt. Changes:
* reduce creation of new Objects in CrawlDbReducer
* simplify API by
[
http://issues.apache.org/jira/browse/NUTCH-240?page=comments#action_12373264 ]
Andrzej Bialecki commented on NUTCH-240:
-
Oops, sorry, that was a last moment change ... I fixed it now, thanks for
spotting this.
> Scoring API: extension po
[
http://issues.apache.org/jira/browse/NUTCH-244?page=comments#action_12373396 ]
Andrzej Bialecki commented on NUTCH-244:
-
We don't pass the Configuration object to the constructor, so we have no way to
read the value of this. Configuration i
[ http://issues.apache.org/jira/browse/NUTCH-240?page=all ]
Andrzej Bialecki updated NUTCH-240:
Attachment: patch2.txt
Minor refactoring: passScore* methods now allow access to more data. I found
this useful when implementing a different scoring
[ http://issues.apache.org/jira/browse/NUTCH-254?page=all ]
Andrzej Bialecki closed NUTCH-254:
---
Resolution: Fixed
Fixed - actually there were two places which needed fixing, I also somewhat
simplified the logic. Thank you!
> Fetcher thr
[ http://issues.apache.org/jira/browse/NUTCH-125?page=all ]
Andrzej Bialecki closed NUTCH-125:
---
Fix Version: 0.8-dev
Resolution: Fixed
Applied, with some changes (due to Nutch API changes, and also it uses lib-xml
plugin now
[
http://issues.apache.org/jira/browse/NUTCH-240?page=comments#action_12377200 ]
Andrzej Bialecki commented on NUTCH-240:
-
If there are no further suggestions or objections, I'd like to move forward on
this patch. I know the passScore* method
MapWritable.equals() doesn't work properly
--
Key: NUTCH-263
URL: http://issues.apache.org/jira/browse/NUTCH-263
Project: Nutch
Type: Bug
Versions: 0.8-dev
Reporter: Andrzej Bialecki
MapWritable.equals
[ http://issues.apache.org/jira/browse/NUTCH-263?page=all ]
Andrzej Bialecki updated NUTCH-263:
Attachment: patch1.txt
This patch fixes the issue, but at the cost of creating new objects...
improvements are welcome.
> MapWritable.equals() does