[
https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514914
]
Andrzej Bialecki commented on NUTCH-525:
-
+1 for adding undeleteAll(). When DDRecordReader was created
[
https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513853
]
Andrzej Bialecki commented on NUTCH-518:
-
IMHO this change is not helpful. It takes away too much control
trigger a costly copy operation.
--
Best regards,
Andrzej Bialecki
___. ___ ___ ___ _ _ __
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
[
https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki reopened NUTCH-518:
-
This one was too quick, I think ... I wanted to discuss the issue whether the
chaining
[
https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513704
]
Andrzej Bialecki commented on NUTCH-518:
-
Right, I was too quick too ... ;) Leave it in for now. Let's agree
[
https://issues.apache.org/jira/browse/NUTCH-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513019
]
Andrzej Bialecki commented on NUTCH-515:
-
+1 - sorry for the mess up ...
Next fetch time is set
[
https://issues.apache.org/jira/browse/NUTCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512139
]
Andrzej Bialecki commented on NUTCH-505:
-
Please test Java 1.5 and Java 1.6 - IIRC there are some
[
https://issues.apache.org/jira/browse/NUTCH-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki closed NUTCH-511.
---
Resolution: Invalid
Assignee: Andrzej Bialecki
Please use mailing lists
[
https://issues.apache.org/jira/browse/NUTCH-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki closed NUTCH-512.
---
Resolution: Invalid
Please use mailing lists for such questions.
Search on date range
process to consider the
complete webgraph, i.e. all link information collected so far - but the
main attractiveness of OPIC is that it's incremental, so that you don't
have to consider the whole webgraph with small incremental updates.
--
Best regards,
Andrzej Bialecki
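The point about OPIC being incremental can be illustrated with a small sketch. In OPIC (On-line Page Importance Computation), fetching a page only redistributes that page's own "cash" to its outlinks and records it in the page's history, so no pass over the whole webgraph is needed. This is a simplified in-memory illustration under stated assumptions: the class OpicSketch and its methods are hypothetical and are not Nutch's OPIC scoring plugin.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified OPIC bookkeeping (hypothetical sketch, not Nutch's scoring filter). */
public class OpicSketch {
    // Cash currently held by each page, waiting to be distributed.
    private final Map<String, Double> cash = new HashMap<>();
    // Accumulated cash each page has distributed so far (its importance history).
    private final Map<String, Double> history = new HashMap<>();

    /** Give an initial amount of cash to a seed page. */
    public void seed(String url, double initialCash) {
        cash.put(url, initialCash);
    }

    /**
     * Called when a single page is fetched: only this page and its direct
     * outlinks are touched, which is what makes the computation incremental.
     */
    public void onFetch(String url, List<String> outlinks) {
        double c = cash.getOrDefault(url, 0.0);
        history.merge(url, c, Double::sum); // record what this page distributed
        cash.put(url, 0.0);                 // the page's cash is now spent
        if (outlinks.isEmpty()) {
            // Simplification: a page without outlinks lets its cash leave circulation.
            return;
        }
        double share = c / outlinks.size();
        for (String target : outlinks) {
            cash.merge(target, share, Double::sum); // split cash equally among outlinks
        }
    }

    public double cashOf(String url) {
        return cash.getOrDefault(url, 0.0);
    }

    public double historyOf(String url) {
        return history.getOrDefault(url, 0.0);
    }
}
```

For example, seeding page "a" with 1.0 and fetching it with outlinks "b" and "c" leaves 0.5 on each outlink and records 1.0 in the history of "a", without reading any other part of the graph.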
[
https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511362
]
Andrzej Bialecki commented on NUTCH-439:
-
Very nice patch! A couple of comments:
* the fix
[
https://issues.apache.org/jira/browse/NUTCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511447
]
Andrzej Bialecki commented on NUTCH-505:
-
* In ParseOutputFormat, the calculation of outlinksToStore should
/nutch/FixingOpicScoring bottom of the page).
--
Best regards,
Andrzej Bialecki
in Injector.InjectReducer.
--
Best regards,
Andrzej Bialecki
Doug Cutting wrote:
Will the next release really be 1.0 or will it be 0.10?
Really 1.0.
--
Best regards,
Andrzej Bialecki
not be as full of features as one might wish, but that's what the 1.0
release implies - it's usable, with some limitations.
Will it be soon? :)
I'm pretty sure it will be some time after the vacation period is over,
not earlier ;)
--
Best regards,
Andrzej Bialecki
(that is, supported by developer
resources ;) ) to be able to do this.
--
Best regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508816
]
Andrzej Bialecki commented on NUTCH-392:
-
Re: Content versioning - we can use negative int values as version
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508900
]
Andrzej Bialecki commented on NUTCH-392:
-
Excellent work, Doğacan - thank you. The numbers for RECORD
[
https://issues.apache.org/jira/browse/NUTCH-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508506
]
Andrzej Bialecki commented on NUTCH-498:
-
+1.
Use Combiner in LinkDb to increase speed of linkdb
[
https://issues.apache.org/jira/browse/NUTCH-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507609
]
Andrzej Bialecki commented on NUTCH-501:
-
+1 - looks good. An idea: perhaps we could add a LOG.debug
[
https://issues.apache.org/jira/browse/NUTCH-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507168
]
Andrzej Bialecki commented on NUTCH-504:
-
+1 - we should skip documents that failed to parse properly
[
https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506775
]
Andrzej Bialecki commented on NUTCH-497:
-
The patch looks good to me as it is now - however, I've seen
for more details.
This change will affect a lot of places in our code, so it would be best
to do it long before the next Nutch release.
--
Best regards,
Andrzej Bialecki
jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505807 ]
Andrzej Bialecki commented on NUTCH-501:
-
ObjectCache should support caching objects that fall under the same key, but
are differently configured. This situation occurs when running in local
[
https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505598
]
Andrzej Bialecki commented on NUTCH-485:
-
Whitespace changes should be committed as a separate patch
[
https://issues.apache.org/jira/browse/NUTCH-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505302
]
Andrzej Bialecki commented on NUTCH-498:
-
Currently there is no difference, indeed. The version
Hi all,
I'm glad to announce that the Lucene PMC has voted to add Doğacan Güney
as Nutch committer.
Welcome, Doğacan! There are 192 open issues in Nutch JIRA waiting to be
solved ... just dive in! ;)
--
Best regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500951
]
Andrzej Bialecki commented on NUTCH-392:
-
I don't think it's a good idea, it's creating too many cryptic
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500635
]
Andrzej Bialecki commented on NUTCH-392:
-
Good point. We can change it to use the following pattern
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500728
]
Andrzej Bialecki commented on NUTCH-392:
-
I think it is okay to allow BLOCK compression for linkdb, crawldb
,
Andrzej Bialecki
Briggs wrote:
Oh, you want me to change the getSorted method to be synchronized?
I'll put a lock in there and see what happens, if that is what you are
referring to.
Yes, please try this change.
--
Best regards,
Andrzej Bialecki
, in my opinion it should not be applied.
--
Best regards,
Andrzej Bialecki
rubdabadub wrote:
MANY MANY Super Thanks! I can't thank you enough for this Patch :-)
This is so cool!!!
You're welcome :) I would appreciate it if you could give it some
testing and provide feedback ...
--
Best regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-466:
Attachment: segmentparts.patch
This patch contains the following modifications
[
https://issues.apache.org/jira/browse/NUTCH-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-486.
-
Resolution: Won't Fix
Fix Version/s: 1.0.0
Assignee: Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-466:
Attachment: ParseFilters.java
Add missing file.
Flexible segment format
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-392.
-
Resolution: Fixed
Fix Version/s: 1.0.0
Assignee: Andrzej Bialecki
).
In other words, it seems to me that there is no such situation in which
we have to reload plugins within the same JVM, but with different
parameters.
--
Best regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-61.
Resolution: Fixed
Fix Version/s: 1.0.0
Applied with some modifications in rev. 542903
[
https://issues.apache.org/jira/browse/NUTCH-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-466 started by Andrzej Bialecki.
Flexible segment format
---
Key: NUTCH-466
URL: https
need to use HtmlParseFilter.
--
Best regards,
Andrzej Bialecki
rubdabadub wrote:
On 3/22/07, Andrzej Bialecki [EMAIL PROTECTED] wrote:
rubdabadub wrote:
Hi:
Just wondering about NUTCH-61
http://issues.apache.org/jira/browse/Nutch-61
Will it make the 0.9 cut?
It would be nice if it did. It's probably too late.
This was discussed before
[
https://issues.apache.org/jira/browse/NUTCH-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495867
]
Andrzej Bialecki commented on NUTCH-486:
-
-1
This is a side-effect of using LinkDbReader to read the LinkDb
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-443:
Attachment: patch.txt
I'm not too happy with the direction you took in the latest patch
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495797
]
Andrzej Bialecki commented on NUTCH-443:
-
Indeed... I forgot that we need crawl_parse to collect new sub
[
https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495319
]
Andrzej Bialecki commented on NUTCH-485:
-
I think a more natural change would be this:
ParseResult filter
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-443.
-
Resolution: Fixed
Committed in rev. 536606. Big thanks to all who contributed
[
https://issues.apache.org/jira/browse/NUTCH-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-467.
-
Resolution: Fixed
Assignee: Andrzej Bialecki
Patch applied in rev. 532105
[
https://issues.apache.org/jira/browse/NUTCH-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki closed NUTCH-418.
---
Resolution: Fixed
Fix Version/s: 0.9.0
Already applied.
Fixes parsing of XHTML (e.g
[
https://issues.apache.org/jira/browse/NUTCH-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki closed NUTCH-417.
---
Resolution: Fixed
Fix Version/s: 0.9.0
Assignee: Andrzej Bialecki
Fixed
[
https://issues.apache.org/jira/browse/NUTCH-393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494552
]
Andrzej Bialecki commented on NUTCH-393:
-
I agree with that - either all filters should run or the document
?
Indeed. Thanks for spotting this - it's fixed.
--
Best regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki resolved NUTCH-393.
-
Resolution: Fixed
Fix Version/s: 1.0.0
Assignee: Andrzej Bialecki
Both
[
https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494582
]
Andrzej Bialecki commented on NUTCH-479:
-
Correct - the only syntax element added in this patch
: Andrzej Bialecki
Assigned To: Andrzej Bialecki
Fix For: 1.0.0
There have been many requests from users to extend Nutch query syntax to add
support for OR queries, in addition to the implicit AND and NOT queries
supported now.
--
This message is automatically generated by JIRA
[
https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-479:
Attachment: or.patch
Patch based on the discussion on the mailing list, and a description
: 1.0.0
Reporter: Andrzej Bialecki
Assigned To: Andrzej Bialecki
Fix For: 1.0.0
I propose to make the following changes to URLFilters:
* extend URLFilters so that they support different filtering rules depending on
the context where they are executed
[
https://issues.apache.org/jira/browse/NUTCH-477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-477:
Attachment: urlfilters.patch
This patch implements suggested changes.
Extend URLFilters
[
https://issues.apache.org/jira/browse/NUTCH-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492386
]
Andrzej Bialecki commented on NUTCH-468:
-
+1. I'm writing a scoring plugin now where it's impossible
?
See the ASCII-art graphs and comments in NUTCH-385 - this is likely not
what is expected.
Although this JIRA issue is still open, the Fetcher2 code tries to
implement this middle ground solution.
--
Best regards,
Andrzej Bialecki
the
protocol/http difference) to false to indicate lib-http shouldn't
handle blocking internally. Because of this, when you use Fetcher2,
lib-http still tries to block them which makes Fetcher2 much less
useful.
This is definitely a bug.
--
Best regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491290
]
Andrzej Bialecki commented on NUTCH-471:
-
+1. Nice trick with the unsynchronized check. :)
Fix
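The "unsynchronized check" praised above refers to a common Java idiom: read a shared field without taking the lock, and only synchronize (and re-check) when the value is missing. The sketch below shows that general double-checked pattern; it is a generic illustration, not the NUTCH-471 patch, and the class LazyValue is hypothetical. The field must be volatile for the unsynchronized read to be safe under the Java memory model.

```java
import java.util.function.Supplier;

/** Generic lazy initialization with an unsynchronized fast-path check (illustration only). */
public class LazyValue<T> {
    private final Supplier<T> factory;
    // volatile guarantees that a non-null value seen without the lock is fully constructed
    private volatile T value;

    public LazyValue(Supplier<T> factory) {
        this.factory = factory;
    }

    public T get() {
        T v = value;              // fast path: no lock taken once initialized
        if (v == null) {
            synchronized (this) {
                v = value;        // re-check under the lock; another thread may have won
                if (v == null) {
                    v = factory.get();
                    value = v;
                }
            }
        }
        return v;
    }
}
```

The design choice is that after the first call every caller pays only for a volatile read, while the factory still runs at most once.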
, you're right - it's a bug. However, the reasoning that I presented
still holds, it's just the implementation that doesn't get it ;)
--
Best regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki closed NUTCH-474.
---
Resolution: Fixed
Assignee: Andrzej Bialecki
Fixed in rev. 532088. Thanks!
Fetcher2
be a good option to have,
especially for smaller setups - but it would require extensive
modifications to many tools in Nutch. Unless you are willing to provide
patches that implement it without breaking the large-scale case, I think
we should let the matter rest ...
--
Best regards,
Andrzej
in the DB is used
not standalone, but as one of many inputs to a map-reduce job.
To summarize - I think it would be very difficult to do this with the
current codebase.
--
Best regards,
Andrzej Bialecki
this issue. :)
--
Best regards,
Andrzej Bialecki
should move forward.
--
Best regards,
Andrzej Bialecki
Index
Chris Mattmann wrote:
[..]
[ ] +1 Release the packages as Apache Nutch 0.9
[ ] -1 Do not release the packages because...
+1.
--
Best regards,
Andrzej Bialecki
and discovering new issues, and patching them, we
will never make a release ... I think for issues that are not critical
or blocker we should press forward, otherwise we will have to wait
another 72 hours, and another, and another ...
--
Best regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12485986
]
Andrzej Bialecki commented on NUTCH-466:
-
Minor nit: MapFile requires that the key is a WritableComparable
[
https://issues.apache.org/jira/browse/NUTCH-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486003
]
Andrzej Bialecki commented on NUTCH-466:
-
I thought that the map will be from class names to directory
: Andrzej Bialecki
Assigned To: Andrzej Bialecki
In many situations it is necessary to store more data associated with pages
than is possible now with the current segment format. Quite often it's
binary data. There are two common workarounds for this: one is to use per-page
metadata
(in a separate thread) before
rewriting the "How to release"
page in the wiki.
I agree - the current release process didn't fare too well in this
particular situation ...
--
Best regards,
Andrzej Bialecki
withhold other development while waiting for rc1,
rc2, rcN, ... - other patches, including disruptive ones and those that
introduce new features, can be applied in the meantime to trunk/ .
As for bugfixes, they can be merged up or down between the branch
and trunk as needed.
--
Best regards,
Andrzej
Sami Siren wrote:
2007/3/29, Andrzej Bialecki [EMAIL PROTECTED]:
Sami Siren wrote:
IMO we should have had a 0.9-rc1 tag, apply patch to trunk, have
0.9-rc2 tag and so on until we are satisfied.
Then when we're actually satisfied create tag for 0.9 (copy from rc
that got promoted).
What
Sami Siren wrote:
2007/3/29, Andrzej Bialecki [EMAIL PROTECTED]:
Sami Siren wrote:
2007/3/29, Andrzej Bialecki [EMAIL PROTECTED]:
Sami Siren wrote:
IMO we should have had a 0.9-rc1 tag, apply patch to trunk, have
0.9-rc2 tag and so on until we are satisfied.
Then when we're
. CrawlDbReader knows about Nutch naming
convention and always appends current to the db name. But if you were
to use MapFileOutputFormat.getReaders() directly this Hadoop class of
course doesn't know about this, so you need to provide a full path that
includes current.
--
Best regards,
Andrzej Bialecki
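A minimal sketch of the path convention described in the message above: Nutch tools such as CrawlDbReader append the "current" subdirectory to the db name themselves, whereas a plain Hadoop reader needs the full path spelled out. The class CrawlDbPaths and its currentDir method are hypothetical helpers, not Nutch API.

```java
/** Hypothetical helper: builds the full path a raw Hadoop reader needs for a CrawlDb. */
public final class CrawlDbPaths {
    // Subdirectory name that Nutch tools append automatically, per the message above.
    static final String CURRENT = "current";

    private CrawlDbPaths() {}

    /** Appends "current" to the db directory, tolerating a trailing slash. */
    public static String currentDir(String crawlDb) {
        String base = crawlDb.endsWith("/")
                ? crawlDb.substring(0, crawlDb.length() - 1)
                : crawlDb;
        return base + "/" + CURRENT;
    }
}
```

The resulting string, e.g. crawl/crawldb/current for a db at crawl/crawldb, would then be wrapped in an org.apache.hadoop.fs.Path before being handed to MapFileOutputFormat.getReaders().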
are finally happy
with the codebase then take a snapshot into tags/release-0.9, and keep
it read-only.
Another solution is to bend the rules and apply the patch to trunk/ and
then merge from the trunk to tags/release-0.9 .
What do you think?
--
Best regards,
Andrzej Bialecki
between 0-9.
What do you think?
--
Best regards,
Andrzej Bialecki
step.
--
Best regards,
Andrzej Bialecki
in
the process. Let me know what you all think.
I think we should work together on proposed API changes to this
extensible part interface, plus probably some changes to the Parse
API. I can create a JIRA issue and provide some initial patches.
--
Best regards,
Andrzej Bialecki
-size.html
Yes, I saw this - great stuff :)
--
Best regards,
Andrzej Bialecki
release ever! :)
--
Best regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki closed NUTCH-246.
---
Resolution: Fixed
Assignee: Andrzej Bialecki
Thanks for reminding us about
Sami Siren wrote:
for me it works:
...
BUILD SUCCESSFUL
Total time: 4 minutes 3 seconds
I did a fresh checkout to an empty dir, rebuilt and it's still failing -
perhaps you have some uncommitted changes in your working copy ... ?
--
Best regards,
Andrzej Bialecki
rubdabadub wrote:
Hi:
Just wondering about NUTCH-61
http://issues.apache.org/jira/browse/Nutch-61
Will it make the 0.9 cut?
It would be nice if it did. It's probably too late.
This was discussed before - it will be applied right after the release.
--
Best regards,
Andrzej Bialecki
is the reason - it seems that the
results of text extraction are completely different under 1.6 ...
--
Best regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482332
]
Andrzej Bialecki commented on NUTCH-462:
-
Is this happening with the latest trunk? See NUTCH-167, which
Sami Siren wrote:
Andrzej Bialecki wrote:
Hi all,
I just committed Hadoop 0.12.1. Let's double-check that it works ok.
Here's the list of Critical/Blocker issues I mentioned before, and their
current status:
Any other stuff we need to fix before the release?
I am satisfied except the broken
is in the classpath. I think that
What needs to be on your classpath is the *.job jar. The bin/nutch
script takes care of that if you built your Nutch using the command-line
version of ant.
--
Best regards,
Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482266
]
Andrzej Bialecki commented on NUTCH-381:
-
Your last comment confirms my suspicions. After analysis
[
https://issues.apache.org/jira/browse/NUTCH-381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki closed NUTCH-381.
---
Resolution: Won't Fix
Fix Version/s: 0.9.0
Assignee: Andrzej Bialecki
[
https://issues.apache.org/jira/browse/NUTCH-459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki closed NUTCH-459.
---
Resolution: Fixed
Upgraded to 0.12.1 release.
Upgrade Nutch to Hadoop 0.12.1
[
https://issues.apache.org/jira/browse/NUTCH-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-353:
Priority: Major (was: Blocker)
This is partially fixed so that page status is consistent
[
https://issues.apache.org/jira/browse/NUTCH-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-451:
Priority: Minor (was: Major)
Tool to recover partial fetcher output
[
https://issues.apache.org/jira/browse/NUTCH-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki closed NUTCH-450.
---
Resolution: Invalid
Assignee: Andrzej Bialecki
This belongs in nutch-user mailing list
-427 Moved to Major, fix after release.
NUTCH-381 Won't fix - this is a configuration issue.
NUTCH-277 Cannot reproduce
NUTCH-167 Fixed.
Any other stuff we need to fix before the release?
--
Best regards,
Andrzej Bialecki