[
http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12364542 ]
Andrzej Bialecki commented on NUTCH-192:
-
I have two comments:
* it's not obvious to me what are the strong arguments in favor of storing
Writables. I'd think that
[
http://issues.apache.org/jira/browse/NUTCH-169?page=comments#action_12364544 ]
Andrzej Bialecki commented on NUTCH-169:
-
This patch looks good! If there are no further objections, I'll test it and
commit it within the next 12 hours.
remove
Has the indexsorter code discussed a while back been
pushed to JIRA or put in SVN? I'd like to give it a
whirl on some of my indexes, and the archive I can find
cut the post with the code attached.
Hi,
over the last few days I gave the mapred-branch a try and I was impressed!
But I still have a problem with incremental crawling. My setup: I
have 4 boxes (1x namenode/jobtracker - 3x datanode/tasktracker). Running
one round of crawling consists of the steps:
- generate (I set a limit of
move NDFS and MapReduce to a separate project
-
Key: NUTCH-193
URL: http://issues.apache.org/jira/browse/NUTCH-193
Project: Nutch
Type: Task
Components: ndfs
Versions: 0.8-dev
Reporter: Doug Cutting
[
http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12364662 ]
Andrzej Bialecki commented on NUTCH-193:
-
What timeframe did you have in mind? There are a few patches in the queue,
which will be affected by this split.
Other than
Andrzej Bialecki (JIRA) wrote:
[ http://issues.apache.org/jira/browse/NUTCH-169?page=comments#action_12364544 ]
Andrzej Bialecki commented on NUTCH-169:
-
This patch looks good! If there are no further objections, I'll test it and
commit it within
[
http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12364663 ]
Sami Siren commented on NUTCH-193:
--
+1
I guess the fuse-j - ndfs work from John/me could be part of hadoop /contrib
after this change?
move NDFS and MapReduce to a
Well, it was at least the best way we had seen, since NutchConfigured
requires implementing a constructor that in most cases went unused as
well, since most classes are instantiated via class.newInstance().
So both solutions were optimal, and we decided on the interface solution.
I'm pretty sure
Sami Siren wrote:
Andrzej Bialecki (JIRA) wrote:
[
http://issues.apache.org/jira/browse/NUTCH-169?page=comments#action_12364544
]
Andrzej Bialecki commented on NUTCH-169:
-
This patch looks good! If there are no further objections, I'll test
it
[
http://issues.apache.org/jira/browse/NUTCH-191?page=comments#action_12364678 ]
Doug Cutting commented on NUTCH-191:
We've thus far avoided loading job-specific code in the JobTracker and
TaskTracker, in order to keep these more reliable. File
[
http://issues.apache.org/jira/browse/NUTCH-44?page=comments#action_12364679 ]
Sami Siren commented on NUTCH-44:
-
Byron, have you made any progress with this?
too many search results
---
Key: NUTCH-44
URL:
[
http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12364683 ]
Stefan Groschupf commented on NUTCH-192:
Andrzej, Doug. I'm not sure I understand you correctly: do you suggest having
string keys and values, or just string keys?
FYI
Original Message
Subject: NutchCVS/0.8-dev
Date: Mon, 30 Jan 2006 13:40:45 +0900 (JST)
From: [EMAIL PROTECTED]
Reply-To: nutch-agent@lucene.apache.org
To: nutch-agent@lucene.apache.org
Hi, I see that NutchCVS/0.8-dev is trying to crawl the
firecat.nihonsoft.org website,
Thanks for the clarification, I missed all these cross links!
You definitely 'are in the know'. :-)
Stefan
On 31.01.2006 at 20:31, Doug Cutting wrote:
Stefan Groschupf wrote:
The call CrawlDb.createJob(...) creates the crawl db update job.
In this method the main input folder is defined:
[
http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12364690 ]
Doug Cutting commented on NUTCH-193:
Otis: yes, thanks, I meant org.apache.hadoop.dfs.
Andrzej: I'm awaiting Mike's commit of NUTCH-183, which should happen today.
I'll
[
http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12364694 ]
Andrzej Bialecki commented on NUTCH-192:
-
What I meant was that both keys and values should be Strings (or rather UTF8),
for the sake of simplicity. Let's take your
[
http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12364699 ]
Stefan Groschupf commented on NUTCH-192:
* plus whatever it takes to put the class name-id mapping in the MapWritable
header (the mapping table): let's assume 40
[ http://issues.apache.org/jira/browse/NUTCH-194?page=all ]
Marko Bauhardt updated NUTCH-194:
-
Attachment: NutchConf.371869.patch
This patch fixes the problems described above.
Nutch-169 introduced two tiny bugs
--
Andrzej Bialecki wrote:
I wonder, would it be a good idea to replace the (rather wasteful)
4-byte ints with Lucene's variable-byte int encoding, in all places
where size matters?
I'm not sure there are that many places where it could make a big
difference.
* UTF8 (2-byte string length)
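For readers unfamiliar with the variable-byte int encoding mentioned above, here is a minimal sketch of the idea: 7 data bits per byte, low-order bits first, with the high bit set when more bytes follow. The class and method names (VIntSketch, encode, decode) are hypothetical and written only for illustration; this is not the Lucene or Nutch code itself, and it assumes non-negative values.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration of a variable-byte int encoding.
// Small values take 1 byte instead of a fixed 4 bytes.
public class VIntSketch {

    // Encode an int using 7 data bits per byte, low-order bits first.
    // The high bit of each byte signals that another byte follows.
    public static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Decode a single value from the start of the byte array.
    public static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // high bit clear: this was the last byte
            }
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        int[] samples = {0, 127, 128, 16383, 16384, 1000000};
        for (int v : samples) {
            byte[] enc = encode(v);
            System.out.println(v + " -> " + enc.length
                    + " byte(s), decoded " + decode(enc));
        }
    }
}
```

The saving only matters where many small ints are written, which is the caveat raised in the reply above; large values can take up to 5 bytes in this scheme.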
[
http://issues.apache.org/jira/browse/NUTCH-191?page=comments#action_12364739 ]
Owen O'Malley commented on NUTCH-191:
-
Wouldn't it be appropriate to make input splitting into a task, so that
getSplits could be run by the TaskTrackerChild? That way the
RPC call times out while indexing map task is computing splits
--
Key: NUTCH-195
URL: http://issues.apache.org/jira/browse/NUTCH-195
Project: Nutch
Type: Bug
Components: indexer
Versions: 0.8-dev
[ http://issues.apache.org/jira/browse/NUTCH-192?page=all ]
Stefan Groschupf updated NUTCH-192:
---
Attachment: metadata310106.patch
Now 1 byte for the class type and the size of the type itself, this means we
can have only 2 byte keys and 2 byte values
Hi developers,
some people are already in the process of writing a web-based
administration interface for Nutch.
The goal is to help newbies get started with Nutch faster and more easily.
I wrote up our plans so you can get an idea of what we are working
on.