Will the next release really be 1.0 or will it be 0.10?
Doug
Briggs wrote:
> I was just curious to know if there were any plans to release a
> maintenance/bug-fix release before 1.0. I know there have been a slew
> of patches and such (it's almost impossible to keep up, unless someone
> has a su
The problem is that nutch-dev (like most Apache mailing lists) sets the
"Reply-to" header to be itself, so that responses don't go back to the
sender. If you override this when responding (changing the "To:" line)
and respond to the sender, then it should end up as a comment, which
will be the
[
https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507473
]
Doug Cutting commented on NUTCH-479:
Neither. It would end up as the Lucene query:
+"search p
Does the 0.9 crawl-delay implementation actually permit multiple threads
to access a site simultaneously?
Doug
Original Message
Subject: Nutch 0.9 and Crawl-Delay
Date: Sun, 3 Jun 2007 10:50:24 +0200
From: Lutz Zetzsche <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
To: [EMAIL
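To make the question concrete, here is one way several fetcher threads could be kept from hitting a site closer together than its Crawl-Delay: hand out per-host time slots under a lock. This is a hypothetical sketch, not the 0.9 fetcher code. The class and method names are invented, and for simplicity it spaces request start times rather than measuring from the end of the previous fetch.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: spaces fetches to each host at least
 * crawlDelayMillis apart, however many threads ask for that host.
 */
class HostThrottle {
    private final long crawlDelayMillis;
    // Earliest time (ms) at which the next fetch to each host may start.
    private final Map<String, Long> nextAllowed = new HashMap<>();

    HostThrottle(long crawlDelayMillis) {
        this.crawlDelayMillis = crawlDelayMillis;
    }

    /**
     * Reserves the next free slot for host and returns the time at which
     * the caller may start fetching. Synchronized so that two threads are
     * never handed the same slot.
     */
    synchronized long reserve(String host, long nowMillis) {
        long start = Math.max(nowMillis, nextAllowed.getOrDefault(host, nowMillis));
        nextAllowed.put(host, start + crawlDelayMillis);
        return start;
    }
}
```

A fetcher thread would call reserve(host, System.currentTimeMillis()) and sleep until the returned time before connecting, so concurrent threads for one host end up serialized.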
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500822
]
Doug Cutting commented on NUTCH-392:
Anchors, explain, and the cache are used relatively infrequently
Personnel discussions are conducted on the PMC's private mailing list.
I have forwarded your message there.
Thanks for the suggestion!
Doug
Gal Nitzan wrote:
> Hi,
>
> Since I'm no committer I can't really "propose" :-) but I just thought to
> draw some attention to the great work done on
karthik085 wrote:
> How do you find when a revision was released?
Look at the tags in subversion:
http://svn.apache.org/viewvc/lucene/nutch/tags/
Doug
Tom White wrote:
> I will be there too.
Unfortunately I won't be able to attend after all. The new baby in the
house won't let me!
Doug
Arun Kaundal wrote:
> Actually nutch people are kind of autocrats, don't expect more from them
> They do what they have decided
Have you submitted patches that have been ignored or rejected?
Each Nutch contributor indeed does what he or she decides. Nutch is not
a service organization that
Steve Severance wrote:
> I am not looking to really make an image retrieval engine. During indexing
> referencing docs will be analyzed and text content will be associated with
> the image. Currently I want to keep this in a separate index. So despite the
> fact that images will be returned the
[EMAIL PROTECTED] wrote:
[ ... ]
> -/**
> - * Licensed to the Apache Software Foundation (ASF) under one or more
> - * contributor license agreements. See the NOTICE file distributed with
[ ... ]
> +/**
> + * Licensed to the Apache Software Foundation (ASF) under one or more
> + * contributor lice
I will probably be there.
Doug
Marc Boucher wrote:
> I was wondering if anyone is going to ApacheCon
> (http://www.eu.apachecon.com)
> in May as they have a full day's workshop on Lucene and will have other sessions
> on Nutch, Hadoop and Solr?
>
> Marc Boucher
[
https://issues.apache.org/jira/browse/NUTCH-455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478854
]
Doug Cutting commented on NUTCH-455:
Alternately, we could define it as an error to attempt to dedup by a
Chris Mattmann wrote:
> It's too bad that
> this has turned out to be an issue that I've handled incorrectly, and for
> that, I apologize.
Sorry if I blew this out of proportion. We all help each other run this
project. I don't think any grave error was made. I just saw an
opportunity to remi
Sami Siren wrote:
> It would be more beneficial to everybody if the discussions (related to
> release or Nutch) is
> done on public (hey this is open source!). The off the list stuff IMO
> smells.
+1 Folks sometimes wish to discuss project matters off-list to spare
others the boring details, but
Zaheed Haque wrote:
> It's been about a month I've been trying to find time to make the
> necessary changes so that I could submit the code. Due to enormous
> amount of work load I am unable to find the time. I am not sure how
> should I proceed, I have personally try to contact some of you off
> list.
[
https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476665
]
Doug Cutting commented on NUTCH-445:
Setting the boost to non-zero permits a "site:" query with no o
[
https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476243
]
Doug Cutting commented on NUTCH-445:
Note that the "site" field is also used for search-time deduplic
Nutch's nightly builds have been moved to a Hudson server at:
http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/
I've stopped the old nightly build process and added a redirect from the
old nightly build distribution directory to this page.
Thanks to Nigel Daley for configuring an
[
https://issues.apache.org/jira/browse/NUTCH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doug Cutting reassigned NUTCH-449:
--
Assignee: Doug Cutting
> Format of junit output should be configura
Andrzej Bialecki wrote:
> The degree of simplification is very substantial. Our NutchSuperQuery
> doesn't have to do much more work than a simple TermQuery, so we can
> assume that the cost to run it is the same as TermQuery times some
> constant. What we gain then is the cost of not running all
Doug Cutting (JIRA) wrote:
>> this patch in some places removes the log guards
>
> Most of the log guards are misguided. Log guards should only be used on
> DEBUG level messages in performance-critical inner loops. Since INFO is the
> expected log level, a guard on INFO
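A minimal sketch of the guard policy described above. Nutch and Hadoop use commons logging (isDebugEnabled() and friends); this example swaps in java.util.logging so it runs without extra jars, with FINE standing in for DEBUG. The class and its methods are illustrative only.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class GuardExample {
    private static final Logger LOG = Logger.getLogger(GuardExample.class.getName());

    // Counts how often the expensive debug message was actually built,
    // so the effect of the guard can be observed.
    static int messagesBuilt = 0;

    static String describe(int[] tokens) {
        messagesBuilt++;
        StringBuilder sb = new StringBuilder("tokens:");
        for (int t : tokens) {
            sb.append(' ').append(t);
        }
        return sb.toString();
    }

    static long sumTokens(int[] tokens) {
        long sum = 0;
        for (int t : tokens) {
            sum += t;
            // Guarded: a debug-level (FINE) message inside a hot inner loop.
            // The string is only built when FINE logging is enabled.
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine(describe(tokens));
            }
        }
        // Unguarded: INFO is the expected level, so it is normally enabled
        // and a guard would only add clutter.
        LOG.info("summed " + tokens.length + " tokens");
        return sum;
    }
}
```

With the default logging configuration FINE is disabled, so the loop never pays for building the debug string.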
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472821
]
Doug Cutting commented on NUTCH-443:
> this patch in some places removes the log guards
Most of the log gua
Chris Mattmann wrote:
> Got it. So, the logic behind this is, why bother waiting until the
> following fetch to parse (and create ParseData objects from) the RSS items
> out of the feed. Okay, I get it, assuming that the RSS feed has *all* of the
> RSS metadata in it. However, it's perfectly accep
Chris Mattmann wrote:
> Sorry to be so thick-headed, but could someone explain to me in really
> simple language what this change is requesting that is different from the
> current Nutch API? I still don't get it, sorry...
A Content would no longer generate a single Parse. Instead, a Content
co
Renaud Richardet wrote:
> I see. I was thinking that I could index the feed items without having
> to fetch them individually.
Okay, so if Parser#parse returned a Map, then the URL for
each parse should be that of its link, since you don't want to fetch
that separately. Right?
So now the ques
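A rough illustration of a parser returning a Map for a feed, with each entry keyed by the item's own link so items can be indexed without fetching them separately. All types here are hypothetical stand-ins, not Nutch's Parse or Parser API, and including an extra entry for the feed URL itself is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MultiParseSketch {
    // Hypothetical stand-in for a parse result: a title and plain text.
    static final class SimpleParse {
        final String title;
        final String text;

        SimpleParse(String title, String text) {
            this.title = title;
            this.text = text;
        }
    }

    // Hypothetical stand-in for one item in an RSS feed.
    static final class FeedItem {
        final String link;
        final String title;
        final String description;

        FeedItem(String link, String title, String description) {
            this.link = link;
            this.title = title;
            this.description = description;
        }
    }

    /** Returns one parse per item keyed by its link, plus one for the feed. */
    static Map<String, SimpleParse> parseFeed(String feedUrl, String feedTitle,
                                              List<FeedItem> items) {
        Map<String, SimpleParse> result = new LinkedHashMap<>();
        result.put(feedUrl, new SimpleParse(feedTitle, ""));
        for (FeedItem item : items) {
            result.put(item.link, new SimpleParse(item.title, item.description));
        }
        return result;
    }
}
```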
Renaud Richardet wrote:
> The usecase is that you index RSS-feeds, but your users can search each
> feed-entry as a single document. Does it makes sense?
But each feed item also contains a link whose content will be indexed
and that's generally a superset of the item. So should there be two
ur
Doğacan Güney wrote:
> OK, then should I go forward with this and implement something? This
> should be pretty easy,
> though I am not sure what to give as keys to a Parse[].
>
> I mean, when getParse returned a single Parse, ParseSegment output them
> as . But, if getParse
> returns an array, w
Doğacan Güney wrote:
> I think it would make much more sense to change parse plugins to take
> content and return Parse[] instead of Parse.
You're right. That does make more sense.
Doug
Gal Nitzan wrote:
> IMHO the data that is needed i.e. the data that will be fetched in the next
> fetch process is already available in the element. Each element
> represents one web resource. And there is no reason to go to the server and
> re-fetch that resource.
Perhaps ProtocolOutput shou
Dennis Kubes wrote:
> Andrzej Bialecki wrote:
>> I believe that at this point it's crucial to keep the project
>> well-focused (at the moment I think the main focus is on larger
>> installations, and not the small ones), and also to make Nutch
>> attractive to developers as a reusable "search en
Chris Mattmann wrote:
> So, does this render the patch that I wrote obsolete?
It's at least out-of-date and perhaps obsolete. A quick read of
Fetcher.java looks like there might be a case where a "fatal" error is
logged but the fetcher doesn't exit, in FetcherThread#output().
Doug
Scott Ganyo (JIRA) wrote:
> ... since Hadoop hijacks and reassigns all log formatters (also a bad
> practice!) in the org.apache.hadoop.util.LogFormatter static constructor ...
FYI, Hadoop no longer does this.
Doug
Teruhiko Kurosaka wrote:
> I suggest "i18n" be renamed to "l10n", short for
> localization.
Can you please file an issue in Jira for this? Ideally you could even
provide a patch. The source for the website is in subversion at:
http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/site
Forres
[EMAIL PROTECTED] wrote:
> Draft version of "How to Become a Nutch Developer" is on the wiki at:
>
> http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer
>
> Please take a look and if you think anything needs to be added, removed,
> or changed let me know.
Thanks for taking the time to write
Dennis Kubes wrote:
> Can you answer the question of how to add developer names to JIRA or if
> that is only for committers?
It's not just for committers, but also for regular contributors. I have
added you. Anyone else?
Doug
[EMAIL PROTECTED] wrote:
> Yes, certainly, anything that can be shared and decoupled from pieces that
> make each branch (not SVN/CVS branch) different, should be decoupled. But I
> was really curious about whether people think this is a valid idea/direction,
> not necessarily immediately how t
Andrzej Bialecki wrote:
> The workflow is different - I'm not sure about the details, perhaps Doug
> can correct me if I'm wrong ... and yes, it uses JIRA extensively.
>
> 1. An issue is created
> 2. patches are added, removed commented, etc...
> 3. finally, a candidate patch is selected, and the
Dennis Kubes wrote:
> I will say that it is difficult for people to understand how to get more
> involved. I have been working with Nutch and Hadoop for almost a year
> now on a daily basis and only now am I understanding how to contribute
> through jira, etc. There needs to be more guidance i
Stefan Groschupf wrote:
> I don't want to start a emotional discussion here, however talking about
> the problem in public might help.
What, specifically, is the problem you perceive?
Doug
Stefan Groschupf wrote:
> We run the gui in several production environments with patched hadoop
> code - since this is from our point of view the clean approach.
> Everything else feels like a workaround to fix some strange hadoop
> behaviors.
Are there issues in Hadoop's Jira for these? If so
Andrzej Bialecki wrote:
> The reason is that if you pack this file into your job JAR, the job jar
> would become very large (presumably this 40MB is already compressed?).
> Job jar needs to be copied to each tasktracker for each task, so you
> will experience performance hit just because of the
Sami Siren wrote:
> Stefan Groschupf wrote:
>> See:
>> http://www.find23.net/nutch_guiToHadoop.pdf
>> Section required hadoop changes.
>
> I guess you refer to these:
>
> • LocalJobRunner:
> • Run as kind of singleton
> • Have a kind of jobQueue
> • Implement JobSubmissionProtocol statu
The wiki would be a good place for this.
Doug
Peter Landolt wrote:
> Hello,
>
> We tried to introduce Nutch at a telecommunication company in Switzerland
> as search engine of their future main search solution. As they were also
> proofing
> commercial products we needed to offer them a brochur
Sami Siren wrote:
> looks like somebody just enabled email-to-jira-comments-feature. I was
> just wondering would it be good to use this feature more widely.
I think it would be good. That way mailing list discussion would be
logged to the bug as well.
> This could be achieved by removing the
[
http://issues.apache.org/jira/browse/NUTCH-385?page=comments#action_12441552 ]
Doug Cutting commented on NUTCH-385:
> It would be one thing if whenever (fetcher.threads.per.host > 1), this
> trumped the server delay [...]
Are
[
http://issues.apache.org/jira/browse/NUTCH-353?page=comments#action_12439682 ]
Doug Cutting commented on NUTCH-353:
It's worth noting that Google, Yahoo! and Microsoft's searches all return lots
of links to www-XXX.ibm.com.
[ http://issues.apache.org/jira/browse/NUTCH-304?page=all ]
Doug Cutting resolved NUTCH-304.
Resolution: Fixed
I just fixed this. Thanks for noticing!
> Change JIRA email address for nutch issues from apache incuba
[
http://issues.apache.org/jira/browse/NUTCH-368?page=comments#action_12435539 ]
Doug Cutting commented on NUTCH-368:
How would you compare this to JMS?
http://java.sun.com/j2ee/sdk_1.3/techdocs/api/javax/jms/package-summary.html
Is it
Chris Mattmann wrote:
> +1. I think that workflow makes a lot of sense. Currently users in the
> nutch-developers group can close and resolve issues. In the Hadoop workflow,
> would this continue to be the case?
In Hadoop, most developers can resolve but not close. Only members of a
separate J
Sami Siren wrote:
> I am not able to do it either, or then I just don't know how, can Doug
> help us here?
This requires a change to the project's workflow. I'd be happy to move
Nutch to use the workflow we use for Hadoop, which supports "Patch
Available".
This workflow has one other non-def
Sami Siren wrote:
> Patch works for me.
OK. I just committed it.
Thanks!
Doug
Jérôme Charron wrote:
In my environment, the crawl command terminate with the following error:
2006-07-06 17:41:49,735 ERROR mapred.JobClient
(JobClient.java:submitJob(273))
- Input directory /localpath/crawl/crawldb/current in local is invalid.
Exception in thread "main" java.io.IOException: I
[ http://issues.apache.org/jira/browse/NUTCH-309?page=all ]
Doug Cutting reopened NUTCH-309:
I am re-opening this issue, as the guards were added in far too many places.
Jerome, can you please fix these so that guards are only added when (a) the log
+1
Piotr Kosiorowski wrote:
> +1.
> P.
> Andrzej Bialecki wrote:
>> Sami Siren wrote:
>>> How would folks feel about releasing 0.8 now, there has been quite a
>>> lot of improvements/new features
>>> since 0.7 series and I strongly feel that we should push the first
>>> 0.8 series release (alfa/
[ http://issues.apache.org/jira/browse/NUTCH-312?page=all ]
Doug Cutting resolved NUTCH-312:
Fix Version: 0.8-dev
Resolution: Fixed
I just upgraded Nutch to Hadoop 0.4.0, incorporating this patch. Thanks,
Milind!
> Fix for upcom
[
http://issues.apache.org/jira/browse/NUTCH-303?page=comments#action_12417346 ]
Doug Cutting commented on NUTCH-303:
Jerome: thanks very much for all of your great work improving Nutch's logging!
> logging impr
[EMAIL PROTECTED] wrote:
> NUTCH-309 : Added logging code guards
[ ... ]
> + if (LOG.isWarnEnabled()) {
> +LOG.warn("Line does not contain a field name: " + line);
> + }
[ ...]
-1
I don't think guards should be added everywhere. They make the code
bigger and provid
http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much-nutch.html
___
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
Jérôme Charron wrote:
> For now, I have used the same log4 properties than hadoop (see
> http://svn.apache.org/viewvc/lucene/hadoop/trunk/conf/log4j.properties?view=markup&pathrev=411254
>
>
> ) for the back-end, and
> I was thinking to use the stdout for front-end.
> What do you think about thi
Stefan Groschupf wrote:
> As far I understand hadoop use commons logging. Should we switch to use
> commons logging as well?
+1
Doug
[
http://issues.apache.org/jira/browse/NUTCH-289?page=comments#action_12414114 ]
Doug Cutting commented on NUTCH-289:
It should be possible to partition by IP and limit fetchlists by IP. Resolving
only in the fetcher is too late to implement these
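For illustration, a hypothetical partition function keyed on a host's resolved IPv4 address rather than its hostname, so every URL served from the same machine goes to the same task, where a per-IP limit could then be counted. It assumes the address is already known at generate time (for example, stored alongside the URL); the names are invented. Masking the sign bit before the modulus follows the same pattern as Hadoop's HashPartitioner.

```java
import java.util.Arrays;

class IpPartitioner {
    /**
     * Maps an address (its raw bytes, e.g. from InetAddress.getAddress())
     * to a partition in [0, numPartitions). Equal addresses always map to
     * the same partition.
     */
    static int partitionForIp(byte[] address, int numPartitions) {
        int hash = Arrays.hashCode(address);
        // Clear the sign bit so the result is never negative.
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }
}
```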
Ken Krugler wrote:
2. Are the Nutch Devs replying to the emails sent to this list? I could
understand if they are replying off-list, but to an outside observer
such as myself it appears as though webmasters are not getting many
replies to their inquiries.
I can speak for myself only .. I'm
CrawlDatum should store IP address
--
Key: NUTCH-289
URL: http://issues.apache.org/jira/browse/NUTCH-289
Project: Nutch
Type: Bug
Components: fetcher
Versions: 0.8-dev
Reporter: Doug Cutting
If the CrawlDatum stored
[
http://issues.apache.org/jira/browse/NUTCH-273?page=comments#action_12413528 ]
Doug Cutting commented on NUTCH-273:
Redirects should really not be followed immediately anyway. We should instead
note that it was redirected and to which URL in the
[
http://issues.apache.org/jira/browse/NUTCH-288?page=comments#action_12413305 ]
Doug Cutting commented on NUTCH-288:
> Is there a quickfix possible somehow?
Someone needs to fix the OpenSearch servlet.
It looks like just changing line 146
[
http://issues.apache.org/jira/browse/NUTCH-288?page=comments#action_12413272 ]
Doug Cutting commented on NUTCH-288:
> Is there a performant way of doing deduplication and knowing for sure how
> many documents are available to view?
No.
[
http://issues.apache.org/jira/browse/NUTCH-272?page=comments#action_12412846 ]
Doug Cutting commented on NUTCH-272:
In 0.8, urls are filtered both when generating and when updating the DB.
Strictly speaking, they're only required when updatin
[
http://issues.apache.org/jira/browse/NUTCH-272?page=comments#action_12412605 ]
Doug Cutting commented on NUTCH-272:
Does the existing generate.max.per.host parameter not meet this need?
> Max. pages to crawl/fetch per site (emergency li
Andrzej Bialecki wrote:
I read through your email exchange, and setting aside all emotional
content I think this is a valid request - indeed, as far as I can tell
other major crawlers don't follow these links. We could either remove
this, or make it optional (default not to use them).
Is this
[
http://issues.apache.org/jira/browse/NUTCH-267?page=comments#action_12379116 ]
Doug Cutting commented on NUTCH-267:
re: it's as if we didn't want it to be re-crawled if we can't find any inlinks
to it
We prioritize crawling based o
Andrzej Bialecki wrote:
I'm planning to work on adding support in 0.8 for interleaved fetch cycles.
Great!
Then, when running an updatedb, the issue of scores and metadata comes
into question. We can imagine now that there were some other updatedb-s
run in the meantime, not necessarily with
Jérôme Charron wrote:
Yes Doug, but in fact, the idea is to add the toString(Formatter) method in
a common place (Summary).
And add one specific Formatter implementation for OpenSearch and another
one
for search.jsp :
The reason is that they should not use the same HTML code :
1. OpenSearch sho
This is a known, fixed, Hadoop bug:
http://issues.apache.org/jira/browse/HADOOP-201
I'm going to release Hadoop 0.2.1 with this and one other patch as soon
as Subversion is back up, then upgrade Nutch to use 0.2.1.
Doug
Marko Bauhardt wrote:
Hi all,
i start nutch-0.8-dev (Revision 405738)
Sami Siren wrote:
Also a friendly hint to all plugin hackers, you need to enable
summary-basic in your existing nutch-site.xml to get things working.
Took me some time to realize this fact :)
Sounds like we should enable it by default, no?
Doug
Jérôme Charron wrote:
This means there's no markup in the OpenSearch output?
Yes, no markup for now.
Doesn't this break any existing application that uses OpenSearch and
displays summaries in a web browser? This is an incompatible change
which we should avoid.
Shouldn't there be?
Th
Thanks for making this change!
A few comments:
[EMAIL PROTECTED] wrote:
==
---
lucene/nutch/trunk/src/java/org/apache/nutch/searcher/OpenSearchServlet.java
(original)
+++
lucene/nutch/trunk/src/java/org/apache/nutch/
[
http://issues.apache.org/jira/browse/NUTCH-267?page=comments#action_12378765 ]
Doug Cutting commented on NUTCH-267:
Andrzej: your analysis is correct, but it mostly only applies when re-crawling.
In an initial crawl, where each url is fetched only
[
http://issues.apache.org/jira/browse/NUTCH-267?page=comments#action_12378560 ]
Doug Cutting commented on NUTCH-267:
The OPIC score is much like a count of incoming links, but a bit more refined.
OPIC(P) is one plus the sum of the OPIC contributions
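A worked sketch of the "one plus the sum of contributions" rule. The text above is cut off before it says what a contribution is, so this assumes the usual OPIC choice that each linking page splits its current score evenly across its outlinks. Class and field names are illustrative.

```java
import java.util.List;

class OpicSketch {
    // A page that links to P: its current score and how many outlinks it has.
    static final class InLink {
        final double score;
        final int outlinkCount;

        InLink(double score, int outlinkCount) {
            this.score = score;
            this.outlinkCount = outlinkCount;
        }
    }

    /** OPIC(P) = 1 + sum over linking pages Q of score(Q) / outlinks(Q). */
    static double opic(List<InLink> inlinks) {
        double sum = 0.0;
        for (InLink in : inlinks) {
            sum += in.score / in.outlinkCount;
        }
        return 1.0 + sum;
    }
}
```

For example, a page linked from one page with score 2.0 and 4 outlinks (contributing 0.5) and one with score 1.0 and 2 outlinks (contributing 0.5) gets 1 + 0.5 + 0.5 = 2.0, which is why the value behaves like a weighted count of incoming links.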
Chris Schneider wrote:
I just noticed that the generate.max.per.host property is only enforced
on a "per reduce task" basis during the first generate job (see
Generator.Selector.reduce for details). At a minimum, it should probably
be documented this way in nutch-default.xml.template.
Yes, bu
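To see why a per-task limit can overshoot: if one host's URLs are spread over several reduce tasks and each task counts hosts on its own, each task may keep up to the limit. A hypothetical sketch of that per-task counting (input assumed already sorted by score; names invented, not the Generator code):

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PerHostCap {
    /**
     * Keeps at most maxPerHost URLs per host from one task's input.
     * The counts live only inside this call, mirroring a limit that each
     * reduce task enforces independently.
     */
    static List<String> selectWithinTask(List<String> urls, int maxPerHost) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> selected = new ArrayList<>();
        for (String url : urls) {
            String host = URI.create(url).getHost();
            int n = counts.merge(host, 1, Integer::sum);
            if (n <= maxPerHost) {
                selected.add(url);
            }
        }
        return selected;
    }
}
```

With a limit of 2 and a host's six URLs split evenly across two tasks, four are selected, so the effective bound is the number of reduce tasks times the limit unless all of a host's URLs reach the same task.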
[
http://issues.apache.org/jira/browse/NUTCH-134?page=comments#action_12378458 ]
Doug Cutting commented on NUTCH-134:
+1 for Summary as Writable and change HitSummarizer.getSummary() to return a
Summary directly rather than a String. I don't
It seems Stefan is giving a talk...
http://events.commerce.net/?p=58
Doug
This sort of error will become much harder to make once we upgrade to
Hadoop 0.2 and replace most uses of java.io.File with
org.apache.hadoop.fs.Path.
Doug
[EMAIL PROTECTED] wrote:
Author: ab
Date: Wed May 3 19:42:02 2006
New Revision: 399515
URL: http://svn.apache.org/viewcvs?rev=399515&vi
Jérôme Charron wrote:
We had to turn off
the guessing of content types to index Apache correctly.
Instead of turning off the guessing of content types you should only to
remove the magic for xml in mime-types.xml
Perhaps that would have worked also, but, with Apache, simply trusting
the decl
[EMAIL PROTECTED] wrote:
As far as we understood from MapRed documentation all reduce tasks must be
launched after last map task is finished e.g map and reduce must not work
simultaneously. But often in logs we see such records: "map 80%, reduce 10%"
and many more records where map is less then 1
[
http://issues.apache.org/jira/browse/NUTCH-256?page=comments#action_12376993 ]
Doug Cutting commented on NUTCH-256:
I think this is really a bug in Hadoop's FileSystem.createNewFile() method.
I've just fixed that. Does that work for y
[ http://issues.apache.org/jira/browse/NUTCH-256?page=all ]
Doug Cutting resolved NUTCH-256:
Resolution: Fixed
Assign To: Doug Cutting
This is fixed in Hadoop 0.2.
> Cannot open filename index.done.
[
http://issues.apache.org/jira/browse/NUTCH-257?page=comments#action_12376989 ]
Doug Cutting commented on NUTCH-257:
I'd vote to never have Summary#toString() perform entity encoding, to fix
search.jsp to encode things itself, and *not* to add
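A minimal sketch of the split proposed here: the summary stays plain text, and the page that renders it as HTML does the escaping. The helper is hypothetical; search.jsp would wrap the summary text with it on output, while OpenSearch would do its own XML escaping.

```java
class HtmlEscape {
    /** Escapes the characters that are significant in HTML text and attributes. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```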
[
http://issues.apache.org/jira/browse/NUTCH-256?page=comments#action_12376839 ]
Doug Cutting commented on NUTCH-256:
That's not a fatal exception, right? Everything still works? It should. This
is just the DFS version of FileNotFound, whi
Jérôme Charron wrote:
Finally it is a good news that Nutch seems to be more "intelligent" on
content-type guessing than Firefox or IE, no?
I'm not so sure. When crawling Apache we had trouble with this feature.
Some HTML files that had an XML header and the server identified as
"text/html" N
at java.lang.reflect.Array.getLength(Native Method)
at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:92)
at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:64)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:250)
-----Original Message-----
[EMAIL PROTECTED] wrote:
We updated hadoop from trunk branch. But now we get new errors:
Oops. Looks like I introduced a bug yesterday. Let me fix it...
Sorry,
Doug
This is a Hadoop DFS error. It could mean that you don't have any
datanodes running, or that all your datanodes are full. Or, it could be
a bug in dfs. You might try a recent nightly build of Hadoop to see if
it works any better.
Doug
Anton Potehin wrote:
What means error of following typ
Andrzej Bialecki wrote:
> Hmm.. I understand his point. But it means that I have to always put
"if
(datum.getMetaData() == null)" check, which pollutes the code in all
places that deal with metadata. Currently this is just CrawlDbReducer
(but it already looks ugly there), but it will be like t
Jérôme Charron wrote:
we think it would be a good idea to split Nutch into a new sub-project based
on content analysis
manipulation. The components we have identified are :
1. MimeType Repository
2. Language Identifier
3. Content Signature (MD5Signature / TextProfileSignature / ...)
(4. Generic
Folks can say whether they'll attend at:
http://www.evite.com/app/publicUrl/[EMAIL PROTECTED]/nutch-1
Doug
One more thing. This parameter should be set in mapred-default.xml, not
hadoop-site.xml or nutch-site.xml. Parameters in those latter files
cannot be overridden by application settings, and mapred.map.tasks is
sometimes overridden.
Doug
Anton Potehin wrote:
We have a question on this property. Is it really preferred to set this
parameter several times greater than number of available hosts? We do
not understand why it should be so?
It should be at least numHosts*mapred.tasktracker.tasks.maximum, so that
all of the task slots
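The arithmetic behind the suggested lower bound, as a small hypothetical helper. For example, 10 hosts each allowing 2 concurrent tasks (mapred.tasktracker.tasks.maximum = 2) give 20 slots, so at least 20 map tasks are needed to occupy every slot in the first round. The waves() helper just counts how many rounds a given number of map tasks takes.

```java
class MapTaskSizing {
    /** Total task slots in the cluster: the suggested minimum for mapred.map.tasks. */
    static int minMapTasks(int numHosts, int tasksPerTracker) {
        return numHosts * tasksPerTracker;
    }

    /** Number of rounds needed to run mapTasks when every slot is used (ceiling division). */
    static int waves(int mapTasks, int numHosts, int tasksPerTracker) {
        int slots = minMapTasks(numHosts, tasksPerTracker);
        return (mapTasks + slots - 1) / slots;
    }
}
```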
[
http://issues.apache.org/jira/browse/NUTCH-173?page=comments#action_12375421 ]
Doug Cutting commented on NUTCH-173:
+1, with a few modifications.
Can you please re-generate this against the current sources? This patch does
not apply for me.
Also
[ http://issues.apache.org/jira/browse/NUTCH-250?page=all ]
Doug Cutting resolved NUTCH-250:
Fix Version: 0.8-dev
Resolution: Fixed
Assign To: Doug Cutting
I just committed this. Thanks, Rod.
> Generate to log truncation caused