Re: per-page boost - concise definition anywhere?

2005-09-05 Thread Ken Krugler
settings and don't run the DistributedAnalysisTool, then all of the page scores are 1.0, so the Lucene document boost winds up being ln(e + inbound link count): 0 inbound links = 1.0, 10 links = 2.54, 100 links = 4.63, etc. -- Ken -- Ken Krugler TransPac Software, Inc. http://www.transpac.com +1 530
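As a quick sanity check on those numbers, here is a minimal Python sketch of the boost formula described above. It assumes every page score is 1.0, so only the inbound-link term matters. `document_boost` is an illustrative name, not a Nutch method.

```python
import math


def document_boost(inbound_links: int) -> float:
    """Boost = ln(e + inbound link count), assuming all page scores are 1.0."""
    if inbound_links < 0:
        raise ValueError("inbound link count can't be negative")
    return math.log(math.e + inbound_links)


for n in (0, 10, 100):
    print(n, round(document_boost(n), 2))  # 0 1.0 / 10 2.54 / 100 4.63
```

The log keeps heavily linked pages from swamping everything else: going from 10 to 100 inbound links less than doubles the boost.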

Re: fetch questions - freezing

2005-10-27 Thread Ken Krugler
to my project and am plugging in some additional logging to help track down the issue. -- Ken -- Ken Krugler Krugle, Inc. +1 530-470-9200

Re: Indexer Performance - up to 200+ rec/s with Lang identification enabled

2005-10-28 Thread Ken Krugler
the language to adjust the nextScore value for outlinks to pages that don't currently exist. Then in FetchListTool use this nextScore value, and provide some topN value such that the top links are going to be in your target language. -- Ken
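Here is a rough Python sketch of that two-step idea (boost by language, then cut at topN). It's an illustration under assumptions, not FetchListTool's code: the `Outlink` type, the multiplicative `language_boost`, and using the linking page's language as a proxy for the not-yet-fetched target's language are all hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Outlink:
    url: str
    next_score: float  # score passed on to a page that hasn't been fetched yet
    source_lang: str   # detected language of the page containing the link


def adjust_scores(outlinks, target_lang, language_boost=2.0):
    """Hypothetical rule: boost links found on pages in the target language."""
    for link in outlinks:
        if link.source_lang == target_lang:
            link.next_score *= language_boost
    return outlinks


def select_top_n(outlinks, top_n):
    """Keep the top_n links by adjusted score, like a topN cutoff when building a fetchlist."""
    best = heapq.nlargest(top_n, outlinks, key=lambda link: link.next_score)
    return [link.url for link in best]


links = [
    Outlink("http://a.example/", 1.0, "en"),
    Outlink("http://b.example/", 1.5, "de"),
    Outlink("http://c.example/", 0.9, "en"),
]
print(select_top_n(adjust_scores(links, "en"), top_n=2))
```

With the boost, the two links from English pages (2.0 and 1.8) outrank the link from the German page (1.5), even though that one started with the highest score.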

Re: fetch questions - freezing

2005-10-28 Thread Ken Krugler
, other than boosting the CPU usage to 80%. More research results to come... -- Ken

Re: fetch questions - freezing

2005-10-28 Thread Ken Krugler
Ken Krugler wrote: We're only using the HTML text parsers, so I don't think that's the problem. Plus, we're dumping the thread stack when it hangs, and it's always in ChunkedInputStream.exhaustInputStream() (see trace below). The trace did not make it. Oops - see the end

Re: max delays error

2005-11-03 Thread Ken Krugler
URL weights to avoid having any one domain with a significantly higher percentage of URLs than any other domain, but so far that hasn't been an issue for us.

Re: InterruptedException from ControllerThreadSocketFactory.SocketTask

2005-11-13 Thread Ken Krugler
an example: http://jira.atlassian.com/browse/CONF-2848 Do any Nutch users have experience using file:/dev/random? Thanks, - Chris -- Chris Schneider TransPac Software, Inc.

Re: InterruptedException from ControllerThreadSocketFactory.SocketTask

2005-11-13 Thread Ken Krugler
use [EMAIL PROTECTED] when posting, to keep it spam-free, so either a bcc or [EMAIL PROTECTED] in the cc field would be better.

Re: Setting up a crawler for a country.

2005-11-29 Thread Ken Krugler
making some mods to Nutch to improve our performance, but it's not debugged yet...getting closer, though. -- Ken

Re: Limiting search/crawl to specific language

2006-01-04 Thread Ken Krugler
could focus your crawl on English-content pages. -- Ken

Error running MapReduce - Host key verification failed

2006-01-14 Thread Ken Krugler
[main1/192.168.0.100:8010]. ex=java.net.ConnectException: Connection refused Retrying... It seems like the 192.168.0.103 machine doesn't have the right settings for connecting to the 192.168.0.100 machine. Is there a way to check this outside of running Nutch? Thanks, -- Ken

Error running MapReduce - Jetty server .jsp files

2006-01-14 Thread Ken Krugler
/localedata.jar:/usr/java/jre1.5.0_05/lib/ext/sunpkcs11.jar [snip] So obviously somebody is using JAVA_HOME to build the path to these .jar files. But JAVA_HOME (the top-level path, i.e. /usr/java/jre1.5.0_05) isn't a member of this classpath. Any help would be appreciated! Thanks, -- Ken

Error at end of MapReduce run with indexing

2006-01-14 Thread Ken Krugler
commands (readdb, segread, etc.) don't seem to be working with the new NDFS setup.
4. Any idea whether 4 hours is a reasonable amount of time for this test? It seemed long to me, given that I was starting with a single URL as the seed. Thanks, -- Ken

Improving Nutch throughput w/MapReduce

2006-01-14 Thread Ken Krugler
for another run once this one has had a chance to generate some interesting results. Thanks, -- Ken

Re: [Nutch-general] Error running MapReduce - Jetty server .jsp files

2006-01-15 Thread Ken Krugler
is only designed to be used via links from the jobtracker.jsp page. And thanks to Andrzej for his November post that noted this. -- Ken - Original Message From: Ken Krugler [EMAIL PROTECTED] To: nutch-user@lucene.apache.org Sent: Sat 14 Jan 2006 05:50:00 PM EST Subject: [Nutch-general

Re: Error at end of MapReduce run with indexing

2006-01-17 Thread Ken Krugler
: http://mail-archives.apache.org/mod_mbox/lucene-nutch-user/200509.mbox/[EMAIL PROTECTED] -- Ken

Re: Problems with MapRed-

2006-01-29 Thread Ken Krugler

Re: Problems with MapRed-

2006-01-29 Thread Ken Krugler
060129 13 Zero targets found, forbidden1.size=2 allowSameHostTargets=false forbidden2.size()=0

Re: Error at end of MapReduce run with indexing

2006-02-03 Thread Ken Krugler
timeout value being too low. We were getting lots of timeout errors, which was killing our performance. -- Ken

Re: HTTPS Protocol Implementation

2006-02-14 Thread Ken Krugler
Is there an HTTPS protocol implementation for nutch? If you use protocol-httpclient (instead of protocol-http), then it should support HTTPS. -- Ken -- Ken Krugler Krugle, Inc. +1 530-210-6378 Find Code, Find Answers

Re: Link Farms

2006-03-07 Thread Ken Krugler
that we can filter? I've read a paper on detecting link farms, but from what I remember, it wasn't a slam-dunk to implement. So far we've relied on manually detecting these, and then pruning the results from the crawldb and adding them to the regex-urlfilter file. -- Ken
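To test patterns before adding them to the filter file, it helps to mimic its rule semantics. According to the comments in the default filter file, each rule is a regex prefixed with `+` (include) or `-` (exclude), the first matching rule decides, and a URL that matches no rule is dropped. The Python sketch below re-implements only those semantics; it is not Nutch's code, and `linkfarm-example.com` is a made-up domain.

```python
import re


def parse_rules(text):
    """Parse regex-urlfilter-style lines: '+regex' includes, '-regex' excludes, '#' starts a comment."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accept(url, rules):
    """First matching rule decides; a URL that matches no rule is rejected."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False


RULES = r"""
# manually identified link farm (made-up domain)
-^http://([a-z0-9-]+\.)*linkfarm-example\.com/
# accept everything else
+.
"""

rules = parse_rules(RULES)
print(accept("http://spam.linkfarm-example.com/page1", rules))  # False
print(accept("http://lucene.apache.org/nutch/", rules))         # True
```

Because the first match wins, exclusion rules for pruned sites have to appear before any broad `+` rule.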

lost NDFS blocks following network reorg

2006-03-25 Thread Ken Krugler
NDFS bugs. Thanks, -- Ken

Re: Large fetch fails with Task process exit with nonzero status

2006-04-07 Thread Ken Krugler
and I'm now stuck :/ Can anyone provide some clues as to where I might start on debugging this issue? Regards, -Shawn

Re: Using Nutch's distributed search server mode

2006-04-18 Thread Ken Krugler
depends on the number of docs you want to be serving up from each search server - in our case, I think it's about 10M or so. Obviously this varies depending on the amount of RAM/horsepower you have on the server, and your target query performance. -- Ken

MultiSearcher skewed IDF values

2006-04-27 Thread Ken Krugler

Re: changing ranking

2006-05-20 Thread Ken Krugler
the crawl of all current pages, to the extent necessary to get reasonable history/page cash values for OPIC. But that's just a guess until the actual implementation is at least sketched out. -- Ken Andrzej Bialecki wrote: Ken Krugler wrote: Eugen Kochuev wrote: Hello

Re: .8 svn - fetcher performance..

2006-06-27 Thread Ken Krugler
crawler, and with Nutch 0.8 it's more like 2000+ threads, though you have to reduce the thread stack size in this type of configuration. -- Ken

Re: Fetcher hanging temporarily on deflateBytes method

2006-06-29 Thread Ken Krugler
(Fetcher.java:148) -- Daniel Varela Santoalla European Centre for Medium-Range Weather Forecasts (ECMWF) (http://www.ecmwf.int)

Re: .8 svn - fetcher performance..

2006-07-10 Thread Ken Krugler
On 6/28/06, Ken Krugler wrote: Hi Doug, Did you ever resolve your 0.8 vs 0.7 crawling performance question? I'm running into a similar problem. We wound up dramatically increasing the number of threads, which seemed to help solve the bandwidth utilization problem

Re: Pornfilter

2006-07-12 Thread Ken Krugler

Re: nutch suitable for blogs?

2006-07-13 Thread Ken Krugler
of Nutch as a better solution, but until then I think it's probably faster to use Nutch as your starting point, and also if/when that time comes, you'll have a much better understanding of how best to slice and dice. -- Ken

Re: Distributed Searching Index Size

2006-08-07 Thread Ken Krugler
spindle/low seek times (e.g. WD Raptor SATA disks). -- Ken

Re: On fetcher slowness

2006-08-13 Thread Ken Krugler
Ken Krugler wrote: On 8/12/06, [EMAIL PROTECTED] wrote: Hello, Several people reported issues with the slow fetcher in 0.8... I run Nutch on a dual-CPU (+HT) box, and have noticed that the fetch speed didn't increase when I went from using 100 threads to 200 threads. Has

Re: How does Nutch-0.7.2 data upgrade to 0.8?

2006-08-23 Thread Ken Krugler
Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com

Re: How does Nutch-0.7.2 data upgrade to 0.8?

2006-08-23 Thread Ken Krugler
to get at all of the previously fetched content. -- Ken Ken Krugler wrote: It's really sad news for me. I'll have to spend a lot of time fetching it again. If it's only HTML, then you could do a quick hack in 0.8 to fetch the pages from your 0.7 crawl, using a modified fetcher. You

Re: processing parallel sites

2006-08-28 Thread Ken Krugler
to a sequential file. After a fetch cycle, additional data about the page state gets processed, and the results are used to update the crawldb, which is a kind of specialized database for web crawling. -- Ken

Re: Charset question

2006-09-07 Thread Ken Krugler
= big5-hkscs, but then you'd have to rebuild Nutch. See the resolveEncodingAlias() method here: http://www.krugle.com/files/svn/svn.apache.org/lucene/nutch/trunk/src/java/org/apache/nutch/util/StringUtil.java -- Ken
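To show what an alias table like that does (map a declared charset to one that's safer to decode with), here is a hypothetical Python sketch. The single entry is my own example, chosen because GB18030 is a superset of GB2312; it is not the actual contents of Nutch's resolveEncodingAlias().

```python
import codecs

# Hypothetical alias table: declared charset -> charset actually used for decoding.
# GB18030 is a superset of GB2312, so decoding GB2312 pages with it is safe.
ENCODING_ALIASES = {
    "gb2312": "gb18030",
}


def resolve_encoding_alias(declared: str) -> str:
    """Normalize the declared charset name, then substitute an alias if one is defined."""
    name = codecs.lookup(declared).name  # raises LookupError for unknown charsets
    return ENCODING_ALIASES.get(name, name)


raw = "中文".encode("gb2312")
text = raw.decode(resolve_encoding_alias("GB2312"))
print(resolve_encoding_alias("GB2312"), text)
```

Normalizing the name first means "GB2312", "gb2312", and other spellings Python recognizes for the same codec all hit the same table entry.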

Re: [Nutch-general] Caching the search results

2006-09-08 Thread Ken Krugler
heard about this feature, but I can't find any information about it. Thanks, Marco

RE: Charset question

2006-09-13 Thread Ken Krugler
/nutch/util/StringUtil.java

RE: Charset question

2006-09-17 Thread Ken Krugler
, Mac OS X 10.4.7) says that GB18030 is supported, so I'm guessing that's not your problem. -- Ken Ken Krugler wrote: Thanks for your reply. I have found that the method you mentioned looks at the HTTP header from the web server. It looks for the charset and does the mapping. The Apache web

Re: Default character encoding

2006-12-06 Thread Ken Krugler
was initially parsed. So it wound up in the Nutch segments/index with the wrong value. -- Ken

Re: 0.9 ClassCastException: org.apache.hadoop.io.Text

2007-04-22 Thread Ken Krugler
) at org.apache.nutch.searcher.FetchedSegments$SummaryThread.run(FetchedSegments.java:177)

Re: Nutch encoding problem

2007-04-30 Thread Ken Krugler
to be valid UTF-8, and from my experience Nutch works correctly with correctly identified UTF-8 web pages. So I'm guessing the '?' characters come about when your webapp container/server tries to convert the UTF-8 data to ISO-8859-2. -- Ken Ken Krugler wrote: Hi All, I would
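To see the mechanism being guessed at here: encoding Unicode text to ISO-8859-2 with a replace-on-error policy turns every character that charset can't represent into '?'. This Python snippet only demonstrates that conversion; whether a particular webapp container does exactly this is an assumption.

```python
# Unicode text as Nutch would hold it after decoding a UTF-8 page.
text = "Zażółć 日本"

# Characters outside ISO-8859-2 are replaced with '?' instead of raising an error.
converted = text.encode("iso-8859-2", errors="replace").decode("iso-8859-2")
print(converted)  # Zażółć ??
```

The Polish letters survive because ISO-8859-2 covers them; the CJK characters don't. Keeping the whole output path in UTF-8 avoids this lossy step.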

Re: Nutch encoding problem

2007-04-30 Thread Ken Krugler
-transitional.dtd' The data seems to be valid UTF-8, and from my experience Nutch works correctly with correctly identified UTF-8 web pages. So I'm guessing the '?' characters come about when your webapp container/server tries to convert the UTF-8 data to ISO-8859-2. -- Ken

Re: Nutch 0.9 and Crawl-Delay

2007-06-04 Thread Ken Krugler
Angeles based Internet startup company. For more information please visit http://www.ilial.com/crawler; http://www.ilial.com/crawler; [EMAIL PROTECTED]) -- Ken Krugler Krugle, Inc. +1 530-210-6378 If you can't find it, you can't fix it

RE: Hardware Planning

2007-11-29 Thread Ken Krugler

Re: DFS search

2007-12-16 Thread Ken Krugler
DFS? Thanks

Re: Nutch score based on document recency

2007-12-18 Thread Ken Krugler
this is a good way or whether including date range clauses would have an adverse impact on performance. Am I missing something? Is there a better way of doing this? Any help would be much appreciated. Regards, Chris

Re: Normalizing host names (e.g. www1|www2 = www)

2008-04-27 Thread Ken Krugler
this isn't the case, even if the page similarity calculation determines that two pages should be the same. -- Ken
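For the www1/www2 case in the subject line, one approach is a regex rewrite of the host before URLs are added to the crawldb. As far as I know, Nutch's regex URL normalizer plugin supports rules of this kind in its config file. The Python sketch below shows only the rewrite itself. The pattern is an assumption and is only safe for sites where the numbered hosts really serve the same content.

```python
import re

# Hypothetical rule: www1.example.com, www2.example.com, ... -> www.example.com
NUMBERED_WWW = re.compile(r"^(https?://)www\d+\.", re.IGNORECASE)


def normalize_host(url: str) -> str:
    """Collapse numbered www hosts into plain www; other URLs pass through unchanged."""
    return NUMBERED_WWW.sub(r"\1www.", url, count=1)


print(normalize_host("http://www2.example.com/a?b=1"))  # http://www.example.com/a?b=1
print(normalize_host("http://www.example.com/"))        # http://www.example.com/
```

Requiring at least one digit after `www` leaves plain `www.` hosts and names like `mywww1.` alone.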

Re: Odd results and broken docs when indexing converted ARC-files.

2009-04-17 Thread Ken Krugler
] /[segment] Afterwards I created the crawldb and linkdb using ../bin/nutch crawldb ... and ../bin/nutch invertlinks ..., then I used solrindex to put everything in Solr. Can somebody help? Thank you very much! -- Ken Krugler +1 530-210-6378

Re: Can't build Nutch

2009-04-20 Thread Ken Krugler
] ^
[javac] 1 error
BUILD FAILED
/usr/local/nutch-1.0/build.xml:107: Compile failed; see the compiler error output for details.

Re: Nutch Crawling Questions

2009-04-20 Thread Ken Krugler

Re: threads get stuck in spinwaiting

2009-05-27 Thread Ken Krugler

Re: threads get stuck in spinwaiting

2009-05-28 Thread Ken Krugler
140 MB. Are you talking about the topN value when you say I should set the max URLs per host, or is there another setting I haven't found yet? http://pastebin.com/m33bb6e6b Ken Krugler wrote: That could be true, but is that something I, as a Nutch user, can configure? It's interesting that your
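On the question of topN versus a per-host limit: they are separate knobs. If I remember the Nutch 1.0 configuration correctly, the per-host cap is the `generate.max.per.host` property (-1 meaning unlimited), applied together with `-topN` when generating a fetchlist; check nutch-default.xml for your version. The Python sketch below only illustrates how the two limits interact; it is not the Generator's code.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def generate(scored_urls, top_n, max_per_host=-1):
    """Pick up to top_n URLs by descending score, skipping hosts that hit their cap.

    max_per_host < 0 means unlimited (the -1 convention assumed above).
    """
    per_host = defaultdict(int)
    selected = []
    for url, score in sorted(scored_urls, key=lambda pair: pair[1], reverse=True):
        if len(selected) >= top_n:
            break
        host = urlsplit(url).hostname
        if max_per_host >= 0 and per_host[host] >= max_per_host:
            continue
        per_host[host] += 1
        selected.append(url)
    return selected


urls = [
    ("http://big.example/1", 0.9),
    ("http://big.example/2", 0.8),
    ("http://big.example/3", 0.7),
    ("http://small.example/1", 0.5),
]
print(generate(urls, top_n=3, max_per_host=1))
```

With a cap of 1, only two URLs come back even though topN is 3, because only two distinct hosts are available. That is the same effect mentioned elsewhere in these threads: a small number of unique hosts limits how much you can fetch.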

Re: Arabic language in Nutch

2009-06-01 Thread Ken Krugler
. Is there any issue with the charset? Please help me. Thanks in advance. Regards, Chetan Patel

Re: Merge taking forever

2009-06-06 Thread Ken Krugler
center, but crawl using EC2, the time cost of moving the content could be excessive. Amazon recently introduced AWS Import/Export to help address this issue, though. -- Ken 2009/6/5 Ken Krugler kkrugler_li...@transpac.com how long does it take for your 6 million URLs to be crawled

Re: Nutch fetch performance

2009-06-25 Thread Ken Krugler
and mapred.reduce.tasks to 15. Is this correct? HTTP timeout is 5 seconds, max retries 2, 0.5 seconds between retries. fetcher.threads.fetch is 300. How can I tweak the performance? What other options may affect performance? Should I provide some other information for you to be able to help me?

Re: Nutch fetch performance

2009-06-25 Thread Ken Krugler
(on a server) * 5, and the number of reducers == the number of cores. Oh, and the number of threads to 200 / number of mappers. But treat that as a random data point. -- Ken Ken Krugler wrote: See the previous discussion about how having relatively few unique domains can significantly limit

Re: Nutch fetch performance

2009-06-26 Thread Ken Krugler
should see entries like:
-activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
- fetching http://home.swipnet.se/~w-147200/
-- Ken Ken Krugler wrote: The real question is how many active fetches you have running simultaneously. If most fetcher threads are idle, waiting for 30
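If you want to track that idle ratio over a long run, the counters in those status lines are easy to pull out. A small Python sketch, assuming the line format shown above; the 90% threshold for "mostly idle" is an arbitrary choice.

```python
import re

STATUS = re.compile(
    r"activeThreads=(\d+), spinWaiting=(\d+), fetchQueues\.totalSize=(\d+)"
)


def parse_status(line: str):
    """Return (active, spin_waiting, queued) from a fetcher status line, or None."""
    m = STATUS.search(line)
    return tuple(int(g) for g in m.groups()) if m else None


def mostly_idle(line: str, threshold: float = 0.9) -> bool:
    """True when most active threads are spin-waiting (threshold is an assumption)."""
    parsed = parse_status(line)
    if parsed is None or parsed[0] == 0:
        return False
    active, waiting, _ = parsed
    return waiting / active >= threshold


line = "-activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526"
print(parse_status(line), mostly_idle(line))
```

When spinWaiting is close to activeThreads while fetchQueues.totalSize is still large, adding threads won't help; the queued URLs are concentrated on too few hosts.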

Re: Nutch fetch performance

2009-06-26 Thread Ken Krugler
Ken Krugler wrote: If this is http.timeout, that's the length of time an HTTP request will wait for a response before timing out, which hopefully doesn't happen very often for you. Yes, it is. Ken Krugler wrote: Delay between retries - what property name

Re: Problems when index .chm files

2009-07-06 Thread Ken Krugler
other Microsoft language. See http://en.wikipedia.org/wiki/Microsoft_Compiled_HTML_Help for details. -- Ken -- Ken Krugler http://ken-blog.krugler.org +1 530-265-2225

Re: Weighting different html text nodes - h1,h2 etc..

2009-07-09 Thread Ken Krugler
, AFAIK there's no special weighting given to text pulled from the body of the HTML. I believe Nutch does give higher weight to the anchor text found for links that point to the page, which is a key factor in generating better search results. -- Ken

Re: Arc to segments failed for Task attempt_200907091108_0001_m_000520_0 failed to report status for 602 seconds. Killing!

2009-07-09 Thread Ken Krugler
  <name>mapred.system.dir</name>
  <value>/home/had/nutch-1.0/filesystem/mapreduce/system</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/home/had/nutch-1.0/filesystem/mapreduce/local</value>
</property>
</configuration>

Re: Nutch Character encoding converter

2009-07-12 Thread Ken Krugler
Nutch has an auto-detector for character encoding. Does it automatically convert characters to a standard encoding after detecting it? Yes - Nutch converts text to Unicode for all subsequent processing. -- Ken

Re: A few questions about crawl-urlfilter.txt

2009-07-14 Thread Ken Krugler
. If there is a URL like http://www.mysite.com/images/1 that returns an image, will Nutch be able to identify it and avoid downloading it? I think Nutch will download the file, since URL filtering happens before fetching, so the filter only sees the URL text and can't know that it points to an image. -- Ken
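A short Python illustration of why: a URL filter keyed on file extensions only ever sees the URL string. The pattern below is similar in spirit to the image-extension exclusions in the default filter files, but the exact pattern is an assumption, not copied from Nutch's config.

```python
import re

# Exclusion pattern keyed on file extensions (illustrative, not Nutch's exact rule).
IMAGE_SUFFIX = re.compile(r"\.(gif|jpe?g|png|bmp|ico)$", re.IGNORECASE)


def excluded_by_suffix(url: str) -> bool:
    """A URL filter can only inspect the URL text, never the response's Content-Type."""
    return IMAGE_SUFFIX.search(url) is not None


print(excluded_by_suffix("http://www.mysite.com/images/logo.png"))  # True
print(excluded_by_suffix("http://www.mysite.com/images/1"))         # False
```

Catching the extension-less case needs a check after the response arrives (for example, on its content type), which is outside what a URL filter can do.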

Re: Focussed Web Crawling with Nutch

2009-07-31 Thread Ken Krugler
to our topic. 7) We loop back to step 3 above. Eventually we end up with a Lucene-style index, as usual, which can be used with the Nutch web app, Solr, or some other code. Who is interested in this or has done it in the past, and can we chat about it? Alex

Re: Nutch.SIGNATURE_KEY

2009-08-19 Thread Ken Krugler
Hi Paul, On Aug 19, 2009, at 6:08am, Paul Tomblin wrote: Is SIGNATURE_KEY (aka nutch.content.digest) a valid way to check if my page has changed since the last time I crawled it? I patched Nutch to properly handle modification dates, and then discovered that my web site doesn't send
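For the question in this thread, the general idea behind a content digest is to hash the raw fetched bytes and compare the result with the hash stored from the previous crawl. As far as I know, Nutch's default MD5Signature works along these lines, but this Python sketch is only an illustration. Note that any byte-level change (a rotating ad, an embedded timestamp) also changes the digest.

```python
import hashlib


def content_digest(raw: bytes) -> str:
    """Hex MD5 of the raw page bytes (a stand-in for a stored page signature)."""
    return hashlib.md5(raw).hexdigest()


def has_changed(previous_digest: str, raw: bytes) -> bool:
    """Compare the digest from the last crawl with the digest of the new fetch."""
    return content_digest(raw) != previous_digest


old = content_digest(b"<html><body>Hello</body></html>")
print(has_changed(old, b"<html><body>Hello</body></html>"))   # False
print(has_changed(old, b"<html><body>Hello!</body></html>"))  # True
```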

Re: Usage of ArcSegmentCreator

2009-09-09 Thread Ken Krugler
. Thanks -- Ken Krugler TransPac Software, Inc. http://www.transpac.com +1 530-210-6378

Re: URL built by JavaScript Function - Can this be Crawled

2009-09-14 Thread Ken Krugler
am also using the parse-js plugin. But it does not look like Nutch is able to crawl these URLs. Am I doing something wrong, or is Nutch not able to crawl URLs built by a JavaScript function? Thanks/Regards, Parvez

Re: how to upgrade a java application with nutch?

2009-10-01 Thread Ken Krugler

Re: indexing just certain content

2009-10-09 Thread Ken Krugler
Can you please just tell us in English what the creativecommons plugin is for? I mean, if I include this plugin in my nutch-site.xml, what will I get as a result? I think Andrzej is suggesting that you read the code. If you look at the beginning of the CCParseFilter.java file, you'll see:

Re: Extract full urls from DOM

2009-10-29 Thread Ken Krugler

Re: char encoding

2009-10-29 Thread Ken Krugler
chars end up as '?'; I don't have any special requirements for special characters, I'm happy with the usual UTF-8. Any suggestion on the best way to configure this correctly? Everything seems quite OK looking at the code; not sure what's missing. Thanks.

Re: char encoding

2009-10-30 Thread Ken Krugler
characters, I'm happy with the usual UTF-8. Any suggestion on the best way to configure this correctly? Everything seems quite OK looking at the code; not sure what's missing. Thanks.

Re: substitute unknown parts of the url

2009-11-18 Thread Ken Krugler
unknown parts (folders) of the URL? Something like... http://([a-z0-9]*\.)*website\.com/[^/]+/known-folder/ -- Ken Ken Krugler +1 530-210-6378 http://bixolabs.com e l a s t i c w e b m i n i n g
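Before adding a pattern like that to a URL filter file (with a leading `+` to include matches), it's worth checking it against a few URLs. A quick Python check follows. The sample URLs are made up, the dot in `website\.com` is escaped so it only matches a literal '.', and `re.search` is used on the assumption that the filter matches anywhere in the URL rather than requiring a full match.

```python
import re

# Pattern from the reply, with the dot in "website.com" escaped.
PATTERN = re.compile(r"http://([a-z0-9]*\.)*website\.com/[^/]+/known-folder/")

samples = {
    "http://www.website.com/abc/known-folder/page.html": True,
    "http://a.b.website.com/xyz/known-folder/": True,
    "http://www.website.com/known-folder/": False,      # no unknown folder in between
    "http://www.website.com/a/b/known-folder/": False,  # [^/]+ spans exactly one folder
    "http://www.websiteXcom/abc/known-folder/": False,  # would match with an unescaped dot
}

for url, expected in samples.items():
    assert (PATTERN.search(url) is not None) == expected, url
print("all samples behave as expected")
```

If the unknown part can be more than one folder deep, `[^/]+/` would need to become something like `([^/]+/)+`.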

Re: support for robot rules that include a wild card

2009-11-19 Thread Ken Krugler
? Thanks, Jason

Re: Efficient focused crawling

2009-11-29 Thread Ken Krugler
or not with issue no. 1. Any ideas, guys? I will be very grateful for any help or pointers in the right direction. Thanks, Eran -- -MilleBii-

Re: Maintaining website version with Nutch

2010-01-11 Thread Ken Krugler
. You'd need to tune the config parameters to do the re-crawl at the target interval. For pure site archiving, though, Heritrix is a more optimized solution, especially when used with some of the add-on admin GUIs. -- Ken

Re: url normalization

2010-01-27 Thread Ken Krugler

Re: A well-behaved crawler

2010-02-03 Thread Ken Krugler
to a detailed web page explaining the purpose of the bot, etc. But my crawler is still banned by several sites... :( cheers iful http://zipclue.com

Re: PDF Parsing

2010-02-03 Thread Ken Krugler

Re: Using Tika to crawl doc, pdf, etc.

2010-02-10 Thread Ken Krugler
in there by default? It should be there by default, once the Tika plug-in gets rolled in. -- Ken

Re: ParseText contains newline

2010-02-18 Thread Ken Krugler
this issue? Thanks

Re: Update on ignoring menu divs

2010-03-01 Thread Ken Krugler
that gets rolled into the source, then it should be easier to use the project with Nutch. -- Ken

Re: Nutch and EC2

2010-04-10 Thread Ken Krugler
sounds very strange - I'd check on the AWS EC2 forum to see if anybody else has reported this with the AMI that you're using. -- Ken

Re: Weird crawl issue. Nutch picking up drop-down menu options.

2010-04-18 Thread Ken Krugler
the relevancy of this page without changing the page itself?