Re: Help with parse-mp3?

2008-01-18 Thread Hasan Diwan
Sir:
You need to enable it in build.xml in src/plugin/; it will then be
built when you run ant.
On 18/01/2008, Rick Francis [EMAIL PROTECTED] wrote:
 I'm trying to get the parse-mp3 plugin working in my nutch installation.
 I've found and downloaded jid3lib-0.5.4.jar but I can't find parse-mp3.jar.
 I see the source for it in the nutch distribution, but not the jar file. I'm
 a Java newbie so I'm not sure exactly what I need to build the jar file from
 the source. Any help or pointers would be appreciated.

 Rick



-- 
Cheers,
Hasan Diwan [EMAIL PROTECTED]


Re: problem with mp3 parser

2007-12-12 Thread Hasan Diwan
On 12/12/2007, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
  It did not help. Also, I checked that the search.dir value does not change in
 C:\Tomcat\webapps\ROOT\WEB-INF\classes\nutch-default.xml although I changed
 it in nutch/conf/nutch-default.xml. Should the size of the nutch*.war file
 change depending on how many sites are fetched? Also, if I put all the nutch
 commands in a file and execute it, nutch gives errors saying some directory is
 not found, although the dir is there.

No, the data is stored outside the web archive.

Is your machine externally accessible? If so, please email me offlist
and I'd love to take a (brief) look and let you know if I see
anything.
-- 
Cheers,
Hasan Diwan [EMAIL PROTECTED]


Re: problem with mp3 parser

2007-12-11 Thread Hasan Diwan
Think you may need the jar file in plugin/mp3/lib?

On 12/11/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

  Hi All,

 I have in nutch/conf/nutch-default.xml the following


  <property>
    <name>plugin.includes</name>
    <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html|js|mp3)|index-(basic|more)|query-(basic|more|site|url)|summary-basic|scoring-opic</value>


  ...


 However in

 C:\Tomcat\webapps\ROOT\WEB-INF\classes\nutch-default.xml


  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>


 as you see mp3 is missing. And mp3 plugin is also missing in tomcat's plugin
 dir.
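
The difference between the two plugin.includes values can be checked mechanically. The sketch below is not Nutch code. It assumes that plugin.includes is a regular expression that has to match a plugin's directory name in full (that matching rule is an assumption), and the helper name and the sample plugin list are made up for illustration.

```python
import re


def included_plugins(plugin_includes, plugin_ids):
    """Return the plugin ids selected by a plugin.includes expression.

    Assumption: the expression must match the whole id, not just part of it.
    """
    pattern = re.compile(plugin_includes)
    return [pid for pid in plugin_ids if pattern.fullmatch(pid)]


# Value posted from nutch/conf/nutch-default.xml.
CONF_VALUE = ("nutch-extensionpoints|protocol-http|urlfilter-regex|"
              "parse-(text|html|js|mp3)|index-(basic|more)|"
              "query-(basic|more|site|url)|summary-basic|scoring-opic")

# Value posted from Tomcat's WEB-INF/classes/nutch-default.xml.
WEBAPP_VALUE = ("protocol-http|urlfilter-regex|parse-(text|html|js)|"
                "index-basic|query-(basic|site|url)|summary-basic|"
                "scoring-opic|urlnormalizer-(pass|regex|basic)")

# Hypothetical sample of plugin directory names.
PLUGINS = ["nutch-extensionpoints", "protocol-http", "parse-html", "parse-mp3"]

if __name__ == "__main__":
    print(included_plugins(CONF_VALUE, PLUGINS))
    print(included_plugins(WEBAPP_VALUE, PLUGINS))
```

Under these assumptions the first value selects parse-mp3 and the second does not, which is consistent with the web application never seeing the plugin.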

 Any ideas why this happened?

 Thanks.
 Alex.


 -Original Message-
 From: [EMAIL PROTECTED]
 To: nutch-user@lucene.apache.org
 Sent: Fri, 7 Dec 2007 3:08 pm
 Subject: problem with mp3 parser

 Hello,

 I have built the mp3 parser and put it in C:\nutch\plugins. However, nutch
 does not find mp3's. I checked the
 C:\Tomcat\webapps\ROOT\WEB-INF\classes\plugins dir. There is no parser-mp3
 folder.

 Any idea how to fix this?

 Thanks.
 Alex.

 


-- 
Cheers,
Hasan Diwan [EMAIL PROTECTED]


Re: Newbie questions about followed links

2007-03-08 Thread Hasan Diwan

Sir:
On 08/03/07, Jeroen Verhagen [EMAIL PROTECTED] wrote:

Surely these links look ordinary enough to be seen and followed by
nutch? Could someone please tell me what could be causing these links
not to be followed?


conf/urlfilter.txt.template contains the line:
-[?*!@=]

Remove the '?' and the links will be followed.
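
A small sketch of what that change does. It assumes the rule in question is the stock `-[?*!@=]` line (the archive seems to have mangled it into an e-mail placeholder) and that an exclusion rule applies when its expression is found anywhere in the URL. The sample URLs are made up.

```python
import re

STOCK_RULE = r"[?*!@=]"   # assumed original rule body (after the leading '-')
EDITED_RULE = r"[*!@=]"   # the same rule with '?' removed


def is_skipped(rule, url):
    """True if the exclusion rule matches somewhere in the URL."""
    return re.search(rule, url) is not None


if __name__ == "__main__":
    url = "http://www.example.com/page.jsp?id"  # hypothetical link with a query
    print(is_skipped(STOCK_RULE, url))
    print(is_skipped(EDITED_RULE, url))
```

Note that a query string containing `=` would still be skipped by the edited rule, because `=` remains in the character class.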

--
Cheers,
Hasan Diwan [EMAIL PROTECTED]


Re: How can I setup an mp3 search engine?

2006-10-28 Thread Hasan Diwan

On 28/10/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

Can the parse-mp3 plugin parse the information in mp3 files such as author,
song name, artist and so on?


The parse-mp3 plugin can obtain any information in the ID3 tags
contained in the file. If this information is not part of the file,
the plugin (as written) can not pluck information about the file from
thin air.
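
To make "information in the ID3 tags" concrete, here is a minimal sketch that reads an ID3v1 tag, the fixed 128-byte block at the end of a file. It is not the plugin's code (the plugin relies on a Java ID3 library), it ignores ID3v2 tags at the start of the file, and the function name is made up for illustration.

```python
def parse_id3v1(data):
    """Parse an ID3v1 tag from the final 128 bytes of an MP3 file's contents.

    Returns a dict of fields, or None if the file carries no ID3v1 tag.
    """
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(raw):
        # Fields are fixed width and padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": tag[127],  # numeric genre code
    }


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as handle:
        print(parse_id3v1(handle.read()))
```

A file without the tag yields None, which matches the point above: nothing can be extracted that was never written into the file.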
--
Cheers,
Hasan Diwan [EMAIL PROTECTED]


Re: crawling a certain site

2006-08-02 Thread Hasan Diwan

On 01/08/06, Lukas Vlcek [EMAIL PROTECTED] wrote:


Anyway, I think you would need to write some code (be it directly for
nutch or for the web in question).



If you have perl available, you might want to take advantage of the code at
http://prolificprogrammer.com/~hdiwan/getTitle.pl
--
Cheers,
Hasan Diwan [EMAIL PROTECTED]


Re: Please Help.. recrawl script.. will send out to the list when finished for 0.8.0

2006-07-20 Thread Hasan Diwan

Mr Holt:

On 7/20/06, Matthew Holt [EMAIL PROTECTED] wrote:

there is a resource online that describes manually recrawling, that'd be
great as well. Thanks.


http://wiki.apache.org/nutch/NutchTutorial -- you're welcome.
--
Cheers,
Hasan Diwan [EMAIL PROTECTED]


Re: Eclipse IDE

2006-07-11 Thread Hasan Diwan

Mr Holt:

On 7/11/06, Matthew Holt [EMAIL PROTECTED] wrote:

Can someone that has Nutch development configured for Eclipse please
paste their .project and .classpath files? Thanks.


Do the following in your project properties:
Source directories should be all the src and test subdirectories under
plugin/* and the libraries should contain all the jar files.
If you want to keep things simple, just use the build file from eclipse.
--
Cheers,
Hasan Diwan [EMAIL PROTECTED]


Re: PluginRuntimeException

2006-03-07 Thread Hasan Diwan
, but the
  slowest it is.
  </description>
</property>

</nutch-conf>


 I believe they are mutually exclusive. Just use one or the other. I think
 someone posted here recently saying they had problems with httpclient.
 I'm using protocol-http myself.

So noted, the change has been made.




--
Cheers,
Hasan Diwan [EMAIL PROTECTED]


Re: NullPointerException

2006-03-06 Thread Hasan Diwan
On 06/03/06, Howie Wang [EMAIL PROTECTED] wrote:
 Is query-basic or query-more included in your nutch-default.xml?

It is indeed included in my nutch-site.xml :-

 <property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js)|index-more|query-(more|site|url)</value>
 </property>
Thanks for the help!
--
Cheers,
Hasan Diwan [EMAIL PROTECTED]


Re: NullPointerException

2006-03-05 Thread Hasan Diwan
...
060305 182204 Sorting updates by segment...
060305 182204 Updating segments...
060305 182204  updating /home/hdiwan/SpectraSearch/crawl/segments/20060305182200
060305 182204 Done updating /home/hdiwan/SpectraSearch/crawl/segments
from /home/hdiwan/SpectraSearch/crawl/db
060305 182204 indexing segment:
/home/hdiwan/SpectraSearch/crawl/segments/20060305182200
060305 182205 * Opening segment 20060305182200
060305 182205 * Indexing segment 20060305182200
060305 182205 * Optimizing index...
060305 182205 * Moving index to NFS if needed...
060305 182205 DONE indexing segment 20060305182200: total 15 records
in 0.031 s (Infinity rec/s).
060305 182205 done indexing
060305 182205 indexing segment:
/home/hdiwan/SpectraSearch/crawl/segments/20060305182203
060305 182205 * Opening segment 20060305182203
060305 182205 * Indexing segment 20060305182203
060305 182205 * Optimizing index...
060305 182205 * Moving index to NFS if needed...
060305 182205 DONE indexing segment 20060305182203: total 0 records in
0.075 s (NaN rec/s).
060305 182205 done indexing
060305 182205 Reading url hashes...
060305 182205 Sorting url hashes...
060305 182205 Deleting url duplicates...
060305 182205 Deleted 0 url duplicates.
060305 182205 Reading content hashes...
060305 182205 Sorting content hashes...
060305 182205 Deleting content duplicates...
060305 182205 Deleted 0 content duplicates.
060305 182205 Duplicate deletion complete locally.  Now returning to NFS...
060305 182205 DeleteDuplicates complete
060305 182205 Merging segment indexes...
060305 182205 crawl finished: crawl

That's the entire log. Hope it helps! My crawl-urlfilter.txt:
# The url filter file used by the crawl command.

# Better for intranet crawling.
# Be sure to change MY.DOMAIN.NAME to your domain name.

# Each non-comment, non-blank line contains a regular expression
# prefixed by '+' or '-'.  The first matching pattern in the file
# determines whether a URL is included or ignored.  If no pattern
# matches, the URL is ignored.

# skip file:, ftp:,  mailto: urls
-^(file|ftp|mailto):

# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png|PNG)$

# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]

# accept hosts in any domain
+^http://([a-z0-9]*\.)*/

# skip everything else
-.
So, why isn't it fetching anything, if that is indeed the case?
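
One thing that can be checked offline is how these rules treat a seed URL. The sketch below is not Nutch's filter. It assumes the first-match semantics described in the file's comments, assumes a rule matches when its expression is found anywhere in the URL, and assumes the "probable queries" rule is the stock `-[?*!@=]`. The sample URLs are made up.

```python
import re

RULES = [
    r"-^(file|ftp|mailto):",
    r"-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz"
    r"|mov|MOV|exe|png|PNG)$",
    r"-[?*!@=]",                    # assumed form of the garbled rule
    r"+^http://([a-z0-9]*\.)*/",    # accept rule exactly as posted
    r"-.",
]


def accepts(url, rules=RULES):
    """Apply '+'/'-' rules in order; the first matching rule decides."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: the URL is ignored


if __name__ == "__main__":
    print(accepts("http://www.example.com/"))
    # The same list with a domain after the optional subdomain labels.
    with_domain = RULES[:3] + [r"+^http://([a-z0-9]*\.)*example.com/", r"-."]
    print(accepts("http://www.example.com/", with_domain))
```

Under these assumptions the accept rule as posted wants a `/` immediately after the last dot of the host, so an ordinary host name falls through to `-.` and is rejected. Keeping a domain where the template says MY.DOMAIN.NAME lets the same URL through.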
--
Cheers,
Hasan Diwan [EMAIL PROTECTED]


Re: NullPointerException

2006-03-05 Thread Hasan Diwan
Mr Tang:
 Crawling seems ok. Can you pls try org.apache.nutch.searcher.NutchBean
 [your-query-string] in shell/cmd?

server: 7:20pm % ./bin/nutch org.apache.nutch.searcher.NutchBean hasan
060305 192042 10 parsing file:/home/hdiwan/nutch-0.7.1/conf/nutch-default.xml
060305 192042 10 parsing file:/home/hdiwan/nutch-0.7.1/conf/nutch-site.xml
060305 192042 10 opening merged index in /home/hdiwan/SpectraSearch/crawl/index
060305 192042 10 Plugins: looking in: /home/hdiwan/nutch-0.7.1/build/plugins
060305 192042 10 parsing:
/home/hdiwan/nutch-0.7.1/build/plugins/nutch-extensionpoints/plugin.xml
060305 192042 10 not including:
/home/hdiwan/nutch-0.7.1/build/plugins/protocol-file
060305 192042 10 not including:
/home/hdiwan/nutch-0.7.1/build/plugins/protocol-ftp
060305 192042 10 not including:
/home/hdiwan/nutch-0.7.1/build/plugins/protocol-http
060305 192042 10 parsing:
/home/hdiwan/nutch-0.7.1/build/plugins/protocol-httpclient/plugin.xml
060305 192042 10 impl: point=org.apache.nutch.protocol.Protocol
class=org.apache.nutch.protocol.httpclient.Http
060305 192042 10 impl: point=org.apache.nutch.protocol.Protocol
class=org.apache.nutch.protocol.httpclient.Http
060305 192042 10 parsing:
/home/hdiwan/nutch-0.7.1/build/plugins/parse-html/plugin.xml
060305 192042 10 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutc
che.nutch.searcher.more.TypeQueryFilter
060305 192043 10 impl: point=org.apache.nutch.searcher.QueryFilter
class=org.apache.nutch.searcher.more.DateQueryFilter
060305 192043 10 parsing:
/home/hdiwan/nutch-0.7.1/build/plugins/query-site/plugin.xml
060305 192043 10 impl: point=org.apache.nutch.searcher.QueryFilter
class=org.apache.nutch.searcher.site.SiteQueryFilter
060305 192043 10 parsing:
/home/hdiwan/nutch-0.7.1/build/plugins/query-url/plugin.xml
060305 192043 10 impl: point=org.apache.nutch.searcher.QueryFilter
class=org.apache.nutch.searcher.url.URLQueryFilter
060305 192043 10 not including:
/home/hdiwan/nutch-0.7.1/build/plugins/urlfilter-regex
060305 192043 10 not including:
/home/hdiwan/nutch-0.7.1/build/plugins/urlfilter-prefix
060305 192043 10 not including:
/home/hdiwan/nutch-0.7.1/build/plugins/creativecommons
060305 192043 10 not including:
/home/hdiwan/nutch-0.7.1/build/plugins/language-identifier
060305 192043 10 not including:
/home/hdiwan/nutch-0.7.1/build/plugins/clustering-carrot2
060305 192043 10 not including: /home/hdiwan/nutch-0.7.1/build/plugins/ontology
Total hits: 0
--
Cheers,
Hasan Diwan [EMAIL PROTECTED]


Re: NullPointerException

2006-03-05 Thread Hasan Diwan
Mr Tang:
On 05/03/06, Jack Tang [EMAIL PROTECTED] wrote:
 Weird! You are running nutch on local file system or distributed file system?
Local file system

 And can you find the same query hasan via luke?
Nope

--
Cheers,
Hasan Diwan [EMAIL PROTECTED]


Re: NullPointerException

2006-03-05 Thread Hasan Diwan
On 05/03/06, Jack Tang [EMAIL PROTECTED] wrote:
 I am not sure what's wrong in nutch-0.7.1 indexing, but now it is
 possible to upgrade to nutch 0.8(svn version)?

It is possible, but I was under the assumption that 0.8 required NDFS?
--
Cheers,
Hasan Diwan [EMAIL PROTECTED]


Re: NullPointerException

2006-03-05 Thread Hasan Diwan
On 05/03/06, Jack Tang [EMAIL PROTECTED] wrote:
 You can still build it on local file system:)

Build, yes, but what of deployment? Can I use it in the same way? At
present, I don't have enough resources to run a distributed crawl.
--
Cheers,
Hasan Diwan [EMAIL PROTECTED]


Re: NullPointerException

2006-03-05 Thread Hasan Diwan
Right then.. compiled the svn version of nutch. Tried running the
crawl with it and this is the log:
server: 11:32pm % ./bin/nutch crawl ../SpectraSearch/urls -dir
../SpectraSearch/crawl -depth 2 -threads 20
060305 233255 parsing
jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060305 233255 parsing file:/home/hdiwan/nutch/conf/nutch-default.xml
060305 233255 parsing file:/home/hdiwan/nutch/conf/crawl-tool.xml
060305 233255 parsing
jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060305 233255 parsing file:/home/hdiwan/nutch/conf/nutch-site.xml
060305 233255 parsing file:/home/hdiwan/nutch/conf/hadoop-site.xml
060305 233256 crawl started in: ../SpectraSearch/crawl
060305 233256 rootUrlDir = ../SpectraSearch/urls
060305 233256 threads = 20
060305 233256 depth = 2
060305 233256 Injector: starting
060305 233256 Injector: crawlDb: ../SpectraSearch/crawl/crawldb
060305 233256 Injector: urlDir: ../SpectraSearch/urls
060305 233256 Injector: Converting injected urls to crawl db entries.
060305 233256 parsing
jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060305 233256 parsing file:/home/hdiwan/nutch/conf/nutch-default.xml
060305 233256 parsing file:/home/hdiwan/nutch/conf/crawl-tool.xml
060305 233256 parsing
jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060305 233256 parsing
jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060305 233256 parsing file:/home/hdiwan/nutch/conf/nutch-site.xml
060305 233256 parsing file:/home/hdiwan/nutch/conf/hadoop-site.xml
060305 233256 parsing
jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060305 233256 parsing file:/home/hdiwan/nutch/conf/nutch-default.xml
060305 233256 parsing file:/home/hdiwan/nutch/conf/crawl-tool.xml
060305 233256 parsing
jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060305 233256 parsing
jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060305 233256 parsing
jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060305 233256 parsing file:/home/hdiwan/nutch/conf/nutch-site.xml
060305 233256 parsing file:/home/hdiwan/nutch/conf/hadoop-site.xml
060305 233256 Running job: job_7n6bsm
060305 233256 parsing
jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060305 233256 parsing
jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060305 233256 parsing /tmp/hadoop/mapred/local/localRunner/job_7n6bsm.xml
060305 233256 parsing file:/home/hdiwan/nutch/conf/hadoop-site.xml
java.io.IOException: No input directories specified in: Configuration:
defaults: hadoop-default.xml , mapred-default.xml ,
/tmp/hadoop/mapred/local/localRunner/job_7n6bsm.xmlfinal:
hadoop-site.xml
at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
060305 233257  map 0%  reduce 0%
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
I need to sleep now, so I'll check back tomorrow. Thanks for all the help!
--
Cheers,
Hasan Diwan [EMAIL PROTECTED]


Re: nutch-extensionpoints 0.71

2006-02-27 Thread Hasan Diwan
Mr. Braman (or anyone else):

On 27/02/06, Richard Braman [EMAIL PROTECTED] wrote:


 bin/nutch fetch segments/latest_segment


How would I determine which is the latest segment?
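
A possible way to answer this from a script, assuming segment directories are named with yyyyMMddHHmmss timestamps (as in names such as 20060305182200), so that the newest one sorts last. The function name is made up for illustration.

```python
import os


def latest_segment(segment_names):
    """Return the newest segment name, or None if there is none.

    Assumption: segment names are 14-digit yyyyMMddHHmmss timestamps,
    so plain string comparison orders them by time.
    """
    stamped = [n for n in segment_names if len(n) == 14 and n.isdigit()]
    return max(stamped) if stamped else None


if __name__ == "__main__":
    # Lists the local "segments" directory; adjust the path as needed.
    print(latest_segment(os.listdir("segments")))
```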

I don't really know what your other question was.

I know there are duplicate URLs in urls.txt. Why would I be getting the line
below?

 060227 150626 Deleted 0 content duplicates.


Thanks again for the kind assistance.

--
Cheers,
Hasan Diwan [EMAIL PROTECTED]


Re: Duplicate urls in urls file

2006-02-15 Thread Hasan Diwan
Elwin:
On 13/02/06, Elwin [EMAIL PROTECTED] wrote:
   Do you use fixed set of rss feeds for crawl or discover rss feeds
 dynamically?

Before I broke the script, it would take the URL, grab the feeds
specified from the link tags, then parse them. I suspect this is
similar to what the parse-rss plugin does, but I have not had the
chance to look at it as yet.
--
Cheers,
Hasan Diwan [EMAIL PROTECTED]


Duplicate urls in urls file

2006-02-13 Thread Hasan Diwan
I've written a perl script to build up a urls file to crawl from RSS
feeds. Will nutch handle duplicate URLs in the crawl file or would
that logic need to be in my perl script?
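
Whichever way Nutch behaves, removing duplicates before the file is written is cheap. A minimal sketch of that step, shown in Python rather than Perl; the function name is made up, and URLs are compared as exact strings with no normalization.

```python
def unique_urls(lines):
    """Return URLs in first-seen order with blanks and exact repeats dropped."""
    seen = set()
    result = []
    for line in lines:
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            result.append(url)
    return result


if __name__ == "__main__":
    import sys
    # Reads URLs from standard input, one per line, and prints the unique ones.
    for url in unique_urls(sys.stdin):
        print(url)
```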
--
Cheers,
Hasan Diwan [EMAIL PROTECTED]


extension point... does not exist

2006-02-13 Thread Hasan Diwan
I placed the URLs for a crawl in urls per the tutorial [1]. Then:
% ./bin/nutch crawl urls -dir crawl.test -depth 2
... gives me the following log:
060213 131631 parsing file:/home/hdiwan/nutch-0.7.1/conf/nutch-default.xml
060213 131631 parsing file:/home/hdiwan/nutch-0.7.1/conf/crawl-tool.xml
060213 131631 parsing file:/home/hdiwan/nutch-0.7.1/conf/nutch-site.xml
060213 131631 No FS indicated, using default:local
060213 131631 crawl started in: crawl.test
060213 131631 rootUrlFile = urls
060213 131631 threads = 10
060213 131631 depth = 2
060213 131632 Created webdb at LocalFS,/home/hdiwan/nutch-0.7.1/crawl.test/db
060213 131632 Starting URL processing
060213 131632 Plugins: looking in: /home/hdiwan/nutch-0.7.1/build/plugins
060213 131632 not including:
/home/hdiwan/nutch-0.7.1/build/plugins/protocol-file
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-ftp
060213 131632 parsing:
/home/hdiwan/nutch-0.7.1/build/plugins/protocol-http/plugin.xml
060213 131632 impl: point=org.apache.nutch.protocol.Protocol
class=org.apache.nutch.protocol.http.Http
060213 131632 not including:
/home/hdiwan/nutch-0.7.1/build/plugins/protocol-httpclient
060213 131632 parsing:
/home/hdiwan/nutch-0.7.1/build/plugins/parse-html/plugin.xml
060213 131632 impl: point=org.apache.nutch.parse.Parser
class=org.apache.nutch.parse.html.HtmlParser
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/parse-js
060213 131632 parsing:
/home/hdiwan/nutch-0.7.1/build/plugins/parse-text/plugin.xml
060213 131632 impl: point=org.apache.nutch.parse.Parser
class=org.apache.nutch.parse.text.TextParser
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/parse-pdf
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/parse-rss
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/parse-msword
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/parse-ext
060213 131632 parsing:
/home/hdiwan/nutch-0.7.1/build/plugins/index-basic/plugin.xml
060213 131632 impl: point=org.apache.nutch.indexer.IndexingFilter
class=org.apache.nutch.indexer.basic.BasicIndexingFilter
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/index-more
060213 131632 parsing:
/home/hdiwan/nutch-0.7.1/build/plugins/query-basic/plugin.xml
060213 131632 impl: point=org.apache.nutch.searcher.QueryFilter
class=org.apache.nutch.searcher.site.SiteQueryFilter
060213 131632 parsing:
/home/hdiwan/nutch-0.7.1/build/plugins/query-url/plugin.xml
060213 131632 impl: point=org.apache.nutch.searcher.QueryFilter
class=org.apache.nutch.searcher.url.URLQueryFilter
060213 131632 parsing:
/home/hdiwan/nutch-0.7.1/build/plugins/urlfilter-regex/plugin.xml
060213 131632 impl: point=org.apache.nutch.net.URLFilter
class=org.apache.nutch.net.RegexURLFilter
060213 131632 not including:
/home/hdiwan/nutch-0.7.1/build/plugins/urlfilter-prefix
060213 131632 not including:
/home/hdiwan/nutch-0.7.1/build/plugins/creativecommons
060213 131632 not including:
/home/hdiwan/nutch-0.7.1/build/plugins/language-identifier
060213 131632 not including:
/home/hdiwan/nutch-0.7.1/build/plugins/clustering-carrot2
060213 131632 not including: /home/hdiwan/nutch-0.7.1/build/plugins/ontology
060213 131632 SEVERE org.apache.nutch.plugin.PluginRuntimeException:
extension point: org.apache.nutch.protocol.Protocol does not exist.
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
Caused by: java.lang.RuntimeException:
org.apache.nutch.plugin.PluginRuntimeException: extension point:
org.apache.nutch.protocol.Protocol does not exist.
at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:147)
at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:40)
... 4 more
Caused by: org.apache.nutch.plugin.PluginRuntimeException: extension
point: org.apache.nutch.protocol.Protocol does not exist.
at org.apache.nutch.plugin.PluginRepository.installExtensions(PluginRepository.java:78)
at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:61)
at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:144)
... 5 more
... org/apache/nutch/protocol/Protocol.java does exist, as does
org/apache/nutch/protocol/Protocol.class, jar tvf nutch-0.7.1.jar
holds the class file. I could do further investigation, but would like
some pointers as to where I should be looking first. Thanks!
--
Cheers,
Hasan Diwan [EMAIL PROTECTED]
1. http://lucene.apache.org/nutch/tutorial.html


Re: PDF indexing support?

2005-11-16 Thread Hasan Diwan


On Nov 15, 2005, at 2:46 PM, Håvard W. Kongsgård wrote:


Don't have a conf/nutch-site.xml


Create it and put the overrides in there, per the nutch tutorial.

Cheers,
Hasan Diwan [EMAIL PROTECTED]


