Hi,

Compiling from the source package is somewhat tricky. You've got two options to solve this:

1. Export the missing conf/*.template files from the SVN repository.
2. Edit build.xml:61 so that it doesn't try to copy these *.template files.
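
A minimal sketch of option 2, under assumptions: the exact task at build.xml:61 is not quoted in this thread, so the target name `init`, the `<touch>` element, and the `${conf.dir}` property below are illustrative guesses (the task may equally be a `<copy>`). The idea is only to wrap whatever task reads the `*.template` fileset in an XML comment, so that Ant skips it:

```xml
<target name="init">
  <!-- ... other tasks in this target stay unchanged ... -->

  <!-- Disabled: the source package ships no *.template files in conf,
       so this fileset is empty and the task aborts the build.
       (Assumed shape of the task; compare with your own build.xml:61.)
  <touch datetime="01/25/1971 2:00 pm">
    <fileset dir="${conf.dir}" includes="**/*.template"/>
  </touch>
  -->
</target>
```

If other tasks in the same target also rely on the templates, option 1 is probably the safer route.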

Hope that helps.

cya,
        Sebastian Steinmetz


On 29.10.2007 at 17:59, payo wrote:


Hello,

I am trying to install the XML Parser, but when I run ant in steps 7 and 8
it shows me this message:

BUILD FAILED

C:\nutch-0.9\build.xml:61: Specify at least one source--a file or resource collection.

Why?




Rida Benjelloun wrote:

Hi,
Here are the steps to install the XML Parser plugin:
1- Copy parse-xml into the src/plugin directory
2- Copy xmlparser-conf.xml into the conf directory
3- Add the following property to nutch-site.xml (conf directory)
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|xml|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>
  <description>Regular expression naming plugin directory names to
  include.  Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin.
  By default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins.
  </description>
</property>

4- Modify parse-plugins.xml (conf directory)
    <mimeType name="text/xml">
        <plugin id="parse-xml" />
        <plugin id="parse-text" />
        <plugin id="parse-html" />
        <plugin id="parse-rss" />
    </mimeType>

5- Modify build.xml in the root directory to add parse-xml
6- Modify build.xml in src/plugin to add parse-xml
7- Execute ant in the src/plugin directory
8- Execute ant in the root directory
9- Copy the parse-xml directory located in nutch-0.8.1/build/plugins to
nutch-0.8.1/plugins
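
Step 6 can be sketched roughly as follows. This assumes that src/plugin/build.xml lists each plugin with one `<ant>` call inside its `deploy` and `clean` targets, which matches the "deploy/clean entries" mentioned elsewhere in this thread; the surrounding target contents are omitted and may differ in your copy. The matching change to the root build.xml (step 5) is not shown because the thread does not say which entries it needs.

```xml
<!-- src/plugin/build.xml (sketch; existing plugin entries stay as they are) -->
<target name="deploy">
  <!-- ... entries for the bundled plugins ... -->
  <ant dir="parse-xml" target="deploy"/>
</target>

<target name="clean">
  <!-- ... entries for the bundled plugins ... -->
  <ant dir="parse-xml" target="clean"/>
</target>
```

The `dir` value must match the name of the plugin folder copied in step 1.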

Best regards

Rida Benjelloun




On 11/7/06, Jim Wilson <[EMAIL PROTECTED]> wrote:

I think you should stop sending *bump* emails.

-- Jim

On 11/7/06, Jayant Kumar Gandhi <[EMAIL PROTECTED]> wrote:

*bump*

Any thoughts, anyone?

Thanks,
Jayant

On 11/6/06, Jayant Kumar Gandhi <[EMAIL PROTECTED]> wrote:
Hello,

I have been working on it since then. I have found one problem: it
seems the parse-xml plugin is not loading.

One thing I did was put the plugin in the parse-plugins.xml to enable nutch-0.8.1 to detect that parse-xml is the plugin to be used for xml content. This is not given in the instructions for the plugin though.

Because of it I started to get the following error in hadoop.log:-

2006-11-06 15:12:33,156 WARN  parse.ParserFactory - ParserFactory:
Plugin: parse-xml mapped to contentType text/xml via
parse-plugins.xml, but not enabled via plugin.includes in
nutch-default.xml

The issue is that I have the plugin enabled in nutch-site.xml. I also
tried to enable the plugin in nutch-default.xml but I still get
the same error.

Any thoughts/ pointers on how to make the plugin work?

Thanks and Best Regards,
Jayant Gandhi


On 11/5/06, Jayant Kumar Gandhi <[EMAIL PROTECTED]> wrote:
I am using the default xmlparser-conf.xml, just copied into the
nutch/conf dir. To test it I used xmltest.xml from the sample
directory, which is uploaded at http://www.jkg.in/xmltest.xml

I do not get any errors while indexing or parsing. The crawl log is
attached. I am able to get the xml file in the results when I search
for 'XPath' but when I click the explain link, it doesn't show me the
field dctitle in the index, which it should.

I just noticed that hadoop.log has some error for handling xml files
and I cannot see parse-xml loaded, but I have it enabled in my
nutch-site.xml. I am new to nutch-0.8 and hadoop so I have no idea
whether this is expected behaviour or how to fix it.

Thanks and Best Regards,
Jayant

On 11/5/06, Nutch Newbie <[EMAIL PROTECTED]> wrote:
Can you post your "xmlparser-conf.xml" from the nutch/conf dir ?
Also what kind of error message do you get when you index?
You can use Luke to see the index...

Regards,

On 11/4/06, Jayant Kumar Gandhi <[EMAIL PROTECTED]> wrote:
Hello Everyone,

I just installed nutch-0.8.1 on my dev machine. I installed a new
plugin called XML Parser, available at
http://issues.apache.org/jira/browse/NUTCH-185
The issue is that I am unable to get it to work.
I copied the parse-xml folder to the src/plugin folder. I made the
corresponding deploy/clean entries in the build.xml file.

Also, I have edited the nutch conf to enable the xml plugin.
The plugin is still not working. After compiling using ant, I started
indexing. After the indexing was finished and a query done, I couldn't
see the indexed fields on the explain page.

Any inputs?

Thanks,
Jayant



--
www.jkg.in | http://www.jkg.in/contact-me/
Jayant Kr. Gandhi



--
www.jkg.in | http://www.jkg.in/contact-me/
Jayant Kr. Gandhi
M.Tech. Computer Tech. Class of 2007,
IIT Delhi






--
View this message in context: http://www.nabble.com/XMLParser-for-Nutch-tf2575183.html#a13471028
Sent from the Nutch - User mailing list archive at Nabble.com.

