Hi Chip,

Was there anything in particular you found misleading about the plugin
example on the wiki? I am keen to make it as clear as possible.

Thank you

Lewis

On Tue, Sep 20, 2011 at 6:00 PM, Chip Calhoun <ccalh...@aip.org> wrote:

> Hi Julien,
>
> Thanks for clarifying this! I've got it working now. Instead of seeding
> with a proper tab-delimited file created in Excel, I had been wrong-headedly
> seeding it with a text file that just had tabs in it. They look the same,
> but it makes a difference. Thanks!
>
> Chip
>
> -----Original Message-----
> From: Julien Nioche [mailto:lists.digitalpeb...@gmail.com]
> Sent: Monday, September 19, 2011 5:23 PM
> To: user@nutch.apache.org
> Subject: Re: Machine readable vs. human readable URLs.
>
> > In addition, it looks like you are misinterpreting how the urlmeta
> > plugin works, Chip. It is designed to pick up additional meta tags with
> > name and content values respectively, e.g.
> >
> > <meta name="humanURL" content="blahblahblah">
> >
>
> Sorry Lewis, but it does not do that at all. See the link I gave earlier for a
> description of urlmeta. I agree that the name is misleading: it does not
> extract the content from the page but simply uses the crawldb metadata.
>
>
> >
> > The plugin then gets this data as well as any additional values added
> > in the urlmeta.tags property within nutch-site.xml and adds this to the
> > index, which can then be queried.
> >
> > Does this make sense?
> >
> > On Mon, Sep 19, 2011 at 9:10 PM, Julien Nioche <
> > lists.digitalpeb...@gmail.com> wrote:
> >
> > > Hi
> > >
> > > Since the info is available thanks to the injection you can use the
> > > url-meta plugin as-is and won't need to have a custom version.  See
> > > https://issues.apache.org/jira/browse/NUTCH-855
> > >
> > > Apart from that do not modify the content of  \runtime\local\conf\
> > > before re-compiling with ANT as this will be overwritten. Either
> > > modify $NUTCH/conf/nutch-site.xml or recompile THEN modify.
> > >
> > > As Lewis suggested check the logs and see if the plugin is activated etc...
> > >
> > > J.
> > >
> > >
> > > On 19 September 2011 21:03, Chip Calhoun <ccalh...@aip.org> wrote:
> > >
> > > > Hi Lewis,
> > > >
> > > > My probably wrong understanding was that I'm supposed to add the tags for
> > > > my new field to my list of seed URLs. So if I have a seed URL followed by
> > > > "       \t humanURL=http://www.aip.org/history/ead/20110369.html", I get a
> > > > new field called "humanURL" which is populated with the string I've
> > > > specified for that specific URL. I may just be greatly misunderstanding
> > > > how this plugin works.
> > > >
> > > > I've checked my Nutch logs now and it looks like nothing happened. The new
> > > > field does at least show up in the Solr admin UI's schema, but clearly my
> > > > problem is on the Nutch end of things.
> > > >
> > > > -----Original Message-----
> > > > From: lewis john mcgibbney [mailto:lewis.mcgibb...@gmail.com]
> > > > Sent: Monday, September 19, 2011 3:34 PM
> > > > To: user@nutch.apache.org
> > > > Subject: Re: Machine readable vs. human readable URLs.
> > > >
> > > > Hi Chip,
> > > >
> > > > There is no need to run ant war; there is no war target in the
> > > > Nutch >= 1.3 build.xml file.
> > > >
> > > > Can you explain more about adding 'the tags to %NUTCH_HOME%' etc.? Do you
> > > > mean you've added your seed URLs?
> > > >
> > > > Have you had a look at any of your log output as to whether the
> > > > urlmeta plugin is loaded and used when fetching?
> > > >
> > > > You should be able to get info on your schema, fields etc. within the Solr
> > > > admin UI.
> > > >
> > > > On Mon, Sep 19, 2011 at 8:09 PM, Chip Calhoun <ccalh...@aip.org> wrote:
> > > >
> > > > > Hi Julien,
> > > > >
> > > > > Thanks, that's encouraging. I'm trying to make this work, and
> > > > > I'm definitely missing something. I hope I'm not too far off the mark.
> > > > > I've started with the instructions at
> > > > > http://wiki.apache.org/nutch/WritingPluginExample . If I
> > > > > understand this properly, the changes I needed to make were the
> > > > > following:
> > > > >
> > > > > In Nutch:
> > > > > Paste the prescribed block of code into
> > > > > %NUTCH_HOME%\runtime\local\conf\nutch-site.xml. This tells Nutch
> > > > > to look for and run the urlmeta plugin.
> > > > > In %NUTCH_HOME%, run "ant war".
> > > > > Add the tags to %NUTCH_HOME%\runtime\local\urls\nutch. A line in this
> > > > > file now looks like: "http://www.aip.org/history/ead/20110369.xml     \t
> > > > > humanURL=http://www.aip.org/history/ead/20110369.html"
> > > > >
> > > > > In Solr:
> > > > > Added my new tag to %SOLR_HOME%\example\solr\conf\schema.xml . The new
> > > > > line consists of: " <field name="humanURL" type="string" stored="true"
> > > > > indexed="false"/>"
> > > > >
> > > > > I've redone the indexing, and my new field still doesn't show up
> > > > > in the search results. Can you tell where I'm going wrong?
> > > > >
> > > > > Thanks,
> > > > > Chip
> > > > >
> > > > > -----Original Message-----
> > > > > From: Julien Nioche [mailto:lists.digitalpeb...@gmail.com]
> > > > > Sent: Friday, September 16, 2011 4:37 AM
> > > > > To: user@nutch.apache.org
> > > > > Subject: Re: Machine readable vs. human readable URLs.
> > > > >
> > > > > Hi Chip,
> > > > >
> > > > > Should simply be a matter of creating a custom field with an
> > > > > IndexingFilter, you can then use it in any way you want on the
> > > > > SOLR side
> > > > >
> > > > > Julien
> > > > >
> > > > > On 15 September 2011 21:50, Chip Calhoun <ccalh...@aip.org> wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > We'd like to use Nutch and Solr to replace an existing Verity search
> > > > > > that's become a bit long in the tooth. In our Verity search, we have
> > > > > > a hack which allows each document to have a machine-readable
> > > > > > URL which is indexed (generally an xml document), and a
> > > > > > human-readable URL which we actually send users to. Has anyone
> > > > > > done the same with Nutch and Solr?
> > > > > >
> > > > > > Thanks,
> > > > > > Chip
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > *
> > > > > *Open Source Solutions for Text Engineering
> > > > >
> > > > > http://digitalpebble.blogspot.com/
> > > > > http://www.digitalpebble.com
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > *Lewis*
> > > >
> > >
> > >
> > >
> > > --
> > > *
> > > *Open Source Solutions for Text Engineering
> > >
> > > http://digitalpebble.blogspot.com/
> > > http://www.digitalpebble.com
> > >
> >
> >
> >
> > --
> > *Lewis*
> >
>
>
>
> --
> *
> *Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
>



-- 
*Lewis*
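
For anyone reading this thread later: the seed-file format discussed above (a URL followed by tab-separated name=value metadata) can be generated programmatically rather than typed by hand, which avoids ambiguity about whether the separator is a real tab character. The following is only a minimal sketch in Python, not part of Nutch. The helper names format_seed_line and parse_seed_line and the temporary seed.txt path are made up for illustration, and it assumes the injector wants exactly one literal tab between the URL and each name=value pair.

```python
import os
import tempfile


def format_seed_line(url, metadata):
    """Build one seed line: the URL, then name=value pairs, joined by real tab characters."""
    fields = [url] + [f"{key}={value}" for key, value in metadata.items()]
    return "\t".join(fields)


def parse_seed_line(line):
    """Split a seed line back into (url, metadata dict) to check the separators."""
    url, *pairs = line.rstrip("\n").split("\t")
    metadata = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"metadata field without '=': {pair!r}")
        metadata[key.strip()] = value.strip()
    return url.strip(), metadata


# URLs taken from the thread; the metadata name matches the Solr field "humanURL".
seed_line = format_seed_line(
    "http://www.aip.org/history/ead/20110369.xml",
    {"humanURL": "http://www.aip.org/history/ead/20110369.html"},
)

# Hypothetical location: a temporary directory stands in for your own urls directory.
seed_path = os.path.join(tempfile.mkdtemp(), "seed.txt")
with open(seed_path, "w", encoding="utf-8", newline="\n") as handle:
    handle.write(seed_line + "\n")

# Read the file back to confirm the tab survived the round trip.
with open(seed_path, encoding="utf-8") as handle:
    url, metadata = parse_seed_line(handle.readline())
```

Reading the line back with parse_seed_line is a cheap way to confirm the file really contains a tab rather than spaces or the two characters "\t" before handing it to the injector.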
