I was afraid of this :-( I can't believe that no one has tried this
configuration yet?
Hi,
As you probably know, there are not very many active Windows + Nutch users
on this list. This leaves you in a bit of a catch-22. When I first started
using Nutch it was on a Windows desktop and I found it pretty painful at
times. Most of the relevant documentation available caters for *nix
Hi
I tried to run nutch-1.3 together with solr 3.x according to
http://wiki.apache.org/nutch/NutchTutorial.
That worked as described, but if I try to search the index using the Solr
admin interface I always get an empty result.
http://localhost:8983/solr/admin/schema.jsp
Using the Schema
Check line 79 of your Solr schema:
http://svn.apache.org/viewvc/nutch/branches/branch-1.3/conf/schema.xml?view=markup
Maybe we should configure the field to be stored in 1.4. I can imagine this
causes a lot of headaches for new users. Also highlighting will never work
with unstored fields.
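
To make the stored/unstored distinction concrete, here is a minimal Python sketch that lists the fields declared with stored="false" in a Solr schema. The embedded schema fragment is an illustrative assumption and not a copy of the Nutch 1.3 conf/schema.xml; the field names are stand-ins. Fields that are indexed but not stored can be searched, but their values are not returned in results.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only (assumption): NOT the real Nutch 1.3 schema.xml.
SCHEMA = """
<schema name="example" version="1.3">
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>
    <field name="content" type="text" stored="false" indexed="true"/>
    <field name="title" type="text" stored="true" indexed="true"/>
  </fields>
</schema>
"""


def unstored_fields(schema_xml):
    """Return the names of <field> elements whose stored attribute is "false".

    Fields that omit the attribute are skipped here, because their effective
    value depends on the field type definition.
    """
    root = ET.fromstring(schema_xml)
    return [
        field.get("name")
        for field in root.iter("field")
        if field.get("stored", "").lower() == "false"
    ]


print(unstored_fields(SCHEMA))
```

After changing a field to stored="true", documents indexed earlier still lack the stored value, so a re-index is needed.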
On
On Monday 19 September 2011 15:58:35 lewis john mcgibbney wrote:
Yes, I think what Markus has pointed out is the problem, Jann. This means
you need to re-index your data and change the stored and indexed values to
true.
Markus, out of interest, do you know the pros/cons if we were to make
*previous sent by accident
Does this solve your problem, Jann?
Is this worth filing an issue for? It is rather trivial to address but
could help users unfamiliar with the specifics of the Nutch (or Solr) schema(s).
On Mon, Sep 19, 2011 at 3:06 PM, Markus Jelsma
markus.jel...@openindex.io wrote:
*previous sent by accident
Hi Julien,
Thanks, that's encouraging. I'm trying to make this work, and I'm definitely
missing something. I hope I'm not too far off the mark. I've started with the
instructions at http://wiki.apache.org/nutch/WritingPluginExample . If I
understand this properly, the changes I needed to make
Hi Chip,
There is no need to run ant war; there is no war target in the Nutch 1.3
build.xml file.
Can you explain more about adding 'the tags to %NUTCH_HOME% etc. etc.'? Do you
mean you've added your seed URLs?
Have you had a look at any of your log output as to whether the urlmeta
plugin is
Hi Lewis,
My (probably wrong) understanding was that I'm supposed to add the tags for my
new field to my list of seed URLs. So if I have a seed URL followed by
\t humanURL=http://www.aip.org/history/ead/20110369.html;, I get a new field
called humanURL which is populated with the string
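
The tab-delimited seed format described here can be sketched in Python as follows. This only mimics the format (a URL followed by tab-separated key=value pairs); it is not Nutch's injector code, and the seed URL in the example is a hypothetical placeholder.

```python
def parse_seed_line(line):
    """Split one seed line into (url, metadata dict).

    Assumed format: URL, then zero or more tab-separated key=value pairs.
    """
    parts = line.rstrip("\n").split("\t")
    url = parts[0].strip()
    meta = {}
    for item in parts[1:]:
        item = item.strip()
        if "=" not in item:
            continue  # ignore anything that is not a key=value pair
        # Split on the first "=" only, so values may themselves contain "=".
        key, value = item.split("=", 1)
        meta[key.strip()] = value.strip()
    return url, meta


# Hypothetical seed URL; the humanURL value is the one from the message.
line = ("http://www.example.org/finding-aid.xml"
        "\thumanURL=http://www.aip.org/history/ead/20110369.html")
url, meta = parse_seed_line(line)
print(url)
print(meta)
```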
Hi
Since the info is available thanks to the injection you can use the url-meta
plugin as-is and won't need to have a custom version. See
https://issues.apache.org/jira/browse/NUTCH-855
Apart from that, do not modify the content of \runtime\local\conf\ before
re-compiling with Ant, as this will
Hi,
I sometimes come across relative outlinks in the source that are intended as
absolute but where the webmaster or CMS omits the protocol scheme. This
results in repeating URI segments and crap URLs.
Would an option that treats such URLs as absolute be a good idea? This problem
is similar
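
The effect described above can be reproduced with Python's standard urljoin, which follows the usual relative-reference resolution rules. The base page and link below are hypothetical, and resolve_lenient is only one possible heuristic for such an option (a "www." prefix check), not anything taken from Nutch.

```python
from urllib.parse import urljoin, urlsplit

base = "http://example.com/news/index.html"  # hypothetical page
href = "www.example.com/about.html"          # scheme omitted by the author

# Standard resolution treats href as a relative path under /news/,
# which produces the repeated host-like segment.
print(urljoin(base, href))


def resolve_lenient(base, href):
    """Hypothetical heuristic: if the first path segment looks like a host
    name (starts with "www." and contains a further dot), treat the link as
    absolute and reuse the scheme of the base URL."""
    first = href.split("/", 1)[0]
    if first.lower().startswith("www.") and "." in first[4:]:
        return urlsplit(base).scheme + "://" + href
    return urljoin(base, href)


print(resolve_lenient(base, href))
print(resolve_lenient(base, "about.html"))
```

A heuristic like this can misfire on a directory that really is named like a host, which is one reason to make it optional.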
In addition, it looks like you are misinterpreting how the urlmeta plugin
works, Chip. It is designed to pick up additional meta tags with name and
content values respectively, e.g.
<meta name="humanURL" content="blahblahblah">
The plugin then gets this data as well as any additional values added in
On Sep 19, 2011, at 1:52pm, Markus Jelsma wrote:
Hi,
I sometimes come across relative outlinks in the source that are intended
as absolute but where the webmaster or CMS omits the protocol scheme.
This results in repeating URI segments and crap URLs.
Would an option that treats
I thought it seemed too good to be true. I understood the part about this
picking up metadata from tags within the actual documents; that seems like a
feature a lot of people would need. But I thought the whole point of the
tab-delimited tags in my URLs file was that I could also inject tags
In addition, it looks like you are misinterpreting how the urlmeta plugin
works, Chip. It is designed to pick up additional meta tags with name and
content values respectively, e.g.
<meta name="humanURL" content="blahblahblah">
Sorry Lewis, but it does not do that at all. See the link I gave earlier.