The documentation is a little bit tough... :)

Really, I couldn't find a clear path for the novice from point A to point
B.  Because of this, I'm hoping this Chef Cookbook can be the tool.

Here's what I have so far:
https://github.com/raycrawford/cb_rayCrawford_nutch2

Two problems.  First, when I run the following, documents do get into Solr,
but the crawl ends with the solrdedup failure shown after the commands:
cd /opt/nutch/runtime/local/bin
export JAVA_HOME='/etc/alternatives/jre_1.8.0'
/opt/hbase/bin/start-hbase.sh
mkdir urls
echo "http://www.bidfta.com/" > /opt/nutch/runtime/local/bin/urls/seed.txt
/opt/nutch/runtime/local/bin/nutch inject urls/seed.txt
/opt/nutch/runtime/local/bin/crawl ./urls nutch http://127.0.0.1:8983/solr/nutch 3


DbUpdaterJob: finished at 2017-08-16 05:01:46, time elapsed: 00:00:05

Indexing nutch on SOLR index -> http://127.0.0.1:8983/solr/nutch

/opt/nutch/runtime/local/bin/nutch index -D mapred.reduce.tasks=2 -D
mapred.child.java.opts=-Xmx1000m -D
mapred.reduce.tasks.speculative.execution=false -D
mapred.map.tasks.speculative.execution=false -D
mapred.compress.map.output=true -D
solr.server.url=http://127.0.0.1:8983/solr/nutch -all -crawlId nutch

IndexingJob: starting

Active IndexWriters :

SOLRIndexWriter

solr.server.url : URL of the SOLR instance (mandatory)

solr.commit.size : buffer size when sending to SOLR (default 1000)

solr.mapping.file : name of the mapping file for fields (default
solrindex-mapping.xml)

solr.auth : use authentication (default false)

solr.auth.username : username for authentication

solr.auth.password : password for authentication

IndexingJob: done.

SOLR dedup -> http://127.0.0.1:8983/solr/nutch

/opt/nutch/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D
mapred.child.java.opts=-Xmx1000m -D
mapred.reduce.tasks.speculative.execution=false -D
mapred.map.tasks.speculative.execution=false -D
mapred.compress.map.output=true http://127.0.0.1:8983/solr/nutch

Exception in thread "main" java.lang.RuntimeException: job failed:
name=apache-nutch-2.3.1.jar, jobid=job_local491881398_0001
        at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:383)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.run(SolrDeleteDuplicates.java:393)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.main(SolrDeleteDuplicates.java:403)

Error running:

  /opt/nutch/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D
mapred.child.java.opts=-Xmx1000m -D
mapred.reduce.tasks.speculative.execution=false -D
mapred.map.tasks.speculative.execution=false -D
mapred.compress.map.output=true http://127.0.0.1:8983/solr/nutch

Failed with exit value 1.
---
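
One cheap check on the seed step in the commands above: a minimal sketch,
assuming a POSIX shell, that writes the seed list and reports success only
if the file is non-empty. SEED_DIR and write_seed are made-up names for
illustration, not part of Nutch.

```shell
#!/bin/sh
# Sketch only: SEED_DIR and write_seed are illustrative, not Nutch features.
SEED_DIR="${SEED_DIR:-./urls}"

write_seed() {
    # $1 = seed directory, $2 = URL to crawl
    mkdir -p "$1" || return 1
    printf '%s\n' "$2" > "$1/seed.txt" || return 1
    # -s succeeds only if the file exists and is larger than zero bytes
    [ -s "$1/seed.txt" ]
}

write_seed "$SEED_DIR" "http://www.bidfta.com/" && echo "seed ok: $SEED_DIR/seed.txt"
```

If the function returns non-zero, there is nothing for inject to read, so
the later steps are not worth running.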

Second, the site I'm indexing is essentially 3 layers deep.  The first one
has a field on it, '<p class="auctionLocation">'. All other children of that
page relate to the following link, but do not have that data on them. What
I would like to do is capture the <p class="auctionLocation"> data and
relate it to all children of that block. I altered the managed schema to
include '<field name="auctionLocation" type="strings"/>', but that doesn't
seem to add the field to the index.  Also, I don't know how to add it to
the child pages.
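
For the schema half of this, a hedged sketch of declaring the same field
through Solr's Schema API instead of editing the managed schema by hand.
SOLR_CORE_URL, field_payload and add_field are illustrative names; the
"stored" and "indexed" settings are assumptions. This only declares the
field in Solr; it would not by itself make Nutch extract or send a value
for it.

```shell
#!/bin/sh
# Sketch only: SOLR_CORE_URL, field_payload and add_field are illustrative names.
SOLR_CORE_URL="${SOLR_CORE_URL:-http://127.0.0.1:8983/solr/nutch}"

field_payload() {
    # $1 = field name, $2 = field type; prints a Schema API add-field command
    printf '{"add-field":{"name":"%s","type":"%s","stored":true,"indexed":true}}\n' "$1" "$2"
}

add_field() {
    # POSTs the command to the core's /schema endpoint (needs a running Solr)
    field_payload "$1" "$2" |
        curl -sS -X POST -H 'Content-type:application/json' \
            --data-binary @- "$SOLR_CORE_URL/schema"
}

# Only print the payload here; run "add_field auctionLocation strings" to apply it.
field_payload auctionLocation strings
```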

What I'm asking here has two parts.  I realize the first part is a
Nutch 2/Solr integration thing and the second is a Solr thing, but hopefully
y'all can help me figure this out...

Thanks!

On Tue, Aug 15, 2017 at 10:34 AM, Sebastian Nagel <
[email protected]> wrote:

> Hi Alex,
>
> no problem. Let's be productive and work!
>
> Best,
> Sebastian
>
>
> On 08/15/2017 04:22 PM, Alejandro Caceres wrote:
> > Hey Sebastian,
> >
> > I was just giving Lewis s*** because I know him personally :P. I'm aware
> > this is an open source project and we're all in this together! No one
> likes
> > writing docs..... I should probably be working on my own docs right now.
> >
> > Alex
> >
> > On Tue, Aug 15, 2017 at 5:39 AM, Sebastian Nagel <
> [email protected]
> >> wrote:
> >
> >> Hi Alex,
> >>
> >> I would like to state that it's *your* documentation as well,
> >> as you're part of the community if following this list.
> >>
> >> If I had the time to rewrite the tutorials and documentation
> >> (and no open issues on Jira), no question, I probably would
> >> work on it. If you have spare time, you're invited to improve
> >> the documentation in any way you can. Just ask for access to
> >> the Nutch wiki.
> >>
> >> Thanks,
> >> Sebastian
> >>
> >> On 08/14/2017 09:10 PM, Alejandro Caceres wrote:
> >>> hey Lewis,
> >>>
> >>> I think he's just trying to say that your documentation sucks :D. Glad
> I
> >>> could clarify.
> >>>
> >>> Alex
> >>>
> >>> On Mon, Aug 14, 2017 at 3:03 PM, lewis john mcgibbney <
> >> [email protected]>
> >>> wrote:
> >>>
> >>>> Hi Ray,
> >>>> Apart from not being able to find a tutorial, what is wrong exactly?
> >>>> New users of Nutch are advised to use the Nutch 1.X series.
> >>>> The Nutch 2.X tutorial introduces more moving parts. This is well
> >>>> documented on this mailing list for a number of years now.
> >>>> If you can enumerate what is wrong, we will help you out.
> >>>> Thanks
> >>>> Lewis
> >>>>
> >>>> On Sun, Aug 13, 2017 at 8:49 PM, <[email protected]>
> >>>> wrote:
> >>>>
> >>>>>
> >>>>> From: Ray Crawford <[email protected]>
> >>>>> To: [email protected]
> >>>>> Cc:
> >>>>> Bcc:
> >>>>> Date: Sun, 13 Aug 2017 23:48:59 -0400
> >>>>> Subject: I'm just going to throw this out there...
> >>>>> And it may get me banned, but so be it.
> >>>>>
> >>>>> I've been trying to get a Nutch/Solr setup running and, after many
> hours
> >>>> of
> >>>>> cruising StackOverflow, this list and many documentation sites which
> >>>> talked
> >>>>> about various versions, I've got nothing to show for it.
> >>>>>
> >>>>> Why is this so complex and why is a reasonable set of documentation
> >> about
> >>>>> how to integrate the solutions so hard to find?
> >>>>>
> >>>>> Can anyone point me to an ACCURATE Nutch 2.3/Solr tutorial?  If some
> >> one
> >>>>> can help me here, I'll write a Chef cookbook that automates the whole
> >>>>> thing.  However, I can't get any of the tutorials I've tried so far
> to
> >>>>> work.
> >>>>>
> >>>>> Thanks and hopefully the community will help me (and others) work
> >> through
> >>>>> this or absolve me of my apparent ignorance.
> >>>>>
> >>>>> - Ray.
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> http://home.apache.org/~lewismc/
> >>>> @hectorMcSpector
> >>>> http://www.linkedin.com/in/lmcgibbney
> >>>>
> >>>
> >>>
> >>>
> >>
> >>
> >
> >
>
>
