I found the crawlID; it is defined in the nutch.xml file. Mine is TestCrawler:
<property>
<name>http.agent.name</name>
<value>TestCrawler</value>
</property>
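As far as I know, http.agent.name is the user-agent name Nutch sends to web servers, not the crawl ID. In Nutch 2.x the crawl ID is a name you choose yourself and it becomes a storage table prefix (for example TestCrawl_webpage in HBase). Commands run without -crawlId use the plain webpage table. A minimal sketch, assuming hbase shell is on the PATH, that lists the crawl IDs already stored (list_crawl_ids is a hypothetical helper name):

```shell
# Print the crawl IDs found in an HBase table listing read from stdin.
# Assumes Nutch 2.x, where crawl ID <id> is stored in a table named
# <id>_webpage and a crawl run without -crawlId uses the table "webpage".
list_crawl_ids() {
  # Default IFS trims surrounding whitespace; "|| [ -n ... ]" keeps a
  # final line that has no trailing newline.
  while read -r table || [ -n "$table" ]; do
    case $table in
      *_webpage) printf '%s\n' "${table%_webpage}" ;;
      webpage)   printf '%s\n' '(default, no -crawlId)' ;;
    esac
  done
}

# Typical use (needs a running HBase):
#   echo "list" | hbase shell 2>/dev/null | list_crawl_ids
```

If only "(default, no -crawlId)" is printed, the earlier manual inject/generate/fetch/parse runs went into the default webpage table, and any new crawlID you pass to ./crawl starts a separate table.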
Not sure what numberOfRounds is or where to find it.
Thanks,
Tom
On Mon, Feb 22, 2016 at 1:47 AM, Tom Running wrote:
> Binoy,
>
> I do see the information on the console and also a lot of information in
> HBase.
>
> I tried ./crawl but am not quite sure where to find the following
> information:
>
> Usage: crawl <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>
>
> seedDir ../urls/seed.txt ?
> crawlID ?
> solrUrl I am guessing this will be http://localhost:8983/solr/
> numberOfRounds ?
>
> Could you provide some advice on how to determine the above information.
>
>
> Thanks,
> Tom
>
>
>
> On Mon, Feb 22, 2016 at 1:19 AM, Binoy Dalal wrote:
>
>> When you run the inject and generate commands, do you see your site
>> being added in the console output?
>> Also, while fetching and parsing, you should be able to see the number of
>> successful fetch and parse actions in your console. Ideally this should
>> be equal to or more than the number of sites you've put in the seed.txt
>> file. If this is not the case, then there is an issue with either your
>> seed.txt file or the regex-urlfilter file.
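The seed-file and URL-filter checks described above can be scripted. A minimal sketch with hypothetical function names: active_lines prints the non-comment lines of seed.txt or regex-urlfilter.txt, and url_passes_filter roughly mimics how the regex URL filter applies its rules (top to bottom, first match wins, "+" accepts, "-" rejects, and a URL matching no rule is dropped). It uses grep -E, which only approximates Java regex syntax, so treat the result as a hint rather than Nutch's exact verdict.

```shell
# Print the non-blank, non-comment lines of a Nutch config or seed file.
active_lines() {
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# Rough check: does <url> pass the rules in <regex-urlfilter.txt>?
# Returns 0 if the first matching rule starts with '+', 1 otherwise
# (including when no rule matches, which drops the URL).
url_passes_filter() {  # usage: url_passes_filter <url> <regex-urlfilter.txt>
  url=$1
  active_lines "$2" | while IFS= read -r rule; do
    sign=${rule%"${rule#?}"}   # first character: '+' or '-'
    pattern=${rule#?}          # the regex after the sign
    if printf '%s\n' "$url" | grep -Eq -- "$pattern"; then
      if [ "$sign" = "+" ]; then echo accept; else echo reject; fi
      exit 0                   # first match wins; leaves only this subshell
    fi
  done | grep -q '^accept$'
}
```

For example, from runtime/local/bin: url_passes_filter http://nutch.apache.org/ ../conf/regex-urlfilter.txt && echo accepted. After updatedb, ./nutch readdb -stats should also report how many pages were fetched (check the usage printed by ./nutch readdb for the options your version supports).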
>>
>> While running the crawl command, you don't need to index to Solr
>> separately. The command will do it for you.
>> Run ./crawl to see usage instructions.
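For the four arguments asked about in this thread: seedDir is normally the directory that holds seed.txt (e.g. ../urls), crawlID is any name you pick, solrUrl is the Solr URL to index into, and numberOfRounds is how many generate/fetch/parse/update cycles to run. A minimal sketch of a hypothetical wrapper (run_crawl) that sanity-checks these before calling bin/crawl; it assumes it is run from runtime/local/bin and always passes a Solr URL, even though the usage marks it optional.

```shell
# Hypothetical wrapper: check the crawl arguments, then call bin/crawl.
# CRAWL overrides the script path (default ./crawl, i.e. run from
# runtime/local/bin); CRAWL=echo prints the command instead of running it.
run_crawl() {  # usage: run_crawl <seedDir> <crawlID> <solrUrl> <numberOfRounds>
  seed_dir=$1 crawl_id=$2 solr_url=$3 rounds=$4
  # The usage text calls it seedDir, so this sketch expects a directory
  # (e.g. ../urls containing seed.txt) rather than the seed file itself.
  if [ ! -d "$seed_dir" ]; then
    echo "seedDir is not a directory: $seed_dir" >&2
    return 1
  fi
  # numberOfRounds: how many generate/fetch/parse/updatedb cycles to run.
  case $rounds in
    '' | *[!0-9]*) echo "numberOfRounds must be a number" >&2; return 1 ;;
  esac
  if [ "$rounds" -lt 1 ]; then
    echo "numberOfRounds must be at least 1" >&2
    return 1
  fi
  "${CRAWL:-./crawl}" "$seed_dir" "$crawl_id" "$solr_url" "$rounds"
}
```

Example: run_crawl ../urls TestCrawl http://localhost:8983/solr/ 2 (TestCrawl is only an example ID).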
>>
>> On Mon, 22 Feb 2016, 11:41 Tom Running wrote:
>>
>> > Yes, I ran these before running ./nutch solrindex
>> > http://localhost:8983/solr/ -all, and got nothing.
>> >
>> >
>> > From /home/nutch/runtime/local/bin/
>> >
>> > ./nutch inject ../urls/seed.txt
>> > ./nutch readdb
>> > ./nutch generate -topN 2500
>> > ./nutch fetch -all
>> > ./nutch parse -all
>> > ./nutch updatedb
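To make numberOfRounds concrete: one round is roughly the generate/fetch/parse/updatedb sequence listed above. The sketch below (crawl_rounds is a hypothetical function, not the real bin/crawl script, which also passes batch IDs, -crawlId and other options) injects once, repeats that sequence N times with the same commands, and then runs solrindex. Set NUTCH=echo to print the commands instead of running them.

```shell
# Rough illustration of numberOfRounds using the manual commands above.
# NUTCH defaults to ./nutch (run from runtime/local/bin).
crawl_rounds() {  # usage: crawl_rounds <seedDir> <numberOfRounds> <solrUrl>
  nutch=${NUTCH:-./nutch}
  "$nutch" inject "$1" || return 1          # seeds are injected once
  round=1
  while [ "$round" -le "$2" ]; do           # one iteration = one round
    "$nutch" generate -topN 2500 || return 1
    "$nutch" fetch -all || return 1
    "$nutch" parse -all || return 1
    "$nutch" updatedb || return 1           # some 2.x releases expect: updatedb -all
    round=$((round + 1))
  done
  "$nutch" solrindex "$3" -all              # index everything at the end
}
```

Each extra round fetches the outlinks discovered in the previous one, so a larger number crawls deeper from the seeds.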
>> >
>> > Did not run the crawl command.
>> >
>> > Would I just run ./crawl and then run
>> > ./nutch solrindex http://localhost:8983/solr/ -all again?
>> >
>> > Thank you very much for responding to my questions.
>> >
>> > Tom
>> >
>> >
>> > On Sun, Feb 21, 2016 at 11:25 PM, Binoy Dalal wrote:
>> >
>> > > Just to be clear, you did run the preceding nutch commands to inject,
>> > > generate, fetch and parse the URLs, right?
>> > >
>> > > Additionally, try the ./crawl command to crawl and index everything
>> > > to Solr directly, without having to run all the steps manually.
>> > >
>> > > On Mon, 22 Feb 2016, 07:24 Tom Running wrote:
>> > >
>> > > > I am trying to get Nutch to run solrindex and am having a problem. I am
>> > > > following the instructions in this document:
>> > > > http://wiki.apache.org/nutch/Nutch2Tutorial. Everything works except
>> > > > when I run the following command.
>> > > >
>> > > >
>> > > > ./nutch solrindex http://localhost:8983/solr -all
>> > > >
>> > > >
>> > > >
>> > > > It came back with the following output; it seems to have a problem
>> > > > with indexing:
>> > > > IndexingJob: starting
>> > > > Active IndexWriters :
>> > > > SOLRIndexWriter
>> > > > solr.server.url : URL of the SOLR instance (mandatory)
>> > > > solr.commit.size : buffer size when sending to SOLR (default 1000)
>> > > > solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
>> > > > solr.auth : use authentication (default false)
>> > > > solr.auth.username : username for authentication
>> > > > solr.auth.password : password for authentication
>> > > > IndexingJob: done.
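The console only prints this summary. In local mode Nutch writes its detailed messages to runtime/local/logs/hadoop.log, so the reason for an empty indexing run usually shows up there. A minimal sketch (recent_errors is a hypothetical helper) that pulls the latest error and exception lines:

```shell
# Print the last N (default 20) lines that mention an error or exception.
recent_errors() {  # usage: recent_errors <logfile> [count]
  grep -iE 'error|exception' "$1" | tail -n "${2:-20}"
}

# Example, run from runtime/local/bin right after ./nutch solrindex:
#   recent_errors ../logs/hadoop.log
```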
>> > > >
>> > > >
>> > > > When I launch the Solr web UI, I cannot query or find anything under
>> > > > the default collection1 or under gettingstarted_shard1_replica1 or
>> > > > gettingstarted_shard2_replica1.
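A quick way to check whether any documents reached a core, independent of the web UI, is a *:* query with rows=0 and reading numFound from the response. A minimal sketch, assuming curl is installed and Solr 4.x runs on localhost:8983; num_found and solr_doc_count are hypothetical helper names, and the extraction is a plain text match rather than a JSON parser.

```shell
# Extract numFound from a Solr JSON response read on stdin.
num_found() {
  sed -n 's/.*"numFound":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Print the document count of one core or collection.
solr_doc_count() {  # usage: solr_doc_count <solrBaseUrl> <coreOrCollection>
  curl -s "$1/$2/select?q=*:*&rows=0&wt=json" | num_found
}

# Example:
#   solr_doc_count http://localhost:8983/solr collection1
```

A count of 0 right after solrindex means nothing was sent to that particular core, which narrows the problem to the Solr URL or the Nutch side rather than the UI.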
>> > > >
>> > > >
>> > > > I have also tried this option (with collection1) and am still not
>> > > > able to query anything:
>> > > > ./nutch solrindex http://localhost:8983/solr/collection1 -all
>> > > >
>> > > >
>> > > >
>> > > > I downloaded Solr 4.10.3 and started it as-is with the command
>> > > > /home/solr/bin/solr start -e cloud -noprompt
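The -e cloud option starts the SolrCloud example rather than a classic single-core setup, so it helps to check which cores actually exist before choosing the URL to index into. A minimal sketch, assuming curl and the CoreAdmin STATUS call on localhost:8983; core_names and list_solr_cores are hypothetical helper names, and the name extraction is a rough text match, not a JSON parser.

```shell
# Print core names from a CoreAdmin STATUS JSON response read on stdin.
core_names() {
  grep -o '"name":[[:space:]]*"[^"]*"' |
    sed 's/^"name":[[:space:]]*"\(.*\)"$/\1/'
}

# List the cores of a running Solr (default base URL is an assumption).
list_solr_cores() {  # usage: list_solr_cores [solrBaseUrl]
  curl -s "${1:-http://localhost:8983/solr}/admin/cores?action=STATUS&wt=json" |
    core_names
}
```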
>> > > >
>> > > > I did not modify any configuration files or post any files or
>> > > > directories from within Solr. I am assuming that the command
>> > > > ./nutch solrindex http://localhost:8983/solr/collection1 will do all
>> > > > the posting and indexing for Solr.
>> > > >
>> > > > Any ideas what I am missing here? Any advice on where to go from here
>> > > > would be greatly appreciated.
>> > > >
>> > > > I did try copying /nutch/runtime/local/conf/*.* into Solr, and it did
>> > > > not make any difference.
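On copying conf/*.*: Solr only needs Nutch's schema file, and with solr start -e cloud a collection's configuration is kept in ZooKeeper rather than read from local files, which may explain why copying made no difference. For a standalone (non-cloud) Solr 4.x core, the usual approach in the Nutch tutorials is to replace the core's schema.xml with the schema Nutch ships. A minimal sketch; install_nutch_schema is a hypothetical helper, both directory arguments are assumptions about your layout, and Solr must be restarted afterwards.

```shell
# Install Nutch's Solr schema into a standalone Solr core's conf directory.
# Prefers schema-solr4.xml (shipped by some Nutch 2.x releases for Solr 4)
# and falls back to schema.xml. The core's old schema is kept as .bak.
install_nutch_schema() {  # usage: install_nutch_schema <nutchConfDir> <coreConfDir>
  src="$1/schema-solr4.xml"
  [ -f "$src" ] || src="$1/schema.xml"
  if [ ! -f "$src" ]; then
    echo "no Nutch schema found in $1" >&2
    return 1
  fi
  if [ -f "$2/schema.xml" ]; then
    cp "$2/schema.xml" "$2/schema.xml.bak" || return 1
  fi
  cp "$src" "$2/schema.xml"
}

# Example (paths are assumptions):
#   install_nutch_schema /home/nutch/runtime/local/conf \
#     /home/solr/example/solr/collection1/conf
```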
>> > > >
>> > > > Thank you.
>> > > >
>> > > > Tom
>> > > >
>> > > > --
>> > > Regards,
>> > > Binoy Dalal
>> > >
>> >
>> --
>> Regards,
>> Binoy Dalal
>>
>
>