Re: Error when trying to create a core in solr

2020-06-09 Thread Jim Anderson
Hi Erick,

I probably should have included information about the config directory. As
part of the setup, I had copied the config directory as follows:

$ cp -r /usr/share/solr-8.5.1/server/solr/configsets/_default/* .

Note that the copy was from solr-8.5.1 because I could not find a
'_default' directory in solr-7.3.1. Copying from 8.5.1 may well be my
problem. I will check and see if I can find a 7.3.1 example directory to
copy from.
I will report back.

Regards,
Jim

On Tue, Jun 9, 2020 at 10:22 AM Erick Erickson 
wrote:

> You need the entire config directory for a start, not just the schema file.
>
> And there’s no need to copy things around; just give the path to the
> nutch-provided config directory. You can leave off the “conf” since the
> upload process automatically checks for it and does the right thing.
>
> Best,
> Erick


Error when trying to create a core in solr

2020-06-09 Thread Jim Anderson
Hi,

I am running Solr-7.3.1. I have just untarred the Solr-7.3.1 area and
created a 'nutch' directory for the core. I have downloaded
nutch-master.zip from
https://github.com/apache/nutch, unzipped that file and copied schema.xml
to .../server/solr/configsets/nutch/conf/schema.xml

In the schema file, I modified the lastModified field value to true, with no
other changes.

I am running the following command:

.../bin/solr create -c nutch -d .../server/solr/configsets/nutch/conf/

and getting the error message:

ERROR: Error CREATEing SolrCore 'nutch': Unable to create core [nutch]
Caused by: Illegal pattern component: pp

I have done a search for an error message containing: "Illegal pattern
component: pp" but I did not find anything useful.

Can anyone help explain what this error message means and/or what needs to
be done to fix this problem?

Jim A.
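
The reply above suspects that the configset was copied from Solr 8.5.1 but is being loaded by Solr 7.3.1. One quick way to test that is to compare the version declared in the configset with the running Solr version. The Java sketch below assumes the configset's solrconfig.xml contains a <luceneMatchVersion> element, as the stock configsets do. The class name ConfigsetVersionCheck and its two command-line arguments are hypothetical. A newer major version only flags a likely mismatch; on its own it does not prove that this is what triggers "Illegal pattern component: pp".

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: flags a configset whose luceneMatchVersion has a
// newer major version than the Solr release that will load it.
final class ConfigsetVersionCheck {

    // Returns the trimmed text of the first <luceneMatchVersion>, or null if absent.
    static String luceneMatchVersion(String solrconfigXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(solrconfigXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("luceneMatchVersion");
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse solrconfig.xml", e);
        }
    }

    // True when the configset's major version is greater than the running Solr's.
    static boolean newerMajorThan(String configVersion, String solrVersion) {
        return major(configVersion) > major(solrVersion);
    }

    private static int major(String version) {
        return Integer.parseInt(version.split("\\.")[0]);
    }

    // Usage: java ConfigsetVersionCheck <path/to/solrconfig.xml> <solr-version>
    public static void main(String[] args) throws IOException {
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String declared = luceneMatchVersion(xml);
        if (declared == null) {
            System.out.println("No luceneMatchVersion found");
        } else if (newerMajorThan(declared, args[1])) {
            System.out.println("Configset targets " + declared + " but Solr is " + args[1]);
        } else {
            System.out.println("Configset version " + declared + " is not newer than " + args[1]);
        }
    }
}
```

Run it against the copied .../configsets/nutch/conf/solrconfig.xml with 7.3.1 as the second argument. If that file came from the 8.5.1 _default configset, the sketch should report the mismatch.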


Re: Solr admin error message - where are relevant log files?

2020-06-07 Thread Jim Anderson
I cleared the Firefox cache and restarted and things are working ok now.

Jim



Re: Solr admin error message - where are relevant log files?

2020-06-07 Thread Jim Anderson
@Jan

Thanks for the suggestion. I tried Opera instead of Firefox and it worked.
I will try clearing the cache in Firefox, restarting it, and seeing if it
works there.

Jim



Re: Solr admin error message - where are relevant log files?

2020-06-07 Thread Jim Anderson
An update.

I started over by removing my Solr 7.3.1 installation and untarring again.

Then I went to the Solr root directory and entered:

bin/solr -start

Next, I brought up the solr admin window and it still gives the same error
message and hangs up. As far as I can tell I am running solr straight out
of the box.

Jim



Re: Solr admin error message - where are relevant log files?

2020-06-07 Thread Jim Anderson
>>> Did you install Solr with the installer script

I was not aware that there is an install script. I will look for it, but if
you can point me to it, that will help.

>>> or just
>>> start it up after extracting the archive?

I extracted the files from a tarball and did a bit of setting up. For
example, I created a core and modified my schema.xml file a bit.

>> Does the solr/server/logs
>> directory you mentioned contain files with timestamps that are current?

The log files were current.

>>> If you go to the "Logging" tab when the admin UI shows that error

I cannot go to the "Logging" tab. When the admin UI comes up, it shows the
error message and hangs with the cursor spinning.

Thanks for the input. Again, if you can provide the install script, that
will likely help. I'm going to go back and start with installing Solr again.

Jim



On Sun, Jun 7, 2020 at 1:09 PM Shawn Heisey  wrote:

> On 6/7/2020 10:16 AM, Jim Anderson wrote:
> > The admin page comes up with:
> >
> > SolrCore Initialization Failures
>
> 
>
> > I look in my .../solr/server/logs directory and cannot find any
> > meaningful errors or warnings.
> >
> > Should I be looking elsewhere?
>
> That depends.  Did you install Solr with the installer script, or just
> start it up after extracting the archive?  Does the solr/server/logs
> directory you mentioned contain files with timestamps that are current?
> If not, then the logs are likely going somewhere else.
>
> If you go to the "Logging" tab when the admin UI shows that error, you
> will be able to see any log messages at WARN or higher severity.  Often
> such log entries will need to be expanded by clicking on the little "i"
> icon.  It will close again quickly, so you need to read fast.
>
> Thanks,
> Shawn
>


Solr admin error message - where are relevant log files?

2020-06-07 Thread Jim Anderson
Hi,

I'm a newbie with Solr, and I am going through tutorials and trying to get
Solr working with Nutch.

Today, I started up Solr and then brought up Solr Admin at:

http://localhost:8983/solr/

The admin page comes up with:

SolrCore Initialization Failures

   - *{{core}}:* {{error}}

Please check your logs for more information


I look in my .../solr/server/logs directory and cannot find any meaningful
errors or warnings.

Should I be looking elsewhere?

Jim A.
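
Shawn's reply above suggests checking for messages at WARN or higher severity. If the admin UI itself hangs, that check can also be done directly against the log file. The Java sketch below assumes two things. First, the log is at server/logs/solr.log relative to the Solr install directory; this path is an assumption, and another can be passed as the first argument. Second, each line uses the default layout, in which the level follows the timestamp. The sketch also prints the file's last-modified time, which answers the "are the timestamps current?" question. The class name SolrLogScan is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists WARN/ERROR/FATAL lines from a Solr log file.
final class SolrLogScan {

    // Keeps lines whose level token is WARN or higher. Assumes the default
    // "date time LEVEL (thread) ..." layout, where the level is space-delimited.
    static List<String> warnOrHigher(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (line.contains(" WARN ") || line.contains(" ERROR ") || line.contains(" FATAL ")) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location, relative to the Solr install directory.
        Path log = Paths.get(args.length > 0 ? args[0] : "server/logs/solr.log");
        // A stale timestamp suggests Solr is logging somewhere else.
        System.out.println(log.toAbsolutePath() + " last modified: " + Files.getLastModifiedTime(log));
        for (String line : warnOrHigher(Files.readAllLines(log, StandardCharsets.UTF_8))) {
            System.out.println(line);
        }
    }
}
```

This simple filter does not match the stack-trace lines that follow an ERROR line. The full entry may still need to be read in the file itself.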


Re: SolrClient.query take a 'collection' argument

2020-06-06 Thread Jim Anderson
Erick,

Thanks for the clarification on the JVM heap space. I will invoke java as
you advise.

The program that I am writing is a java example that I took off the
internet. The intent of the example is to read an existing core stored in
solr. I created the core using instructions that I found in a tutorial. I
think the example from the tutorial worked ok, because I can see the core
in solr that was created using nutch. So I think my status is that I have a
good core, and I was trying to read and print out the documents in that
core.

My current plan is to try to find and install Nutch 1.17 and then clear and
reinstall solr 8.5.1 and start over again with a clean slate.

Regards,
Jim


On Sat, Jun 6, 2020 at 10:25 AM Erick Erickson 
wrote:

> I’m not talking about how much memory your machine has;
> the critical bit is how much heap space is allocated to the
> JVM to run your app.
>
> You can increase it by specifying, say, -Xmx2G when you
> invoke Java.
>
> The version difference is suspicious indeed. I’m a little
> confused here. Exactly _what_ program is crashing? An
> independent app you wrote or nutch? If the former, you could
> try compiling your Java app against the Solr jars provided
> with the Solr version that ships with Nutch 1.16 (Solr 7.3.1?).
>
> Best,
> Erick
>

Re: SolrClient.query take a 'collection' argument

2020-06-06 Thread Jim Anderson
Erick,

Thanks for the suggestion. I will keep it in the back of my mind for now.
My PC has 8 GB of memory and has roughly 4 GB in use.

For now, my main focus is the recommended Solr/Nutch combinations.
I'm using Solr 8.5.1 with nutch 1.16. The recommendation is to use nutch
1.17 with Solr 8.5.1, but 1.17 has not been released for download.
Consequently, I used nutch 1.16. I'm not sure that will make a difference,
but I am suspicious.

Jim

On Sat, Jun 6, 2020 at 9:18 AM Erick Erickson 
wrote:

> I’d look for an OutOfMemory problem before going too much farther.
> The simplest way to see if that’s in the right direction would be to
> run your SolrJ program with a massive memory size. Perhaps monitor
> your program with jconsole or similar to see if there are any clues about
> memory usage.
>
> OOMs lead to unpredictable behavior, so it’s at least a possibility that
> this is the root cause. If so, there’s nothing SolrJ can do about it,
> exactly because the state of a program is indeterminate afterwards, even
> if the OOM is caught somewhere. I suppose you could also try to catch that
> exception at the top level of your program.
>
> I’m assuming a stand-alone program here; if you’re running some custom
> code in Solr itself, make sure the oom-killer script is running.
>
> Best,
> Erick
>
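
Erick makes two suggestions in this thread: raise the heap with -Xmx, and catch an OutOfMemoryError at the top level of the program. Both can be combined in a small harness. This is a sketch only. HeapCheck and runQueries() are hypothetical names, and runQueries() stands in for whatever SolrJ calls the program makes. Program state is indeterminate after an OOM, so the handler only reports the error and exits rather than trying to continue.

```java
// Hypothetical harness: reports the JVM's heap ceiling and stops cleanly on OOM.
final class HeapCheck {

    // Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx when set).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Hypothetical stand-in for the SolrJ work that was crashing.
    static void runQueries() {
        // Build the SolrJ client and call client.query(...) here.
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
        try {
            runQueries();
        } catch (OutOfMemoryError oom) {
            // State is indeterminate after an OOM, so report it and exit.
            System.err.println("OutOfMemoryError: " + oom.getMessage());
            System.exit(1);
        }
    }
}
```

Launched with `java -Xmx2G HeapCheck`, it should report a maximum heap close to 2048 MiB; depending on the garbage collector, the JVM may report somewhat less than the -Xmx value. A much lower figure suggests the flag never reached the JVM that runs the program.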


Re: SolrClient.query take a 'collection' argument

2020-06-06 Thread Jim Anderson
Shawn,

Thanks for the explanation. Very good response.

The first paragraph helped clarify what a collection is. I have read quite
a bit about Solr. There is so much to absorb that it is slowly sinking in.
Your 2nd paragraph definitely answered my question, i.e. passing a core
name should be ok when a collection name is specified as a method argument.
This is what I did.

Regarding the 3rd paragraph, it is good to know that SolrJ is fairly robust
and should not be crashing. Nevertheless, that is what is happening. The
call to client.query() is wrapped in a try/catch block. Apparently no
exceptions were detected, or the program crashed before the exception could
be raised.

My next step is to check where I can report this to the Solr folks and see
if they can figure out why it is crashing. BTW, I had not checked my
output file before this morning. The output file indicates that the program
ran to completion, so I am guessing that at least one other thread is being
created and that that thread is crashing.

Regards,
Jim

On Fri, Jun 5, 2020 at 10:52 PM Shawn Heisey  wrote:

> On 6/5/2020 4:24 PM, Jim Anderson wrote:
> > I am running my first SolrJ program and it is crashing when I call the
> > method
> >
> > client.query("coreName",queryParms)
> >
> > The API doc says the string should be a collection. I'm still not sure
> > about the difference between a collection and a core, so what I am doing
> > is likely illegal. Given that I have created a core, can I create a
> > collection from it so that I can truly pass a collection name to the
> > query function?
>
> The concept of a collection comes from SolrCloud.  A collection is made
> up of one or more shards.  A shard is made up of one or more replicas.
> Each replica is a core.  If you're not running SolrCloud, then you do
> not have collections.
>
> Wherever the SolrJ docs say "collection" as a parameter for a request, it
> is likely that you can think "core" instead and have it still be
> correct.  If you're running SolrCloud, you'll want to be very careful to
> know the difference.
>
> It seems very odd for a SolrJ query to cause the program to crash.  It
> would be pretty common for it to throw an exception, but that's not the
> same as a crash, unless exception handling is incorrect or missing.
>
> Thanks,
> Shawn
>
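
Jim's guess that some other thread is crashing can be checked directly. An exception thrown in any thread other than the one running the try/catch never reaches that catch block. Installing a JVM-wide default handler with Thread.setDefaultUncaughtExceptionHandler makes such failures visible. In the sketch below, CrashLogger and its deliberately failing worker thread are hypothetical; they exist only to demonstrate the handler.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper: logs exceptions that escape any thread's run() method.
final class CrashLogger {

    // Last uncaught throwable seen by the handler, kept for inspection.
    static final AtomicReference<Throwable> LAST = new AtomicReference<>();

    // Installs a JVM-wide default handler for otherwise-unhandled exceptions.
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            LAST.set(error);
            System.err.println("Uncaught in thread '" + thread.getName() + "': " + error);
            error.printStackTrace();
        });
    }

    // Starts a worker that throws, waits for it to die, and returns what the handler saw.
    static Throwable demo() {
        install();
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("worker failed");
        }, "worker-1");
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return LAST.get();
    }

    public static void main(String[] args) {
        System.out.println("Handler recorded: " + demo());
    }
}
```

In a real program, CrashLogger.install() would be called at the very start of main, before any SolrJ client or worker thread is created.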


SolrClient.query take a 'collection' argument

2020-06-05 Thread Jim Anderson
I am running my first SolrJ program and it is crashing when I call the
method

client.query("coreName",queryParms)

The API doc says the string should be a collection. I'm still not sure
about the difference between a collection and a core, so what I am doing is
likely illegal. Given that I have created a core, can I create a collection
from it so that I can truly pass a collection name to the query function?

Jim A.
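
Shawn's reply above makes the point that on standalone (non-SolrCloud) Solr the "collection" argument can simply be the core name. The example below illustrates this. As far as we understand SolrJ 7.x/8.x, an HttpSolrClient built with new HttpSolrClient.Builder("http://localhost:8983/solr").build() sends client.query("coreName", params) to that core's /select handler. This JDK-only sketch just computes that endpoint. The class name CoreQueryUrl is hypothetical, and the host and port are the Solr defaults.

```java
import java.net.URI;

// Hypothetical helper: shows which /select endpoint a core or collection name maps to.
final class CoreQueryUrl {

    // Appends the core (standalone) or collection (SolrCloud) name and the /select path.
    static URI selectEndpoint(String solrBaseUrl, String coreOrCollection) {
        String base = solrBaseUrl.endsWith("/") ? solrBaseUrl : solrBaseUrl + "/";
        return URI.create(base + coreOrCollection + "/select");
    }

    public static void main(String[] args) {
        // Prints http://localhost:8983/solr/coreName/select
        System.out.println(selectEndpoint("http://localhost:8983/solr", "coreName"));
    }
}
```

Equivalently, a client built with the core URL itself (http://localhost:8983/solr/coreName) can call client.query(params) without a collection argument.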


Re: Building a web based search engine

2020-06-02 Thread Jim Anderson


Re: Building a web based search engine

2020-06-02 Thread Jim Anderson
Hi Markus,

Thanks for your response. I appreciate you giving me the bullet list of
things to do. I can take that list and work from it and hopefully make
progress, but I don't think it will get me where I want to be - just a bit
closer.

You say, "We have been building precisely that for over ten years now". Is
it in a document? I would like to read it.

Some basic things I would like to know that should be documented:

1) Using Nutch as the crawler, how do I run a Nutch thread that crawls my
named URLs?
2) I will use nutch to visit websites and create documents in solr. How do
I verify that documents have been created in Solr via nutch?
3) Solr will store and index the documents. How do I verify the index?
4) I assume I can run a Tomcat server on my host and then provide a
localhost URI to my web browser. Tomcat will then forward the URI to my
application. My application will take a query and, using a Java API, it will
pass the query to Solr. I would like to see an example of a Java program
passing a query to Solr.
5) Solr will take the query, parse it and then locate appropriate documents
using the index. Is there a log in Solr showing what queries have been
parsed?
6) Solr will pass back the list of documents it has located. I have not
really looked at this issue yet, but it would be nice to have an example of
this.

Jim



On Tue, Jun 2, 2020 at 12:12 PM Markus Jelsma 
wrote:

> Hello,
>
> We have been building precisely that for over ten years now. The '10,000
> foot level overview' is basically:
>
> * forget about Lucene for now, Solr uses it under the hood;
> * get Solr, and start it with the schema.xml file that comes with Nutch;
> * get Nutch, give it a set of domains or hosts to crawl and some URLs to
> start the crawl with and point the indexer towards the previously
> configured Solr;
> * put a proxy in front of Solr (we use Nginx), or skip this step if it is
> just an internal demo (do not expose Solr to the outside world);
> * make some basic JS tool that handles input and search result responses.
>
> This was our first web search engine prototype and it was set up in a few
> days. The chapter "How To Build A Web Based Search Engine With Solr, Lucene
> and Nutch" just means: set up Solr, and point Nutch towards it, and tell it
> to start crawling and indexing.
>
> Then there comes an endless list of things to improve: autocomplete,
> spell checking, query and click log handling and analysis, proper text
> extraction, etc.
>
> Regards,
> Markus
>
> -Original message-
> > From:Jim Anderson 
> > Sent: Tuesday 2nd June 2020 16:36
> > To: solr-user@lucene.apache.org
> > Subject: Building a web based search engine
>
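
For questions 2, 4 and 6 above, a minimal starting point is a Java program that sends a query to Solr over HTTP and reads the result. The sketch below uses only the JDK rather than SolrJ. It assumes a core named nutch at the default http://localhost:8983/solr address and asks for q=*:* with rows=0, so the response carries only the total hit count (numFound). A non-zero count is a quick sign that Nutch has indexed documents. The class name CountIndexedDocs is hypothetical, and extracting numFound with a regular expression is a shortcut used instead of a real JSON parser.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts the documents in a core via the /select handler.
final class CountIndexedDocs {

    private static final Pattern NUM_FOUND = Pattern.compile("\"numFound\"\\s*:\\s*(\\d+)");

    // Builds a /select URL that matches every document but returns none (rows=0).
    static String countUrl(String coreUrl) {
        try {
            return coreUrl + "/select?q=" + URLEncoder.encode("*:*", "UTF-8") + "&rows=0&wt=json";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Extracts numFound from a JSON response body; returns -1 if it is absent.
    static long parseNumFound(String json) {
        Matcher m = NUM_FOUND.matcher(json);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    // Performs a GET request and returns the response body decoded as UTF-8.
    static String httpGet(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String coreUrl = args.length > 0 ? args[0] : "http://localhost:8983/solr/nutch";
        String body = httpGet(countUrl(coreUrl));
        System.out.println("Documents in index: " + parseNumFound(body));
    }
}
```

With SolrJ, the equivalent count is QueryResponse.getResults().getNumFound(). Setting rows to a positive value instead of 0 returns the documents themselves, which covers question 6.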


Building a web based search engine

2020-06-02 Thread Jim Anderson
Hi,

I have been looking at Solr, Lucene and Nutch websites and tutorials for
over a week now, experimenting and learning, but also frustrated by the
fact that I am totally missing the 'how to' do what I want. I see a lot of
examples of how to use each of the tools, but not how to put them all
together. I think an 'overview' at the 10,000 foot level is needed. Maybe
one is available and I have not yet found it. If someone can point me to
one, please do.

If I am correct that an overview on "How To Build A Web Based Search Engine
With Solr, Lucene and Nutch" is not available, then I will be willing to
write an overview and make it available to the Solr community. I will need
input, explanation and review from others.

My 2 goals are:

1) Build a demo web-based search engine [Note: I have a very specific
business need to be able to demonstrate a web application on top of a search
engine. This demo is intended to show a 'proof of concept' of the web
application to a small audience.]

2) Document the process of building the demo and customizing it using the
Java API so that others can more easily build their own web-based search
engine.

Jim Anderson