Re: Why use a different analyzer for "index" and "query"?

2020-09-10 Thread Stavros Macrakis
I gave an example of why you might want to analyze the corpus differently
from the query just yesterday -- see
https://lucene.472066.n3.nabble.com/Lowercase-ing-everything-but-acronyms-td4462899.html

  -s

On Thu, Sep 10, 2020 at 11:19 AM Steven White  wrote:

> Hi everyone,
>
> In Solr's schema, I have come across field types that use a different logic
> for "index" than for "query".  To be clear, I'm talking about this block:
>
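> <fieldType … positionIncrementGap="100">
>   <analyzer type="index">
>     …
>   </analyzer>
>   <analyzer type="query">
>     …
>   </analyzer>
> </fieldType>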
>
> Why would one want to not use the same logic for both and simply use:
>
> <fieldType … positionIncrementGap="100">
>   <analyzer>
>     …
>   </analyzer>
> </fieldType>
>
> What are real-world use cases for using a different analyzer for index and
> query?
>
> Thanks,
>
> Steve
>


Re: Lowercase-ing everything but acronyms

2020-09-09 Thread Stavros Macrakis
I can't help you on the implementation issues, but...

You may want to do something a little different than keep all-uppercase
tokens in upper case. You may want simply to special-case all-uppercase
stopwords, so that they are not ignored. The poster boy for that is IT,
which, in my last search application, was *extremely common* and important.
On the corpus side, [it] and [IT] are very distinct. But on the query side,
most users will write [it], so it's fine to have it in the index as [it]
and not [IT]. Similarly for ON (Ontario) and ME (Maine). A nasty one is OR:
if you are using all-uppercase OR for the Boolean operator, how do users
enter OR meaning Operations Research? We know that not many users will
write ["OR"]. So you may simply want to allow lowercase [or] in the query
to match uppercase [OR] in the corpus, and reserve uppercase OR for the
Boolean operator.  Other cases are much rarer (Dijkstra's THE operating
system is of historical interest only...). For non-stopwords, there doesn't
seem to be much of a problem.
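
To make the asymmetry concrete, here is a minimal Python sketch of the idea, not Solr configuration. The stopword list, the punctuation handling, and the function names analyze_index and analyze_query are all assumptions made up for illustration. The index side drops ordinary stopwords but keeps all-uppercase ones (lowercased); the query side only lowercases, and treats uppercase OR as the Boolean operator.

```python
# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"a", "and", "in", "is", "it", "me", "on", "or", "the"}
PUNCTUATION = '.,;:!?()[]"'


def analyze_index(text):
    """Index side: drop ordinary stopwords, keep all-uppercase ones (lowercased)."""
    tokens = []
    for raw in text.split():
        word = raw.strip(PUNCTUATION)
        if not word:
            continue
        lower = word.lower()
        is_acronym = len(word) >= 2 and word.isupper()
        if lower in STOPWORDS and not is_acronym:
            continue  # ordinary stopword such as "it" or "or": not indexed
        tokens.append(lower)
    return tokens


def analyze_query(text):
    """Query side: lowercase only; uppercase OR is reserved as the operator."""
    terms = []
    for raw in text.split():
        word = raw.strip(PUNCTUATION)
        if not word or word == "OR":
            continue  # uppercase OR is an operator here, not a search term
        terms.append(word.lower())
    return terms


if __name__ == "__main__":
    print(analyze_index("She works in IT and it is fun"))
    print(analyze_query("it jobs"))
```

With this split, a corpus token IT is indexed as [it] while an ordinary "it" is not indexed at all, so a user typing lowercase [it] or [or] still matches the acronym.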

  -s

On Wed, Sep 9, 2020 at 2:59 PM Dunham-Wilkie, Mike CITZ:EX <
mike.dunham-wil...@gov.bc.ca> wrote:

> Hi SOLR list,
>
> I'm currently using the White Space tokenizer and the Lower Case filter
> with SOLR 7.3.  I'd like to modify the logic to keep any tokens that are
> entirely upper case as upper case, and just apply the Lower Case filter (or
> something equivalent) to the remaining tokens.  Is there a way to do this
> using tokenizers and filters?
>
> Thanks
> Mike
>
>
> Mike Dunham-Wilkie | Senior Spatial Data Administration Analyst | PHONE...
> 778-676-1791
> Data Systems & Services - Digital Platforms and Data Division - Ministry
> of Citizens' Services
>
> For faster response and/or future inquiries, the following email addresses
> are monitored continuously:
> BC Geographic Warehouse (BCGW) and Replication/ETL | DataBC Data
> Architecture Services (databc...@gov.bc.ca)
> BC Data Catalogue (BCDC) and Open Data | DataBC Catalogue Services (
> data...@gov.bc.ca)
>
>


Search for term except within phrase

2020-07-06 Thread Stavros Macrakis
(Sorry for sending this with the wrong subject earlier.)

How can I search for a term except when it's part of certain phrases?

For example, I might want to find documents mentioning "pepper" where it is
not part of the phrases "chili pepper", "hot pepper", or "pepper sauce".

It does not work to search for [pepper NOT ("chili pepper" OR "hot pepper"
OR "pepper sauce")] because that excludes all documents which mention
"chili pepper" even if they also mention "black pepper" or the unmodified
word "pepper". Maybe some way using synonyms?
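
To illustrate the difference between the two behaviours, here is a small pure-Python sketch. It is not Solr query syntax, and the function names and the simple tokenizer are assumptions invented for the example. document_level_not mimics the NOT query above; position_level_not is the behaviour being asked for, where only the occurrences of the term that fall inside an excluded phrase are ignored.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_starts(tokens, phrase):
    """Return the start positions at which `phrase` occurs in `tokens`."""
    words = tokenize(phrase)
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if n and tokens[i:i + n] == words]


def document_level_not(text, term, phrases):
    """Mimics [term NOT (phrases)]: any excluded phrase rejects the whole document."""
    tokens = tokenize(text)
    return term in tokens and not any(phrase_starts(tokens, p) for p in phrases)


def position_level_not(text, term, phrases):
    """Match if the term occurs at least once outside every excluded phrase."""
    tokens = tokenize(text)
    covered = set()
    for phrase in phrases:
        n = len(tokenize(phrase))
        for start in phrase_starts(tokens, phrase):
            covered.update(range(start, start + n))
    return any(tok == term and i not in covered for i, tok in enumerate(tokens))


if __name__ == "__main__":
    excluded = ["chili pepper", "hot pepper", "pepper sauce"]
    doc = "Add chili pepper, then black pepper."
    print(document_level_not(doc, "pepper", excluded))
    print(position_level_not(doc, "pepper", excluded))
```

The document mentioning both "chili pepper" and "black pepper" is rejected by the first function and accepted by the second.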

Thanks!

 -s


Re: Tokenizing managed synonyms

2020-07-06 Thread Stavros Macrakis
How can I search for a term *except* when it's part of certain phrases?

For example, I might want to find documents mentioning "pepper" where it is
not part of the phrases "chili pepper", "hot pepper", or "pepper sauce".

It does not work to search for [pepper NOT ("chili pepper" OR "hot pepper"
OR "pepper sauce")] because that excludes all documents which mention
"chili pepper" even if they *also* mention "black pepper" or the unmodified
word "pepper". Maybe some way using synonyms?

Thanks!

 -s

On Mon, Jul 6, 2020 at 6:43 PM Thomas Corthals 
wrote:

> Hi,
>
> Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
> Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
> some fields.
>
> Best,
>
> Thomas
>


Re: Trouble starting Solr on Windows/Ubuntu

2020-05-23 Thread Stavros Macrakis
Jan,

Thanks for your suggestion!

Unfortunately, that doesn't fix the problem in Ubuntu under Windows 10.

Fortunately, I figured out how to start Solr on Windows 10 itself. It turns
out that solr.cmd depends on the Windows functions 'find' and 'timeout',
which were being shadowed by the Cygwin (GNU) utilities of the same names.
It also turned out that I had an old 32-bit JRE in my Windows config. I
deleted that and updated the 64-bit JRE.
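
For anyone hitting the same thing, a rough way to check for this kind of shadowing is to see which 'find' and 'timeout' come first on PATH. The Python sketch below is only an illustration: the function name is_shadowed and the assumption that the Windows versions live directly under %SystemRoot%\System32 are mine, not something taken from solr.cmd.

```python
import os
import shutil
from pathlib import PureWindowsPath


def is_shadowed(resolved, system_root=r"C:\Windows"):
    """Return True if `resolved` is missing or is not directly in <system_root>\\System32."""
    if resolved is None:
        return True  # the command was not found on PATH at all
    expected_dir = PureWindowsPath(system_root) / "System32"
    # PureWindowsPath comparisons are case-insensitive, like Windows paths.
    return PureWindowsPath(resolved).parent != expected_dir


if __name__ == "__main__":
    root = os.environ.get("SystemRoot", r"C:\Windows")
    for name in ("find", "timeout"):
        path = shutil.which(name)  # first match on PATH, or None
        status = "SHADOWED or missing" if is_shadowed(path, root) else "ok"
        print(f"{name}: {path} ({status})")
```

If either command resolves to a Cygwin directory, moving that directory later in PATH (or removing it for the shell that runs solr.cmd) should let the Windows versions be found first.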

Thanks again,

 -s

On Sat, May 23, 2020 at 5:27 AM Jan Høydahl  wrote:

> You have a core dump, which tells us that the Java process crashed badly.
> Could it be a permission issue between your Windows file system and the WSL
> file system? Try running chmod -R 777 solr-8.5.1 and then try again.
>
> Jan Høydahl
>
> > On 22 May 2020 at 23:32, Stavros Macrakis wrote:
> >
> > I'm trying to follow the Solr Tutorial (
> >
> https://lucene.apache.org/solr/guide/8_5/solr-tutorial.html#solr-tutorial
> ).
> >
> > Yesterday, "bin/solr start" worked fine -- I could see the status page on
> > http://localhost:8983 . I even created a test config server/solr/test1
> > through the Web interface.
> >
> > Today, I'm getting an error message when I try to start Solr. This is
> from
> > an Ubuntu top-level shell (I previously tried a shell buffer within Emacs
> > under Ubuntu, which failed). I've rebooted Windows, and it still fails.
> See
> > transcript and version info below.
> >
> > What am I doing wrong? -- and is solr-user the right place to ask newbie
> > questions like this?
> > (None of the env variables mentioned in the error message are defined.)
> >
> > transcript
> > xxx:/mnt/c/solr-8.5.1$ bin/solr status -help
> >
> > No Solr nodes are running.
> >
> > xxx:/mnt/c/solr-8.5.1$ bin/solr start
> > Waiting up to 180 seconds to see Solr running on port 8983 [|]  bin/solr:
> > line 669:  8456 Aborted (core dumped) nohup "$JAVA"
> > "${SOLR_START_OPTS[@]}" $SOLR_ADDL_ARGS -Dsolr.log.muteconsole
> > "-XX:OnOutOfMemoryError=$SOLR_TIP/bin/oom_solr.sh $SOLR_PORT
> > $SOLR_LOGS_DIR" -jar start.jar "${SOLR_JETTY_CONFIG[@]}"
> > $SOLR_JETTY_ADDL_CONFIG > "$SOLR_LOGS_DIR/solr-$SOLR_PORT-console.log"
> 2>&1
> > [/]  Still not seeing Solr listening on 8983 after 180 seconds!
> > tail: cannot open '/mnt/c/solr-8.5.1/server/logs/solr.log' for reading:
> No
> > such file or directory
> >
> > xxx:/mnt/c/solr-8.5.1$ echo foo > /mnt/c/solr-8.5.1/server/logs/solr.log
> > xxx:/mnt/c/solr-8.5.1$ cat /mnt/c/solr-8.5.1/server/logs/solr.log
> > foo   <<< log file is writeable
> >
> > versions 
> >
> > xxx:/mnt/c/solr-8.5.1$ uname -a
> > Linux DESKTOP-M6LDB7Q 4.4.0-18362-Microsoft #836-Microsoft Mon May 05
> > 16:04:00 PST 2020 x86_64 x86_64 x86_64 GNU/Linux
> > xxx:/mnt/c/solr-8.5.1$ which java
> > /usr/bin/java
> > xxx:/mnt/c/solr-8.5.1$ java -version
> > openjdk version "11.0.7" 2020-04-14
> > OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-2ubuntu218.04)
> > OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-2ubuntu218.04,
> mixed
> > mode, sharing)
> > xxx:/mnt/c/solr-8.5.1$ bin/solr -version
> > 8.5.1
>


Trouble starting Solr on Windows/Ubuntu

2020-05-22 Thread Stavros Macrakis
I'm trying to follow the Solr Tutorial (
https://lucene.apache.org/solr/guide/8_5/solr-tutorial.html#solr-tutorial).

Yesterday, "bin/solr start" worked fine -- I could see the status page on
http://localhost:8983 . I even created a test config server/solr/test1
through the Web interface.

Today, I'm getting an error message when I try to start Solr. This is from
an Ubuntu top-level shell (I previously tried a shell buffer within Emacs
under Ubuntu, which failed). I've rebooted Windows, and it still fails. See
transcript and version info below.

What am I doing wrong? -- and is solr-user the right place to ask newbie
questions like this?
(None of the env variables mentioned in the error message are defined.)

transcript
xxx:/mnt/c/solr-8.5.1$ bin/solr status -help

No Solr nodes are running.

xxx:/mnt/c/solr-8.5.1$ bin/solr start
Waiting up to 180 seconds to see Solr running on port 8983 [|]  bin/solr:
line 669:  8456 Aborted (core dumped) nohup "$JAVA"
"${SOLR_START_OPTS[@]}" $SOLR_ADDL_ARGS -Dsolr.log.muteconsole
"-XX:OnOutOfMemoryError=$SOLR_TIP/bin/oom_solr.sh $SOLR_PORT
$SOLR_LOGS_DIR" -jar start.jar "${SOLR_JETTY_CONFIG[@]}"
$SOLR_JETTY_ADDL_CONFIG > "$SOLR_LOGS_DIR/solr-$SOLR_PORT-console.log" 2>&1
 [/]  Still not seeing Solr listening on 8983 after 180 seconds!
tail: cannot open '/mnt/c/solr-8.5.1/server/logs/solr.log' for reading: No
such file or directory

xxx:/mnt/c/solr-8.5.1$ echo foo > /mnt/c/solr-8.5.1/server/logs/solr.log
xxx:/mnt/c/solr-8.5.1$ cat /mnt/c/solr-8.5.1/server/logs/solr.log
foo   <<< log file is writeable

versions 

xxx:/mnt/c/solr-8.5.1$ uname -a
Linux DESKTOP-M6LDB7Q 4.4.0-18362-Microsoft #836-Microsoft Mon May 05
16:04:00 PST 2020 x86_64 x86_64 x86_64 GNU/Linux
xxx:/mnt/c/solr-8.5.1$ which java
/usr/bin/java
xxx:/mnt/c/solr-8.5.1$ java -version
openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-2ubuntu218.04)
OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-2ubuntu218.04, mixed
mode, sharing)
xxx:/mnt/c/solr-8.5.1$ bin/solr -version
8.5.1