Re: Investigating Seeming Deadlock

2021-03-05 Thread Mike Drob
Were you having any OOM errors beforehand? If so, that could have caused
some GC of objects that other threads still expect to be reachable, leading
to these null monitors.
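If it wedges again, a quick way to gather more evidence, assuming you can
run jcmd against the Solr process (replace <PID> with its pid), is a full
dump plus the JVM's own deadlock detector; the JVM appends a
"Found N deadlock(s)" section to the dump automatically when it finds one:

    jcmd <PID> Thread.print -l > threads.txt
    grep -i -A 20 "deadlock" threads.txt
    # heap/GC summary, to correlate with the OOM theory
    jcmd <PID> GC.heap_info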

On Fri, Mar 5, 2021 at 12:55 PM Stephen Lewis Bianamara <
stephen.bianam...@gmail.com> wrote:

> Hi SOLR Community,
>
> I'm investigating a node on solr 8.3.1 running in cloud mode which appears
> to have deadlocked, and I'm trying to figure out if this is a known issue
> or not, and looking for some guidance in understanding both (a) whether
> this is a resolved issue in future releases or needs a bug, and (b) how to
> lower the risk of recurrence until it is fixed.
>
> Here is what I've observed:
>
>- strace shows the main process waiting. A spot check on child processes
>shows the same, though I did not deep dive all of the threads yet (there
>are over 100).
>- the server was not busy or doing anything, except for the JVM sitting at
>constant memory usage. No resource (memory, swap, CPU, etc.) was limited
>or showing active usage.
>- jcmd Thread.print shows some interesting info which suggests a
>deadlock or another type of locking issue
>   - For example, I found this log entry, which suggests something unusual
>   because it looks like it's trying to lock a null object
>  - "Finalizer" #3 daemon prio=8 os_prio=0 cpu=11.11ms
>  elapsed=11.11s tid=0x0100 nid=0x in
> Object.wait()
>   [0x1000]
> java.lang.Thread.State: WAITING (on object monitor)
>  at java.lang.Object.wait(java.base@11.0.7/Native Method)
>  - waiting on 
>  at java.lang.ref.ReferenceQueue.remove(java.base@11.0.7
>  /ReferenceQueue.java:155)
>  - waiting to re-lock in wait() <0x00020020> (a
>  java.lang.ref.ReferenceQueue$Lock)
>  at java.lang.ref.ReferenceQueue.remove(java.base@11.0.7
>  /ReferenceQueue.java:176)
>  at
>  java.lang.ref.Finalizer$FinalizerThread.run(java.base@11.0.7
>  /Finalizer.java:170)
>  - I also see a lot of this. Some addresses occur multiple times,
>   but one in particular occurs 31 times. Maybe related?
>  - "h2sc-1-thread-11" #110 prio=5 os_prio=0 cpu=54.29ms
>  elapsed=11.11s tid=0x10010100 nid=0x waiting
> on condition
>   [0x10011000]
> java.lang.Thread.State: WAITING (parking)
>  at jdk.internal.misc.Unsafe.park(java.base@11.0.7/Native
>  Method)
>  - parking to wait for  <0x00030033>
>
> Can anyone help answer whether this is known or what I could look at next?
>
> Thanks!
> Stephen
>


Re: Partial update bug on solr 8.8.0

2021-03-02 Thread Mike Drob
This looks like a bug that is already fixed but not yet released in 8.9

https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-13034
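For anyone trying to reproduce this, the report concerns atomic (partial)
updates. A minimal sketch of such a request against a local Solr; the
collection name, document id, and field here are hypothetical:

    curl -X POST -H 'Content-Type: application/json' \
      'http://localhost:8983/solr/mycollection/update?commit=true' \
      -d '[{"id":"doc1","price_i":{"set":42}}]'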

On Tue, Mar 2, 2021 at 6:27 AM Mohsen Saboorian  wrote:

> Any idea about this post?
> https://stackoverflow.com/q/66335803/141438
>
> Regards.
>


Re: Asymmetric Key Size not sufficient

2021-02-14 Thread Mike Drob
Future vulnerability reports should be sent to secur...@apache.org so that
they can be resolved privately.

Thank you

On Fri, Feb 12, 2021 at 10:17 AM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> Recent versions of Solr use 2048.
>
> https://github.com/apache/lucene-solr/blob/branch_8_6/solr/core/src/java/org/apache/solr/util/CryptoKeys.java#L332
>
> Thanks for your report.
>
> On Fri, Feb 12, 2021 at 3:44 PM Mahir Kabir  wrote:
>
> > Hello,
> >
> > I am a Ph.D. student at Virginia Tech, USA. While working on a
> > security-related project, we came across the following vulnerability in
> > the source code:
> >
> > In file
> >
> >
> https://github.com/apache/lucene-solr/blob/branch_6_6/solr/core/src/java/org/apache/solr/util/CryptoKeys.java
> > <
> >
> https://github.com/apache/ranger/blob/71e1dd40366c8eb8e9c498b0b5158d85d603af02/kms/src/main/java/org/apache/hadoop/crypto/key/RangerKeyStore.java
> > >
> > (at line 300), the key size was set to 1024.
> >
> > *Security Impact*:
> >
> > A key size < 2048 for the RSA algorithm makes the system vulnerable to
> > brute-force attack.
> >
> > *Useful resource*:
> > https://rules.sonarsource.com/java/type/Vulnerability/RSPEC-4426
> >
> > *Solution we suggest*:
> >
> > For RSA algorithm, the key size should be >= 2048
> >
> > *Please share with us your opinions/comments if there are any*:
> >
> > Is the bug report helpful?
> >
> > Please let us know what you think about the issue. Any feedback will be
> > appreciated.
> >
> > Thank you,
> > Md Mahir Asef Kabir
> > Ph.D. Student
> > Department of CS
> > Virginia Tech
> >
>
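For illustration, the suggested fix amounts to generating the key pair at
2048 bits; a minimal sketch with the JDK's keytool (alias, file name, and
password here are placeholders; keytool will prompt for certificate
details):

    keytool -genkeypair -alias solr-key -keyalg RSA -keysize 2048 \
      -keystore keystore.p12 -storetype PKCS12 -storepass changeit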


Re: Ghost Documents or Shards out of Sync

2021-02-01 Thread Mike Drob
To expand on what Jason suggested, if the issue is the non-deterministic
ordering due to staggered commits per replica, you may have more
consistency with TLOG replicas rather than the NRT replicas. In this case,
the underlying segment files should be identical and lead to more
predictable results.
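If you want to try that, TLOG replicas can be requested at collection
creation time via the Collections API; a sketch with placeholder
collection and configset names:

    curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=2&tlogReplicas=2&collection.configName=myconf'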

On Mon, Feb 1, 2021 at 2:50 PM Jason Gerlowski 
wrote:

> Hi Ronen,
>
> The first thing I'd figure out in your situation is whether the
> results are actually different each time, or whether the ordering is
> what differs (which might push a particular result off the page you're
> looking at, giving the appearance that it didn't match).
>
> In the case of the former, this can happen briefly if queries come in
> when some but not all replicas have seen a commit.  But usually this
> is a transient concern - either waiting for the next autocommit or
> triggering an explicit commit resolves the discrepancy in this case.
> Since you only see identical results after a restart, this _doesn't_
> sound like what you're seeing.
>
> In the case of the latter (same results, differently ordered) this is
> expected sometimes.  Solr sorts on relevance by default with the
> internal Lucene document ID being a tiebreaker.  Both the relevance
> statistics and Lucene's document IDs can differ across SolrCloud
> replicas (due to non-deterministic conditions such as the segment
> merging and deleted-doc removal that Lucene does under the hood), and
> this can produce differently-ordered result sets for users that issue
> the same query repeatedly.
>
> Good luck narrowing things down!
>
> Jason
>
> On Mon, Jan 25, 2021 at 3:32 AM Ronen Nussbaum  wrote:
> >
> > Hi All,
> >
> > I'm using Solr Cloud (version 8.3.0) with shards and replicas
> (replication
> > factor of 2).
> > Recently, I've encountered several times that running the same query
> > repeatedly yields different results. Restarting the nodes fixes the
> problem
> > (until next time).
> > I assume that some shards are not synchronized and I have several
> questions:
> > 1. What can cause this - many atomic updates? issues with commits?
> > 2. Can I trigger the "fixing" mechanism that Solr runs at restart by an
> API
> > call or some other method?
> >
> > Thanks in advance,
> > Ronen.
>


Re: Apache Solr Reference Guide isn't accessible

2021-02-01 Thread Mike Drob
Hi Dorion,

We are currently working with our infra team to get these restored. In the
meantime, the 8.4 guide is still available at
https://lucene.apache.org/solr/guide/8_4/ and are hopeful that the 8.8
guide will be back up soon. Thank you for your patience.

Mike

On Mon, Feb 1, 2021 at 1:58 PM Dorion Caroline 
wrote:

> Hi,
>
> I haven't been able to access the Apache Solr Reference Guide for a few days.
> Example:
> URL
>
>   *   https://lucene.apache.org/solr/guide/8_8/
>   *   https://lucene.apache.org/solr/guide/8_7/
> Result:
> Not Found
> The requested URL was not found on this server.
>
> Do you know what's going on?
>
> Thanks
> Caroline Dorion
>


Re: Solr 8.7.0 memory leak?

2021-01-27 Thread Mike Drob
Are you running these in docker containers?

Also, I’m assuming this is a typo but just in case the setting is Xmx :)

Can you share the OOM stack trace? It’s not always running out of memory,
sometimes Java throws OOM for file handles or threads.

Mike
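To tell an OS oom-killer kill apart from a Java OutOfMemoryError (see
Shawn's note below), a couple of quick checks; the log paths here are the
defaults for the Linux service installer and may differ on your install:

    # did the Linux oom-killer act?
    dmesg -T | grep -i -E 'killed process|out of memory'
    # did the JVM throw an OutOfMemoryError, or did Solr's oom script fire?
    grep -i 'OutOfMemoryError' /var/solr/logs/solr.log*
    ls /var/solr/logs/ | grep -i oom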

On Wed, Jan 27, 2021 at 10:00 PM Luke  wrote:

> Shawn,
>
> It's killed by an OOME exception. The problem is that I just created empty
> collections and the Solr JVM keeps growing and never goes down. There is no
> data at all. At the beginning I set Xxm=6G, then 10G, now 15G; Solr 8.7
> always uses all of it and will be killed by oom.sh once JVM usage
> reaches 100%.
>
> I have another Solr 8.6.2 cloud (3 nodes) in a separate environment, which
> has over 100 collections; with Xxm = 6G, the JVM is always at 4-5G.
>
>
>
> On Thu, Jan 28, 2021 at 2:56 AM Shawn Heisey  wrote:
>
> > On 1/27/2021 5:08 PM, Luke Oak wrote:
> > > I just created a few collections and no data, memory keeps growing but
> > never go down, until I got OOM and solr is killed
> > >
> > > Any reason?
> >
> > Was Solr killed by the operating system's oom killer or did the death
> > start with a Java OutOfMemoryError exception?
> >
> > If it was the OS, then the entire system doesn't have enough memory for
> > the demands that are made on it.  The problem might be Solr, or it might
> > be something else.  You will need to either reduce the amount of memory
> > used or increase the memory in the system.
> >
> > If it was a Java OOME exception that led to Solr being killed, then some
> > resource (could be heap memory, but isn't always) will be too small and
> > will need to be increased.  To figure out what resource, you need to see
> > the exception text.  Such exceptions are not always recorded -- it may
> > occur in a section of code that has no logging.
> >
> > Thanks,
> > Shawn
> >
>


Re: NullPointerException in Graph Traversal nodes streaming expression

2021-01-21 Thread Mike Drob
Can you provide a sample expression that would be able to reproduce this?
Are you able to try a newer version by chance - I know we've fixed a few
NPEs recently, maybe https://issues.apache.org/jira/browse/SOLR-14700
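For reference, the general shape of a two-level nodes() expression, modeled
on the ref guide's graph traversal examples (collection, fields, and
starting value are placeholders):

    curl --data-urlencode 'expr=nodes(emails,
          nodes(emails, walk="john@example.com->from", gather="to"),
          walk="node->from", gather="to")' \
      'http://localhost:8983/solr/emails/stream'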

On Thu, Jan 21, 2021 at 4:13 PM ufuk yılmaz 
wrote:

> Solr version 8.4. I’m getting an unexplained NullPointerException when
> executing a simple 2-level nodes stream. Do you have any idea what may
> cause this?
>
> I tried setting /stream?partialResults=true and
> shards.tolerant=true in nodes expressions, with no luck. I also tried
> reading source of GatherNodesStream in branch 8_4, but couldn’t understand
> it. Here is a beautiful stack trace:
>
> solr| 2021-01-21 22:00:12.726 ERROR (qtp832292933-25149)
> [c:WorkerCollection s:shard1 r:core_node10
> x:WorkerCollection_shard1_replica_n9] o.a.s.c.s.i.s.ExceptionStream
> java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> java.lang.RuntimeException: java.lang.NullPointerException
> solr|   at
> org.apache.solr.client.solrj.io.graph.GatherNodesStream.read(GatherNodesStream.java:607)
> solr|   at
> org.apache.solr.client.solrj.io.stream.ExceptionStream.read(ExceptionStream.java:71)
> solr|   at
> org.apache.solr.handler.StreamHandler$TimerStream.read(StreamHandler.java:454)
> solr|   at
> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:84)
> solr|   at
> org.apache.solr.common.util.JsonTextWriter.writeIterator(JsonTextWriter.java:141)
> solr|   at
> org.apache.solr.common.util.TextWriter.writeVal(TextWriter.java:67)
> solr|   at
> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:152)
> solr|   at
> org.apache.solr.common.util.JsonTextWriter$2.put(JsonTextWriter.java:176)
> solr|   at
> org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:81)
> solr|   at
> org.apache.solr.common.util.JsonTextWriter.writeMap(JsonTextWriter.java:164)
> solr|   at
> org.apache.solr.common.util.TextWriter.writeVal(TextWriter.java:69)
> solr|   at
> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:152)
> solr|   at
> org.apache.solr.common.util.JsonTextWriter.writeNamedListAsMapWithDups(JsonTextWriter.java:386)
> solr|   at
> org.apache.solr.common.util.JsonTextWriter.writeNamedList(JsonTextWriter.java:292)
> solr|   at
> org.apache.solr.response.JSONWriter.writeResponse(JSONWriter.java:73)
> solr|   at
> org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:66)
> solr|   at
> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
> solr|   at
> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:892)
> solr|   at
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:594)
> solr|   at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
> solr|   at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
> solr|   at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
> solr|   at
> org.eclipse.jetty.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:311)
> solr|   at
> org.eclipse.jetty.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:265)
> solr|   at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> solr|   at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> solr|   at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> solr|   at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> solr|   at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> solr|   at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> solr|   at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
> solr|   at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> solr|   at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
> solr|   at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> solr|   at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
> solr|   at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
> solr|   at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
> solr|   at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
> solr|   at
> 

Re: Cursor Performance Issue

2021-01-13 Thread Mike Drob
You should be using docvalues on your id, but note that switching this
would require a reindex.
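A sketch of both pieces, with placeholder names. The schema change
(docValues on the uniqueKey, followed by a full reindex):

    <field name="id" type="string" indexed="true" stored="true" docValues="true"/>

and cursor paging, where the sort includes the uniqueKey as a tiebreaker
and each response's nextCursorMark is passed back as cursorMark on the
next request:

    curl 'http://localhost:8983/solr/mycoll/select?q=*:*&rows=1000&sort=score+desc,id+asc&cursorMark=*'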

On Wed, Jan 13, 2021 at 6:04 AM Ajay Sharma 
wrote:

> Hi All,
>
> I have used cursors to search and export documents in solr according to
>
> https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors
>
> Solr version: 6.5.0
> No. of documents: 10 crore (100 million)
>
> Before implementing cursor, I was using the start and rows parameter to
> fetch records
> Service response time used to be 2 sec
>
> *Before implementing Cursor Solr URL:*
> http://localhost:8080/solr/search/select?q=bird
> toy=mapping=3=25=100
>
> Request handler Looks like this: fl contains approx 20 fields
> 
> 
> edismax
> on
> 0.01
> 
> 
> id,refid,title,smalldesc:""
> 
>
> none
> json
> 25
> 15000
> smalldesc
> title_text
> titlews^3
> sdescnisq
> 1
> 
> 2-1 470%
> 
> 
>
> Sharing response with echoParams=all -> QTime is 6
> responseHeader: {
> status: 0,
> QTime: 6,
> params: {
> ps: "3",
> echoParams: "all",
> indent: "on",
> fl: "id,refid,title,smalldesc:"",
> tie: "0.01",
> defType: "edismax",
> qf: "customphonetic",
> wt: "json",
>qs: "1",
>qt: "mapping",
>rows: "25",
>q: "bird toy",
>timeAllowed: "15000"
> }
> },
> response: {
> numFound: 17,
> start: 0,
> maxScore: 26.616478,
> docs: [
>   {
> id: "22347708097",
> refid: "152585558",
> title: "Round BIRD COLOURFUL SWINGING CIRCULAR SITTING TOY",
> smalldesc: "",
> score: 26.616478
>  }
> ]
> }
>
> I am facing a performance issue now after implementing the cursor. Service
> response time has increased 3 to 4 times, i.e. 8 sec in some cases
>
> *After implementing Cursor query is-*
> localhost:8080/solr/search/select?q=bird
> toy=cursor=3=1000=100=score desc,id asc=*
>
> Just added sort=score desc,id asc and cursorMark=* to the previous query;
> rows to be fetched is now 1000 and fl contains just a single field
>
> Request handler remains same as before just changed the name and made fl
> change and added df in defaults
>
> 
>
>   edismax
>   on
>   0.01
>
>
>   refid
>
>
>   none
>   json
>   1000
>   smalldesc
>   title_text
>   titlews^3
>   sdescnisq
>   1
>   2-1 470%
>   product_titles
>
> 
>
> Response with cursor and echoParams=all -> *QTime is now 17*, i.e. approx 3
> times the previous QTime
> responseHeader: {
> status: 0,
> QTime: 17,
> params: {
> df: "product_titles",
> ps: "3",
> echoParams: "all",
> indent: "on",
> fl: "refid",
> tie: "0.01",
> defType: "edismax",
> qf: "customphonetic",
> qs: "1",
> qt: "cursor",
> sort: "score desc,id asc",
> rows: "1000",
> q: "bird toy",
> cursorMark: "*",
> }
> },
> response: {
> numFound: 17,
> start: 0,
> docs: [
> {
> refid: "152585558"
> },
> {
> refid: "157276077"
> }
> ]
> }
>
>
> When I curl http://localhost:8080/solr/search/select?q=bird
> toy=mapping=3=25=100, I can get results in 3 seconds.
> When I curl localhost:8080/solr/search/select?q=bird
> toy=cursor=3=1000=100=score desc,id asc=*, it
> consumes 8 seconds to return results even if the result count is 0.
>
> BTW, the id schema definition is used in sort
>  omitNorms="true" multiValued="false"/>
>
> Is it due to the sort I have applied, or have I implemented it in the wrong
> way?
> Please help or provide the direction to solve this issue
>
>
> Thanks in advance
>
> --
> Thanks & Regards,
> Ajay Sharma
> Product Search
> Indiamart Intermesh Ltd.
>
> --
>
>


Re: Converting a collection name to an alias

2021-01-07 Thread Mike Drob
I believe you may be able to use that command (or some combination of
create-alias commands) to create an alias from A to A_1, and then in
the future when you want to change it you can point alias A at collection B
(assuming this is the point of the alias in the first place).
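Concretely, with the names from the question below, that would be a pair of
CREATEALIAS calls over time:

    # point alias A at the renamed collection
    curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=A&collections=A_1'
    # later, repoint the same alias at collection B
    curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=A&collections=B'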

On Thu, Jan 7, 2021 at 1:53 PM ufuk yılmaz 
wrote:

> Hi,
> I’m aware of that API but it doesn’t do what I actually want.
>
> regards
>
> Sent from Mail for Windows 10
>
> From: matthew sporleder
> Sent: 07 January 2021 22:46
> To: solr-user@lucene.apache.org
> Subject: Re: Converting a collection name to an alias
>
> https://lucene.apache.org/solr/guide/8_1/collections-api.html#rename
>
> On Thu, Jan 7, 2021 at 2:07 PM ufuk yılmaz 
> wrote:
> >
> > Hi again,
> >
> > Lets say I have a collection named A.
> > I’m trying to rename it to A_1, then create an alias named A, which
> points to the A_1 collection.
> > Is this possible without deleting and reindexing the collection from
> scratch?
> >
> > Regards,
> > uyilmaz
> >
>
>


Re: SPLITSHARD - data loss of child documents

2020-12-17 Thread Mike Drob
I was under the impression that split shard doesn’t work with child
documents; if that is missing from the ref guide, we should update it.

On Thu, Dec 17, 2020 at 4:30 AM Nussbaum, Ronen 
wrote:

> Hi Everyone,
>
> We're using version 8.6.1 with nested documents.
> I used the SPLITSHARD API and after it finished successfully, I've noticed
> the following:
>
>   1.  Most of the child documents are missing - before the split: ~600M,
> after: 68M
>   2.  Retrieving a document with its children, shows child documents that
> do not belong to this parent (their parentID value is different than
> parent's ID).
>
> I didn't see any limitation in the API documentation.
> Do you have any suggestions?
>
> Thanks in advance,
> Ronen.
>
>


Re: solr 8.6.3 and noggit

2020-11-20 Thread Mike Drob
Noggit code was forked into Solr, see SOLR-13427
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.6.3/solr/solrj/src/java/org/noggit/ObjectBuilder.java

It looks like that particular method was added in 8.4 via
https://issues.apache.org/jira/browse/SOLR-13824

Is it possible you're using an older SolrJ against a newer Solr server (or
vice versa)?

Mike

On Fri, Nov 20, 2020 at 2:25 PM Susmit Shukla 
wrote:

> Hi,
> got this error using streaming with SolrJ 8.6.3. Does it use noggit 0.8?
> It was not mentioned in the dependencies:
> https://github.com/apache/lucene-solr/blob/branch_8_6/solr/solrj/ivy.xml
>
> Caused by: java.lang.NoSuchMethodError: 'java.lang.Object
> org.noggit.ObjectBuilder.getValStrict()'
>
> at org.apache.solr.common.util.Utils.fromJSON(Utils.java:284)
> ~[solr-solrj-8.6.3.jar:8.6.3 e001c2221812a0ba9e9378855040ce72f93eced4 -
> jasongerlowski - 2020-10-03 18:12:06]
>


Re: download binary files will not uncompress

2020-11-03 Thread Mike Drob
Routing back to the mailing list, please do not reply directly to
individual emails.

You did not download the complete file, the releases should be
approximately 180MB, not the 30KB that you show.

Try downloading from a different mirror, or check if you are behind a proxy
or firewall preventing the downloads.
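One way to check a download before extracting it, using the published
checksum (compare the two hashes by eye or with diff):

    wget https://downloads.apache.org/lucene/solr/8.6.3/solr-8.6.3.tgz.sha512
    sha512sum solr-8.6.3.tgz
    cat solr-8.6.3.tgz.sha512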


On Tue, Nov 3, 2020 at 4:51 PM James Rome  wrote:

> jar@jarfx ~/.gnupg $ gpg --import ~/download/KEYS
> gpg: key B83EA82A0AFCEE7C: public key "Yonik Seeley "
> imported
> gpg: key E48025ED13E57FFC: public key "Upayavira "
> imported
>
> ...
>
> jar@jarfx ~/download $ ls -l solr*
> -rw-r--r-- 1 root root 30690 Nov  3 17:00 solr-8.6.3.tgz
> -rw-r--r-- 1 root root   833 Oct  3 21:44 solr-8.6.3.tgz.asc
> -rw-r--r-- 1 root root   145 Oct  3 21:44 solr-8.6.3.tgz.sha512
> -rw-r--r-- 1 root root 30718 Nov  3 17:01 solr-8.6.3.zip
>
> gpg --verify  solr-8.6.3.tgz.asc solr-8.6.3.tgz
> gpg: Signature made Sat 03 Oct 2020 06:17:01 PM EDT
> gpg: using RSA key 902CC51935C140BF820230961FD5295281436075
> gpg: BAD signature from "Jason Gerlowski (CODE SIGNING KEY)
> " [unknown]
>
> jar@jarfx ~/download $ tar xvf solr-8.6.3.tgz
>
> gzip: stdin: not in gzip format
> tar: Child returned status 1
> tar: Error is not recoverable: exiting now
>
>
> James A. Rome
> 116 Claymore Lane
> Oak Ridge, TN 37830-7674
> 865 482-5643; Cell: 865 566-7991
> jamesr...@gmail.com
> https://jamesrome.net
>
> On 11/3/20 5:20 PM, Mike Drob wrote:
> > Can you check the signatures to make sure your downloads were not
> > corrupted? I just checked and was able to download and uncompress both of
> > them.
> >
> > Also, depending on your version of tar, you don't want the - for your
> > flags... tar xf solr-8.6.3.tgz
> >
> > Mike
> >
> > On Tue, Nov 3, 2020 at 4:15 PM James Rome  wrote:
> >
> >> # Source release: solr-8.6.3-src.tgz
> >> <
> >>
> https://www.apache.org/dyn/closer.lua/lucene/solr/8.6.3/solr-8.6.3-src.tgz
> >
> >>
> >> [PGP
> >> <https://downloads.apache.org/lucene/solr/8.6.3/solr-8.6.3-src.tgz.asc
> >]
> >> [SHA512
> >> <
> https://downloads.apache.org/lucene/solr/8.6.3/solr-8.6.3-src.tgz.sha512
> >>> ]
> >> # Binary releases: solr-8.6.3.tgz
> >> <https://www.apache.org/dyn/closer.lua/lucene/solr/8.6.3/solr-8.6.3.tgz
> >
> >> [PGP
> >> <https://downloads.apache.org/lucene/solr/8.6.3/solr-8.6.3.tgz.asc>]
> >> [SHA512
> >> <https://downloads.apache.org/lucene/solr/8.6.3/solr-8.6.3.tgz.sha512>]
> >> / solr-8.6.3.zip
> >> <https://www.apache.org/dyn/closer.lua/lucene/solr/8.6.3/solr-8.6.3.zip
> >
> >> [PGP
> >> <https://downloads.apache.org/lucene/solr/8.6.3/solr-8.6.3.zip.asc>]
> >> [SHA512
> >> <https://downloads.apache.org/lucene/solr/8.6.3/solr-8.6.3.zip.sha512>]
> >>
> >>unzip solr-8.6.3.zip
> >> Archive:  solr-8.6.3.zip
> >> End-of-central-directory signature not found.  Either this file is
> not
> >> a zipfile, or it constitutes one disk of a multi-part archive. In
> the
> >> latter case the central directory and zipfile comment will be found
> on
> >> the last disk(s) of this archive.
> >>
> >>
> >> and
> >>
> >> # tar -xvf solr-8.6.3.tgz
> >>
> >> gzip: stdin: not in gzip format
> >> tar: Child returned status 1
> >> tar: Error is not recoverable: exiting now
> >>
> >> --
> >> James A. Rome
> >>
> >> https://jamesrome.net
> >>
> >>
>


Re: download binary files will not uncompress

2020-11-03 Thread Mike Drob
Can you check the signatures to make sure your downloads were not
corrupted? I just checked and was able to download and uncompress both of
them.

Also, depending on your version of tar, you don't want the - for your
flags... tar xf solr-8.6.3.tgz

Mike

On Tue, Nov 3, 2020 at 4:15 PM James Rome  wrote:

> # Source release: solr-8.6.3-src.tgz
> <
> https://www.apache.org/dyn/closer.lua/lucene/solr/8.6.3/solr-8.6.3-src.tgz>
>
> [PGP
> ]
> [SHA512
>  >]
> # Binary releases: solr-8.6.3.tgz
> 
> [PGP
> ]
> [SHA512
> ]
> / solr-8.6.3.zip
> 
> [PGP
> ]
> [SHA512
> ]
>
>   unzip solr-8.6.3.zip
> Archive:  solr-8.6.3.zip
>End-of-central-directory signature not found.  Either this file is not
>a zipfile, or it constitutes one disk of a multi-part archive. In the
>latter case the central directory and zipfile comment will be found on
>the last disk(s) of this archive.
>
>
> and
>
> # tar -xvf solr-8.6.3.tgz
>
> gzip: stdin: not in gzip format
> tar: Child returned status 1
> tar: Error is not recoverable: exiting now
>
> --
> James A. Rome
>
> https://jamesrome.net
>
>


Re: Solr dependency update at Apache Beam - which versions should be supported

2020-10-27 Thread Mike Drob
Piotr,

Based on the questions that we've seen over the past month on this list,
there are still users with Solr on 6, 7, and 8. I suspect there are still
Solr 5 users out there too, although they don't appear to be asking for
help - likely they are in set it and forget it mode.

Solr 7 may not be officially deprecated on our site, but it's pretty old at
this point and we're not doing any development on it outside of maybe a
very high profile security fix. Even then, we might acknowledge it and
recommend users update to 8.x anyway.

The index files generated by Lucene and consumed by Solr are backwards
compatible up to one major version. Some of the API remains compatible, a
client issuing simple queries to Solr 5 would probably work fine even
against Solr 9 when it comes out eventually. A client doing admin
operations will be less certain. I don't know enough about Beam to tell you
where on the spectrum your use will fall.

I'm not sure if this was helpful or not, but maybe it is a nudge in the
right direction.

Good luck,
Mike


On Tue, Oct 27, 2020 at 11:09 AM Piotr Szuberski <
piotr.szuber...@polidea.com> wrote:

> Hi,
>
> We are working on dependency updates at Apache Beam and I would like to
> consult which versions should be supported so we don't break any existing
> users.
>
> Previously the supported Solr version was 5.5.4.
>
> Versions 8.x.y and 7.x.y naturally come to mind as they are the only not
> deprecated. But maybe there are users that use some earlier versions?
>
> Are these versions backwards-compatible, or are there things to be aware of?
>
> Regards
>


Re: Folding Repeated Letters

2020-10-08 Thread Mike Drob
I was thinking about that, but there are words that are legitimately
different with repeated consonants. My primary school teacher lost hair
over getting us to learn the difference between desert and dessert.

Maybe we need something that can borrow the boosting behaviour of fuzzy
query - match the exact term, but also the neighbors with a slight deboost,
so that if the main term exists those others won't show up.

On Thu, Oct 8, 2020 at 5:46 PM Andy Webb  wrote:

> How about something like this?
>
> {
> "add-field-type": [
> {
> "name": "norepeat",
> "class": "solr.TextField",
> "analyzer": {
> "tokenizer": {
> "class": "solr.StandardTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> },
> {
> "class": "solr.PatternReplaceFilterFactory",
> "pattern": "(.)\\1+",
> "replacement": "$1"
> }
> ]
> }
> }
> ]
> }
>
> This finds a match...
>
> http://localhost:8983/solr/#/norepeat/analysis?analysis.fieldvalue=Yes=YyyeeEssSs=norepeat
>
> Andy
>
>
>
> On Thu, 8 Oct 2020 at 23:02, Mike Drob  wrote:
>
> > I'm looking for a way to transform words with repeated letters into the
> > same token - does something like this exist out of the box? Do our
> stemmers
> > support it?
> >
> > For example, say I would want all of these terms to return the same
> search
> > results:
> >
> > YES
> > YESSS
> > YYYEEESSS
> > YYEE[...]S
> >
> > I don't know how long a user would hold down the S key at the end to
> > capture their level of excitement, and I don't want to manually define
> > synonyms for every length.
> >
> > I'm pretty sure that I don't want PhoneticFilter here, maybe
> > PatternReplace? Not a huge fan of how that one is configured, and I think
> > I'd have to set up a bunch of patterns inline for it?
> >
> > Mike
> >
>


Folding Repeated Letters

2020-10-08 Thread Mike Drob
I'm looking for a way to transform words with repeated letters into the
same token - does something like this exist out of the box? Do our stemmers
support it?

For example, say I would want all of these terms to return the same search
results:

YES
YESSS
YYYEEESSS
YYEE[...]S

I don't know how long a user would hold down the S key at the end to
capture their level of excitement, and I don't want to manually define
synonyms for every length.

I'm pretty sure that I don't want PhoneticFilter here, maybe
PatternReplace? Not a huge fan of how that one is configured, and I think
I'd have to set up a bunch of patterns inline for it?

Mike


Re: Term too complex for spellcheck.q param

2020-10-07 Thread Mike Drob
Right now the only solution is to use a shorter term.

In a fuzzy query you could also try using a lower edit distance e.g. term~1
(default is 2), but I’m not sure what the syntax for a spellcheck would be.

Mike
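For the fuzzy-query route, the syntax is the term followed by a tilde and
the distance; a sketch (collection and field names are placeholders):

    curl 'http://localhost:8983/solr/mycoll/select?q=title:solr~1'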

On Wed, Oct 7, 2020 at 2:59 PM gnandre  wrote:

> Hi,
>
> I am getting following error when I pass '
> 김포오피➬유유닷컴➬✗UUDAT3.COM유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마
> ' in spellcheck.q param. How to avoid this error? I am using Solr 8.5.2
>
> {
>   "error": {
> "code": 500,
> "msg": "Term too complex: 김포오피➬유유닷컴➬✗uudat3.com
> 유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마",
> "trace": "org.apache.lucene.search.FuzzyTermsEnum$FuzzyTermsException:
> Term too complex:
> 김포오피➬유유닷컴➬✗uudat3.com유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마\n\tat
>
> org.apache.lucene.search.FuzzyAutomatonBuilder.buildAutomatonSet(FuzzyAutomatonBuilder.java:63)\n\tat
>
> org.apache.lucene.search.FuzzyTermsEnum$AutomatonAttributeImpl.init(FuzzyTermsEnum.java:365)\n\tat
>
> org.apache.lucene.search.FuzzyTermsEnum.(FuzzyTermsEnum.java:125)\n\tat
>
> org.apache.lucene.search.FuzzyTermsEnum.(FuzzyTermsEnum.java:92)\n\tat
>
> org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:425)\n\tat
>
> org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:376)\n\tat
>
> org.apache.solr.spelling.DirectSolrSpellChecker.getSuggestions(DirectSolrSpellChecker.java:196)\n\tat
>
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:195)\n\tat
>
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)\n\tat
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)\n\tat
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)\n\tat
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579)\n\tat
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
>
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat
>
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat
>
> org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
> org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat
>
> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat
> org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\tat
>
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)\n\tat
> org.eclipse.jetty.io
> .AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat
> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
> org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat
>
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat
>
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat
>
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat
>
> 

Re: Adding solr-core via maven fails

2020-07-02 Thread Mike Drob
Does it fail similarly on 8.5.0 and 8.5.1?

On Thu, Jul 2, 2020 at 6:38 AM Erick Erickson 
wrote:

> There have been some issues with Maven, see:
> https://issues.apache.org/jira/browse/LUCENE-9170
>
> However, we do not officially support Maven builds, they’re there as a
> convenience, so there may still
> be issues in future.
>
> > On Jul 2, 2020, at 1:27 AM, Ali Akhtar  wrote:
> >
> > If I try adding solr-core to an existing project, e.g. (SBT):
> >
> > libraryDependencies += "org.apache.solr" % "solr-core" % "8.5.2"
> >
> > It fails due to a 404 on the dependencies:
> >
> > Extracting structure failed
> > stack trace is suppressed; run last update for the full output
> > stack trace is suppressed; run last ssExtractDependencies for the full
> > output
> > (update) sbt.librarymanagement.ResolveException: Error downloading
> > org.restlet.jee:org.restlet:2.4.0
> > Not found
> > Not found
> > not found:
> > /home/ali/.ivy2/local/org.restlet.jee/org.restlet/2.4.0/ivys/ivy.xml
> > not found:
> >
> https://repo1.maven.org/maven2/org/restlet/jee/org.restlet/2.4.0/org.restlet-2.4.0.pom
> > Error downloading org.restlet.jee:org.restlet.ext.servlet:2.4.0
> > Not found
> > Not found
> > not found:
> >
> /home/ali/.ivy2/local/org.restlet.jee/org.restlet.ext.servlet/2.4.0/ivys/ivy.xml
> > not found:
> >
> https://repo1.maven.org/maven2/org/restlet/jee/org.restlet.ext.servlet/2.4.0/org.restlet.ext.servlet-2.4.0.pom
> > (ssExtractDependencies) sbt.librarymanagement.ResolveException: Error
> > downloading org.restlet.jee:org.restlet:2.4.0
> > Not found
> > Not found
> > not found:
> > /home/ali/.ivy2/local/org.restlet.jee/org.restlet/2.4.0/ivys/ivy.xml
> > not found:
> >
> https://repo1.maven.org/maven2/org/restlet/jee/org.restlet/2.4.0/org.restlet-2.4.0.pom
> > Error downloading org.restlet.jee:org.restlet.ext.servlet:2.4.0
> > Not found
> > Not found
> > not found:
> >
> /home/ali/.ivy2/local/org.restlet.jee/org.restlet.ext.servlet/2.4.0/ivys/ivy.xml
> > not found:
> >
> https://repo1.maven.org/maven2/org/restlet/jee/org.restlet.ext.servlet/2.4.0/org.restlet.ext.servlet-2.4.0.pom
> >
> >
> >
> > Any ideas? Do I need to add a specific repository to get it to compile?
>
>


Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-24 Thread Mike Drob
Bernd,

I appreciate that you are trying to examine this issue from multiple sides
and consider future implications, but I don’t think that is a stirring
argument. By analogy, if we are out of eggs and my wife asks me to go to
the store to get some, refusing to do so on the basis that she might call
me while I’m there and also ask me to get milk would not be reasonable.

What will come next may be an interesting question philosophically, but we
are not discussing abstract concepts here. There is a concrete issue
identified, and we’re soliciting input in how best to address it.

Thank you for the suggestion of "guide/follower"

Mike

On Wed, Jun 24, 2020 at 6:30 AM Bernd Fehling <
bernd.fehl...@uni-bielefeld.de> wrote:

> I'm following this thread now for a while and I can understand
> the wish to change some naming/wording/speech in one or the other
> programs but I always get back to the one question:
> "Is it the weapon which kills people or the hand controlled by
> the mind which fires the weapon?"
>
> The thread started with slave - slavery, then turned over to master
> and followed by leader (for me as a german... you know).
> What will come next?
>
> And moreover, we now discuss changes in the source code, and
> due to this there need to be changes to the documentation.
> What about the books people wrote about this programs and source code,
> should we force this authors to rewrite their books?
> Maybe we should file a request to all web search engines to reject
> all stored content about these "banned" words?
> And contact all web hosters about providing bad content.
>
> To sum things up, within my 40 years of computer science and writing
> programs I have never had, for even a nanosecond, any thoughts about words
> like master, slave, leader, ... other than thinking about computers
> and programming.
>
> Just my 2 cents.
>
> For what it is worth, I tend toward guide/follower if there "must be" any
> changes.
>
> Bernd
>


Re: [EXTERNAL] Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-18 Thread Mike Drob
I personally think that using SolrCloud terminology for this would be fine
with leader/follower. The leader is the one that accepts updates, followers
cascade the updates somehow. The presence of ZK or election doesn’t really
change this detail.

However, if folks feel that it’s confusing, then I can’t tell them that
they’re not confused. Especially when they’re working with others who have
less Solr experience than we do and are less familiar with the intricacies.

Primary/Replica seems acceptable. Coordinator instead of Overseer seems
acceptable.

Would love to see this in 9.0!

Mike

On Thu, Jun 18, 2020 at 8:25 AM John Gallagher
 wrote:

> While on the topic of renaming roles, I'd like to propose finding a better
> term than "overseer" which has historical slavery connotations as well.
> Director, perhaps?
>
>
> John Gallagher
>
> On Thu, Jun 18, 2020 at 8:48 AM Jason Gerlowski 
> wrote:
>
> > +1 to rename master/slave, and +1 to choosing terminology distinct
> > from what's used for SolrCloud.  I could be happy with several of the
> > proposed options.  Since a good few have been proposed though, maybe
> > an eventual vote thread is the most organized way to aggregate the
> > opinions here.
> >
> > I'm less positive about the prospect of changing the name of our
> > primary git branch.  Most projects that contributors might come from,
> > most tutorials out there to learn git, most tools built on top of git
> > - the majority are going to assume "master" as the main branch.  I
> > appreciate the change that Github is trying to effect in changing the
> > default for new projects, but it'll be a long time before that
> > competes with the huge bulk of projects, documentation, etc. out there
> > using "master".  Our contributors are smart and I'm sure they'd figure
> > it out if we used "main" or something else instead, but having a
> > non-standard git setup would be one more "papercut" in understanding
> > how to contribute to a project that already makes that harder than it
> > should.
> >
> > Jason
> >
> >
> > On Thu, Jun 18, 2020 at 7:33 AM Demian Katz 
> > wrote:
> > >
> > > Regarding people having a problem with the word "master" -- GitHub is
> > changing the default branch name away from "master," even in isolation
> from
> > a "slave" pairing... so the terminology seems to be falling out of favor
> in
> > all contexts. See:
> > >
> > >
> >
> https://www.cnet.com/news/microsofts-github-is-removing-coding-terms-like-master-and-slave/
> > >
> > > I'm not here to start a debate about the semantics of that, just to
> > provide evidence that in some communities, the term "master" is causing
> > concern all by itself. If we're going to make the change anyway, it might
> > be best to get it over with and pick the most appropriate terminology we
> > can agree upon, rather than trying to minimize the amount of change. It's
> > going to be backward breaking anyway, so we might as well do it all now
> > rather than risk having to go through two separate breaking changes at
> > different points in time.
> > >
> > > - Demian
> > >
> > > -Original Message-
> > > From: Noble Paul 
> > > Sent: Thursday, June 18, 2020 1:51 AM
> > > To: solr-user@lucene.apache.org
> > > Subject: [EXTERNAL] Re: Getting rid of Master/Slave nomenclature in
> Solr
> > >
> > > Looking at the code I see 692 occurrences of the word "slave".
> > > Mostly variable names and ref guide docs.
> > >
> > > The word "slave" is present in the responses as well. Any change in the
> > request param/response payload is backward incompatible.
> > >
> > > I have no objection to changing the names in ref guide and other
> > internal variables. Going ahead with backward incompatible changes is
> > painful. If somebody has the appetite to take it up, it's OK
> > >
> > > If we must change, master/follower can be a good enough option.
> > >
> > > master (noun): A man in charge of an organization or group.
> > > master(adj) : having or showing very great skill or proficiency.
> > > master(verb): acquire complete knowledge or skill in (a subject,
> > technique, or art).
> > > master (verb): gain control of; overcome.
> > >
> > > I hope nobody has a problem with the term "master"
> > >
> > > On Thu, Jun 18, 2020 at 3:19 PM Ilan Ginzburg 
> > wrote:
> > > >
> > > > Would master/follower work?
> > > >
> > > > Half the rename work while still getting rid of the slavery
> > connotation...
> > > >
> > > >
> > > > On Thu 18 Jun 2020 at 07:13, Walter Underwood  >
> > wrote:
> > > >
> > > > > > On Jun 17, 2020, at 4:00 PM, Shawn Heisey 
> > wrote:
> > > > > >
> > > > > > It has been interesting watching this discussion play out on
> > > > > > multiple
> > > > > open source mailing lists.  On other projects, I have seen a VERY
> > > > > high level of resistance to these changes, which I find disturbing
> > > > > and surprising.
> > > > >
> > > > > Yes, it is nice to see everyone just pitch in and do it on this
> list.
> > > > >
> > > > > wunder
> > > > > 

Re: Master Slave Terminology

2020-06-17 Thread Mike Drob
Hi Jan,

Can you link to the discussion? I searched the dev list and didn’t see
anything, is it on slack or a jira or somewhere else?

Mike

On Wed, Jun 17, 2020 at 1:51 AM Jan Høydahl  wrote:

> Hi Kaya,
>
> Thanks for bringing it up. The topic is already being discussed by
> developers, so expect to see some change in this area; Not over-night, but
> incremental.
> Also, if you want to lend a helping hand, patches are more than welcome as
> always.
>
> Jan
>
> > 17. jun. 2020 kl. 04:22 skrev Kayak28 :
> >
> > Hello, Community:
> >
> > As GitHub and Python are replacing terminology related to slavery,
> > why don't we replace master-slave for Solr as well?
> >
> > https://developers.srad.jp/story/18/09/14/0935201/
> >
> https://developer-tech.com/news/2020/jun/15/github-replace-slavery-terms-master-whitelist/
> >
> > --
> >
> > Sincerely,
> > Kaya
> > github: https://github.com/28kayak
>
>


[ANNOUNCE] Apache Solr 8.5.2 released

2020-05-26 Thread Mike Drob
26 May 2020, Apache Solr™ 8.5.2 available

The Lucene PMC is pleased to announce the release of Apache Solr 8.5.2

Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search and analytics, rich document
parsing, geospatial search, extensive REST APIs as well as parallel SQL.
Solr is enterprise grade, secure and highly scalable, providing fault
tolerant distributed search and indexing, and powers the search and
navigation features of many of the world's largest internet sites.

This release contains two bug fixes. The release is available for immediate
download at:


https://lucene.apache.org/solr/downloads.html

Solr 8.5.2 Bug Fixes:

   - SOLR-14411: Fix regression from SOLR-14359 (Admin UI 'Select an Option')
   - SOLR-14471: base replica selection strategy not applied to "last place"
   shards.preference matches

Solr 8.5.2 also includes 1 bugfix in the corresponding Apache Lucene
release:



Please report any feedback to the mailing lists (
https://lucene.apache.org/solr/community.html#mailing-lists-irc)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using may
not have replicated the release yet. If that is the case, please try
another mirror. This also goes for Maven access.


Re: Solr 8.5.1 startup error - lengthTag=109, too big.

2020-05-26 Thread Mike Drob
Did you have SSL enabled with 8.2.1?

The error looks common to certificate handling and not specific to Solr.

I would verify that you have no extra characters in your certificate file
(including line endings) and that the keystore type that you specified
matches the file you are presenting (JKS or PKCS12)

Mike
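Two quick ways to sanity-check the keystore itself, independent of Solr
(file name and type here are whatever your solr.in.sh/solr.in.cmd points
at; both commands will prompt for the store password):

    keytool -list -keystore solr-ssl.keystore.p12 -storetype PKCS12
    openssl pkcs12 -info -in solr-ssl.keystore.p12 -noout

If either command fails to parse the file, the problem is in the keystore
rather than in Solr.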

On Sat, May 23, 2020 at 10:11 PM Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> I'm trying to upgrade from Solr 8.2.1 to Solr 8.5.1, with Solr SSL
> Authentication and Authorization.
>
> However, I get the following error when I enable SSL. Solr itself can
> start up if there is no SSL. The main error that I see is this:
>
>   java.io.IOException: DerInputStream.getLength(): lengthTag=109, too big.
>
> What could be the reason that causes this?
>
>
> INFO  - 2020-05-24 10:38:20.080;
> org.apache.solr.util.configuration.SSLConfigurations; Setting
> javax.net.ssl.keyStorePassword
> INFO  - 2020-05-24 10:38:20.081;
> org.apache.solr.util.configuration.SSLConfigurations; Setting
> javax.net.ssl.trustStorePassword
> Waiting up to 120 to see Solr running on port 8983
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> at java.lang.reflect.Method.invoke(Unknown Source)
> at org.eclipse.jetty.start.Main.invokeMain(Main.java:218)
> at org.eclipse.jetty.start.Main.start(Main.java:491)
> at org.eclipse.jetty.start.Main.main(Main.java:77)d
> Caused by: java.security.PrivilegedActionException: java.io.IOException:
> DerInputStream.getLength(): lengthTag=109, too big.
> at java.security.AccessController.doPrivileged(Native Method)
> at
> org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1837)
> ... 7 more
> Caused by: java.io.IOException: DerInputStream.getLength(): lengthTag=109,
> too big.
> at sun.security.util.DerInputStream.getLength(Unknown Source)
> at sun.security.util.DerValue.init(Unknown Source)
> at sun.security.util.DerValue.(Unknown Source)
> at sun.security.util.DerValue.(Unknown Source)
> at sun.security.pkcs12.PKCS12KeyStore.engineLoad(Unknown Source)
> at java.security.KeyStore.load(Unknown Source)
> at
>
> org.eclipse.jetty.util.security.CertificateUtils.getKeyStore(CertificateUtils.java:54)
> at
>
> org.eclipse.jetty.util.ssl.SslContextFactory.loadKeyStore(SslContextFactory.java:1188)
> at
>
> org.eclipse.jetty.util.ssl.SslContextFactory.load(SslContextFactory.java:323)
> at
>
> org.eclipse.jetty.util.ssl.SslContextFactory.doStart(SslContextFactory.java:245)
> at
>
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at
>
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
> at
>
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:117)
> at
>
> org.eclipse.jetty.server.SslConnectionFactory.doStart(SslConnectionFactory.java:92)
> at
>
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at
>
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
> at
>
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:117)
> at
>
> org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:320)
> at
>
> org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:81)
> at
> org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:231)
> at
>
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at org.eclipse.jetty.server.Server.doStart(Server.java:385)
> at
>
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
> at
>
> org.eclipse.jetty.xml.XmlConfiguration.lambda$main$0(XmlConfiguration.java:1888)
> ... 9 more
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> at java.lang.reflect.Method.invoke(Unknown Source)
> at org.eclipse.jetty.start.Main.invokeMain(Main.java:218)
> at org.eclipse.jetty.start.Main.start(Main.java:491)
> at org.eclipse.jetty.start.Main.main(Main.java:77)
> Caused by: java.security.PrivilegedActionException: java.io.IOException:
> DerInputStream.getLength(): lengthTag=109, too big.
> at java.security.AccessController.doPrivileged(Native Method)
> at
> 

Re: Download a pre-release version? 8.6

2020-05-15 Thread Mike Drob
We could theoretically include this in an 8.5.2 release, which should be
released soon. The change looks minimally risky to backport?

On Fri, May 15, 2020 at 3:43 PM Jan Høydahl  wrote:

> Check Jenkins:
> https://builds.apache.org/view/L/view/Lucene/job/Solr-Artifacts-8.x/lastSuccessfulBuild/artifact/solr/package/
>
> Jan Høydahl
>
> > 15. mai 2020 kl. 22:27 skrev Phill Campbell
> :
> >
> > Is there a way to download a tgz of the binary of a nightly build or
> similar?
> >
> > I have been testing 8.5.1 and ran into the bug with load balancing.
> > https://issues.apache.org/jira/browse/SOLR-14471 <
> https://issues.apache.org/jira/browse/SOLR-14471>
> >
> > It is a deal breaker for me to move forward with an upgrade of the
> system.
> >
> > I would like to start evaluating a version that has the fix.
> >
> > Is there a place to get a build?
> >
> > Thank you.
>


Re: Possible issue with Stemming and nouns ended with suffix 'ion'

2020-05-01 Thread Mike Drob
This is how things get stemmed *now*, but I believe there is an open
question as to whether that is how they *should* be stemmed. Specifically,
the case appears to be -ify words not stemming to the same token as
-ification words;
this applies to much more than identify/identification. Also, justify,
fortify, notify, many many others.

$ grep ification /usr/share/dict/words | wc -l
 328
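Relatedly, a quick way to compare the stems Solr actually produces is the
field analysis endpoint; a sketch with placeholder collection and field
type names:

    curl 'http://localhost:8983/solr/mycoll/analysis/field?analysis.fieldtype=text_en&analysis.fieldvalue=identify+identification'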

I am by no means an expert on stemming, and if the folks at snowball decide
to tell us that this change is bad or hard because it would overstem some
other words, then I'll happily accept that. But I definitely want to use
their expertise rather than relying on my own.

Mike

On Fri, May 1, 2020 at 10:35 AM Audrey Lorberfeld -
audrey.lorberf...@ibm.com  wrote:

> Unless I'm misunderstanding the bug in question, there is no bug. What you
> are observing is simply just how things get stemmed...
>
> Best,
> Audrey
>
> On 4/30/20, 6:37 PM, "Jhonny Lopez" 
> wrote:
>
> Yes, sounds like it's worth it.
>
> Thanks guys!
>
> -Original Message-
> From: Mike Drob 
> Sent: jueves, 30 de abril de 2020 5:30 p. m.
> To: solr-user@lucene.apache.org
> Subject: Re: Possible issue with Stemming and nouns ended with suffix
> 'ion'
>
>
>
>
> Is this worth filing a bug/suggestion to the folks over at
> snowballstem.org?
>
> On Thu, Apr 30, 2020 at 4:08 PM Audrey Lorberfeld -
> audrey.lorberf...@ibm.com  wrote:
>
> > I agree with Erick. I think that's just how the cookie crumbles when
> > stemming. If you have some time on your hands, you can integrate
> > OpenNLP with your Solr instance and start using the lemmas of tokens
> > instead of the stems. In this case, I believe if you were to
> lemmatize
> > both "identify" and "identification," they would both condense to
> "identify."
> >
> > Best,
> > Audrey
> >
> > On 4/30/20, 3:54 PM, "Erick Erickson" 
> wrote:
> >
> > They are being stemmed to two different tokens, “identif” and
> > “identifi”. Stemming is algorithmic and imperfect and in this case
> > you’re getting bitten by that algorithm. It looks like you’re using
> > PorterStemFilter, if you want you can look up the exact algorithm,
> but
> > I don’t think it’s a bug, just one of those little joys of English...
> >
> > To get a clearer picture of exactly what’s being searched, try
> > adding =query to your query, in particular looking at the
> parsed
> > query that’s returned. That’ll tell you a bunch. In this particular
> > case I don’t think it’ll tell you anything more, but for future…
> >
> > Best,
> > Erick
> >
> > On, and un-checking the ‘verbose’ box on the analysis page
> removes
> > a lot of distraction, the detailed information is often TMI ;)
> >
> > > On Apr 30, 2020, at 2:51 PM, Jhonny Lopez <
> > jhonny.lo...@publicismedia.com> wrote:
> > >
> > > Sure, rewriting the message with links for images:
> > >
> > >
> > > We’re facing an issue with stemming in solr. Most of the cases
> > are working correctly, for example, if we search for bidding, solr
> > brings results for bidding, bid, bids, etc. However, with nouns
> ended with ‘ion’
> > suffix, stemming is not working. Even when analyzers seems to have
> > correct stemming of the word, the results are not reflecting that.
> One
> > example. If I search ‘identifying’, this is the output:
> > >
> > > Analyzer (image link):
> > >
> >
> https://1drv.ms/u/s!AlRTlFq8tQbShd4-Cp40Cmc0QioS0A?e=1f3GJp
> > >
> > > A clip of results:
> > > "haschildren_b":false,
> > >"isbucket_text_s":"0",
> > >"sectionbody_t":"\n\n\nIn order to identify 1st price
> > auctions, leverage the

Re: Possible issue with Stemming and nouns ended with suffix 'ion'

2020-05-01 Thread Mike Drob
Jhonny,

Are you planning on reporting the issue to snowball, or would you prefer
one of us take care of it?
If you do report it, please share the link to the issue or mail archive
back here so that we know when it is resolved and can update our
dependencies.
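
In case it helps whoever files it, here is the kind of harness I'd attach
to the report - just a sketch, assuming lucene-core and analysis-common on
the classpath:

import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

Analyzer a = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String field) {
    Tokenizer t = new StandardTokenizer();
    return new TokenStreamComponents(t,
        new PorterStemFilter(new LowerCaseFilter(t)));
  }
};
try (TokenStream ts = a.tokenStream("f", "identifying identification")) {
  CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
  ts.reset();
  while (ts.incrementToken()) {
    // prints "identifi", then "identif" - two different stems, so no match
    System.out.println(term);
  }
  ts.end();
}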

Thanks,
Mike

On Thu, Apr 30, 2020 at 5:37 PM Jhonny Lopez 
wrote:

> Yes, sounds like worth it.
>
> Thanks guys!
>
> -Original Message-----
> From: Mike Drob 
> Sent: jueves, 30 de abril de 2020 5:30 p. m.
> To: solr-user@lucene.apache.org
> Subject: Re: Possible issue with Stemming and nouns ended with suffix 'ion'
>
>
> Is this worth filing a bug/suggestion to the folks over at
> snowballstem.org?
>
> On Thu, Apr 30, 2020 at 4:08 PM Audrey Lorberfeld -
> audrey.lorberf...@ibm.com  wrote:
>
> > I agree with Erick. I think that's just how the cookie crumbles when
> > stemming. If you have some time on your hands, you can integrate
> > OpenNLP with your Solr instance and start using the lemmas of tokens
> > instead of the stems. In this case, I believe if you were to lemmatize
> > both "identify" and "identification," they would both condense to
> "identify."
> >
> > Best,
> > Audrey
> >
> > On 4/30/20, 3:54 PM, "Erick Erickson"  wrote:
> >
> > They are being stemmed to two different tokens, “identif” and
> > “identifi”. Stemming is algorithmic and imperfect and in this case
> > you’re getting bitten by that algorithm. It looks like you’re using
> > PorterStemFilter, if you want you can look up the exact algorithm, but
> > I don’t think it’s a bug, just one of those little joys of English...
> >
> > To get a clearer picture of exactly what’s being searched, try
> > adding &debug=query to your query, in particular looking at the parsed
> > query that’s returned. That’ll tell you a bunch. In this particular
> > case I don’t think it’ll tell you anything more, but for future…
> >
> > Best,
> > Erick
> >
> > Oh, and un-checking the ‘verbose’ box on the analysis page removes
> > a lot of distraction, the detailed information is often TMI ;)
> >
> > > On Apr 30, 2020, at 2:51 PM, Jhonny Lopez <
> > jhonny.lo...@publicismedia.com> wrote:
> > >
> > > Sure, rewriting the message with links for images:
> > >
> > >
> > > We’re facing an issue with stemming in solr. Most of the cases
> > are working correctly, for example, if we search for bidding, solr
> > brings results for bidding, bid, bids, etc. However, with nouns ended
> with ‘ion’
> > suffix, stemming is not working. Even when analyzers seems to have
> > correct stemming of the word, the results are not reflecting that. One
> > example. If I search ‘identifying’, this is the output:
> > >
> > > Analyzer (image link):
> > >
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__1drv.ms_u_s-21AlRTlFq8tQbShd4-2DCp40Cmc0QioS0A-3Fe-3D1f3GJp=DwIFaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=8Xt1N2A4ODj--DlLb242c8JMnJr6nWQIwcKjiDiA__s=U-Wmu118X5bfNxDnADO_6ompf9kUxZYHj1DZM2lG4jo=
> > >
> > > A clip of results:
> > > "haschildren_b":false,
> > >"isbucket_text_s":"0",
> > >"sectionbody_t":"\n\n\nIn order to identify 1st price
> > auctions, leverage the proprietary tools available or manually pull a
> > log file report to understand the trends and gauge auction spread
> > overtime to assess the impact of variable auction
> dynamics.\n\n\n\n\n\n\n",
> > >"parsedupdatedby_s":"sitecorecarvaini",
> > >"sectionbody_t_en":"\n\n\nIn order to identify 1st price
> > auctions, leverage the proprietary tools available or manually pull a
> > log file report to understand the trends and gauge auction spread
> > overtime to assess the impact of variable auction
> dynamics.\n\n\n\n\n\n\n",
> > >"hide_section_b":false
> > >
> > >
> > > As you can see, it has used the stemming correctly and brings
> > results for other words based in the root, in this case “Identify”.
> > >
> 

Re: Possible issue with Stemming and nouns ended with suffix 'ion'

2020-04-30 Thread Mike Drob
Is this worth filing a bug/suggestion to the folks over at snowballstem.org?

On Thu, Apr 30, 2020 at 4:08 PM Audrey Lorberfeld -
audrey.lorberf...@ibm.com  wrote:

> I agree with Erick. I think that's just how the cookie crumbles when
> stemming. If you have some time on your hands, you can integrate OpenNLP
> with your Solr instance and start using the lemmas of tokens instead of the
> stems. In this case, I believe if you were to lemmatize both "identify" and
> "identification," they would both condense to "identify."
>
> Best,
> Audrey
>
> On 4/30/20, 3:54 PM, "Erick Erickson"  wrote:
>
> They are being stemmed to two different tokens, “identif” and
> “identifi”. Stemming is algorithmic and imperfect and in this case you’re
> getting bitten by that algorithm. It looks like you’re using
> PorterStemFilter, if you want you can look up the exact algorithm, but I
> don’t think it’s a bug, just one of those little joys of English...
>
> To get a clearer picture of exactly what’s being searched, try adding
> &debug=query to your query, in particular looking at the parsed query
> that’s returned. That’ll tell you a bunch. In this particular case I don’t
> think it’ll tell you anything more, but for future…
>
> Best,
> Erick
>
> Oh, and un-checking the ‘verbose’ box on the analysis page removes a
> lot of distraction, the detailed information is often TMI ;)
>
> > On Apr 30, 2020, at 2:51 PM, Jhonny Lopez <
> jhonny.lo...@publicismedia.com> wrote:
> >
> > Sure, rewriting the message with links for images:
> >
> >
> > We’re facing an issue with stemming in solr. Most of the cases are
> working correctly, for example, if we search for bidding, solr brings
> results for bidding, bid, bids, etc. However, with nouns ended with ‘ion’
> suffix, stemming is not working. Even when analyzers seems to have correct
> stemming of the word, the results are not reflecting that. One example. If
> I search ‘identifying’, this is the output:
> >
> > Analyzer (image link):
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__1drv.ms_u_s-21AlRTlFq8tQbShd4-2DCp40Cmc0QioS0A-3Fe-3D1f3GJp=DwIFaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=8Xt1N2A4ODj--DlLb242c8JMnJr6nWQIwcKjiDiA__s=U-Wmu118X5bfNxDnADO_6ompf9kUxZYHj1DZM2lG4jo=
> >
> > A clip of results:
> > "haschildren_b":false,
> >"isbucket_text_s":"0",
> >"sectionbody_t":"\n\n\nIn order to identify 1st price
> auctions, leverage the proprietary tools available or manually pull a log
> file report to understand the trends and gauge auction spread overtime to
> assess the impact of variable auction dynamics.\n\n\n\n\n\n\n",
> >"parsedupdatedby_s":"sitecorecarvaini",
> >"sectionbody_t_en":"\n\n\nIn order to identify 1st price
> auctions, leverage the proprietary tools available or manually pull a log
> file report to understand the trends and gauge auction spread overtime to
> assess the impact of variable auction dynamics.\n\n\n\n\n\n\n",
> >"hide_section_b":false
> >
> >
> > As you can see, it has used the stemming correctly and brings
> results for other words based in the root, in this case “Identify”.
> >
> > However, if I search for “Identification”, this is the output:
> >
> > Analyzer (imagelink):
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__1drv.ms_u_s-21AlRTlFq8tQbShd49RpiQObzMgSjVhA=DwIFaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=8Xt1N2A4ODj--DlLb242c8JMnJr6nWQIwcKjiDiA__s=5RlkLH-90sYc4nyIgnPO9MsBlyh7iWSOphEVdjUvTIE=
> >
> >
> > Even with proper stemming, solr is only bringing results for the
> word identification (or identifications) but nothing else.
> >
> > The queries are over the same field that has the Porter Stemming
> Filter applied for both, query and index. This behavior is consistent with
> other ‘ion’ ended nouns: representation, modification, etc.
> >
> > Solr Version: 8.1. Does anyone know why is it happening? Is it a bug?
> >
> > Thanks.
> >
> >
> >
> >
> >
> > -Original Message-
> >
> > From: Erick Erickson 
> >
> > Sent: jueves, 30 de abril de 2020 1:47 p. m.
> >
> > To: solr-user@lucene.apache.org
> >
> > Subject: Re: Possible issue with Stemming and nouns ended with
> suffix 'ion'
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > The mail server is pretty aggressive about stripping links, so we
> can’t see the images.
> >
> >
> >
> > Could you put 

Re: Fuzzy search not working

2020-04-14 Thread Mike Drob
Pradeep,

First, some background on fuzzy term expansions:

1) A query for foobar~2 is really a query for (foobar OR foobar~1 OR
foobar~2)
2) Fuzzy term expansion will only take the first 50 terms found in the
index and drop the rest.

For implementation notes, see this comment -
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/FuzzyTermsEnum.java#L229-L232

So in your first search, your available terms in the title_txt_en field are
few enough that "probl~2" does match "problem"
In your second search, with the copy field, there are likely many more
terms in all_text_txt_enus. Here, the edit distance 1 terms crowd out the
edit distance 2 terms and they never match.
You can imagine that the term expands into… "probl OR prob OR probe OR
prob1 OR…"

I don't see a way to specify the number of expansions from a Solr query,
maybe somebody else on the list would know.
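
For reference, at the Lucene level the cap is just a constructor argument,
so a custom query parser plugin could raise it. A rough sketch, not
something stock Solr exposes as far as I know:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;

// allow up to 500 expansions instead of the default of 50
FuzzyQuery q = new FuzzyQuery(
    new Term("all_text_txt_ens", "probl"),
    2,     // maxEdits
    0,     // prefixLength
    500,   // maxExpansions
    true); // transpositions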

But at the end of the day, like Wunder said, you might want a prefix query
based on what you're describing.

Mike


On Mon, Apr 13, 2020 at 6:01 PM Deepu  wrote:

> Corrected Typo mistake.
>
> Hi Team,
>
> We have 8 text fields (*_txt_en) in schema and one multi valued text field
> which is copy field of other text fields, like below.
>
> tittle_txt_en, configuration_summary_txt_en, all_text_txt_ens (multi value
> field)
>
> Observed one issue with Fuzzy match, same term with distance of two(~2) is
> working on individual fields but not returning any results from multi
> valued field.
>
> The term we used is "probl" and the document has the "problem" term in two
> text fields, so the all_text field has two occurrences of the "problem" term.
>
>
>
> title_txt_en:probl~2. (given results)
>
> all_text_txt_ens:probl~2 (no results)
>
>
>
> are there any other factors involved in distance calculation other
> than the Damerau-Levenshtein distance algorithm?
>
> what might be the reason same input with same distance worked with one
> field and failed with other field in same collection?
>
> is there a way we can get the actual distance solr calculated w.r.t. a
> specific document and field?
>
>
>
> Thanks in advance !!
>
>
> Thanks,
>
> Pradeep
>
> On Mon, Apr 13, 2020 at 2:35 PM Deepu  wrote:
>
> > Hi Team,
> >
> > We have 8 text fields (*_txt_en) in schema and one multi valued text
> field
> > which is copy field of other text fields, like below.
> >
> > tittle_txt_en, configuration_summary_txt_en, all_text_txt_ens (multi
> value
> > field)
> >
> > Observed one issue with Fuzzy match, same term with distance of two(~2)
> is
> > working on individual fields but not returning any results from multi
> > valued field.
> >
> > The term we used is "prob" and the document has the "problem" term in two
> > text fields, so the all_text field has two occurrences of the "problem"
> > term.
> >
> >
> >
> > title_txt_en:prob~2. (given results)
> >
> > all_text_txt_ens:prob~2 (no results)
> >
> >
> >
> > are there any other factors involved in distance calculation other
> > than the Damerau-Levenshtein distance algorithm?
> >
> > what might be the reason same input with same distance worked with one
> > field and failed with other field in same collection?
> >
> > is there a way we can get the actual distance solr calculated w.r.t. a
> > specific document and field?
> >
> >
> >
> > Thanks in advance !!
> >
> >
> > Thanks,
> >
> > Pradeep
> >
>


SolrCloud location for solr.xml

2020-02-28 Thread Mike Drob
Hi Searchers!

I was recently looking at some of the start-up logic for Solr and was
interested in cleaning it up a little bit. However, I'm not sure how common
certain deployment scenarios are. Specifically is anybody doing the
following combination:

* Using SolrCloud (i.e. state stored in zookeeper)
* Loading solr.xml from a local solr home rather than zookeeper

Much appreciated! Thanks,
Mike


Re: Outdated information on JVM heap sizes in Solr 8.3 documentation?

2020-02-15 Thread Mike Drob
Erick,

Can you drop a link to that Jira here after you create it?

Many thanks,
Mike

On Fri, Feb 14, 2020 at 6:05 PM Erick Erickson 
wrote:

> I just read that page over and it looks way out of date. I’ll raise
> a JIRA.
>
> > On Feb 14, 2020, at 2:55 PM, Walter Underwood 
> wrote:
> >
> > Yeah, that is pretty outdated. At Netflix, I was running an 8 GB heap
> with Solr 1.3. :-)
> >
> > Every GC I know about has a stop-the-world collector as a last ditch
> measure.
> >
> > G1GC limits the time that the world will stop. It gives up after
> MaxGCPauseMillis
> > milliseconds and leaves the rest of the garbage uncollected. If it has 5
> seconds
> > worth of work to do that, it might take 10 seconds, but in 200 ms
> chunks. It does
> > a lot of other stuff outside of the pauses to make the major collections
> more effective.
> >
> > We wrote Ultraseek in Python+C because Python used reference counting and
> > did not do garbage collection. That is the only way to have no pauses
> with
> > automatic memory management.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >> On Feb 14, 2020, at 11:35 AM, Tom Burton-West 
> wrote:
> >>
> >> Hello,
> >>
> >> In the section on JVM tuning in the  Solr 8.3 documentation (
> >> https://lucene.apache.org/solr/guide/8_3/jvm-settings.html#jvm-settings
> )
> >> there is a paragraph which cautions about setting heap sizes over 2 GB:
> >>
> >> "The larger the heap the longer it takes to do garbage collection. This
> can
> >> mean minor, random pauses or, in extreme cases, "freeze the world"
> pauses
> >> of a minute or more. As a practical matter, this can become a serious
> >> problem for heap sizes that exceed about **two gigabytes**, even if far
> >> more physical memory is available. On robust hardware, you may get
> better
> >> results running multiple JVMs, rather than just one with a large memory
> >> heap. "  (** added by me)
> >>
> >> I suspect this paragraph is severely outdated, but am not a Java expert.
> >> It seems to be contradicted by the statement in "
> >>
> https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#memory-and-gc-settings
> "
> >> "...values between 10 and 20 gigabytes are not uncommon for production
> >> servers"
> >>
> >> Are "freeze the world" pauses still an issue with modern JVM's?
> >> Is it still advisable to avoid heap sizes over 2GB?
> >>
> >> Tom
> >> https://www.hathitrust.org/blogslarge-scale-search
> >
>
>


Re: Modify partial configsets using API

2019-05-08 Thread Mike Drob



On 2019/05/08 16:52:52, Shawn Heisey  wrote: 
> On 5/8/2019 10:50 AM, Mike Drob wrote:
> > Solr Experts,
> > 
> > Is there an existing API to modify just part of my configset, for example
> > synonyms or stopwords? I see that there is the schema API, but that is
> > pretty specific in scope.
> > 
> > Not sure if I should be looking at configset API to upload a zip with a
> > single file, or if there are more granular options available.
> 
> Here's a documentation link for managed resources:
> 
> https://lucene.apache.org/solr/guide/6_6/managed-resources.html
> 
> That's the 6.6 version of the documentation.  If you're running 
> something newer, which seems likely since 6.6 is quite old now, you 
> might want to look into a later documentation version.
> 
> Thanks,
> Shawn
> 

Thanks Shawn, this looks like it will fit the bill nicely!
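
For the archives, the call I have in mind looks like this - a sketch with
Apache HttpClient, where "english" is the managed resource name from the
sample schema and the word list is hypothetical:

import org.apache.http.client.methods.HttpPut;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

try (CloseableHttpClient http = HttpClients.createDefault()) {
  HttpPut put = new HttpPut(
      "http://localhost:8983/solr/mycollection/schema/analysis/stopwords/english");
  put.setEntity(new StringEntity("[\"foo\", \"bar\"]",
      ContentType.APPLICATION_JSON));
  // managed changes only take effect after a core/collection reload
  http.execute(put).close();
}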

One more question that I don't see covered in the documentation - if I have 
multiple collections sharing the same config set, does updating the managed 
stop words for one collection apply the change to all? Is this change persisted 
in zookeeper?

Mike


Modify partial configsets using API

2019-05-08 Thread Mike Drob
Solr Experts,

Is there an existing API to modify just part of my configset, for example
synonyms or stopwords? I see that there is the schema API, but that is
pretty specific in scope.

Not sure if I should be looking at configset API to upload a zip with a
single file, or if there are more granular options available.

Thanks,
Mike


Re: zero-day exploit security issue

2017-10-16 Thread Mike Drob
Given the already public nature of the disclosure, does it make sense
to make the work being done public prior to release as well?

Normally security fixes are kept private while the vulnerabilities are
private, but that's not the case here...

On Mon, Oct 16, 2017 at 1:20 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Yes, there is but it is private i.e. only the Apache Lucene PMC
> members can see it. This is standard for all security issues in Apache
> land. The fixes for this issue has been applied to the release
> branches and the Solr 7.1.0 release candidate is already up for vote.
> Barring any unforeseen circumstances, a 7.1.0 release with the fixes
> should be expected this week.
>
> On Fri, Oct 13, 2017 at 8:14 PM, Xie, Sean  wrote:
> > Is there a tracking to address this issue for SOLR 6.6.x and 7.x?
> >
> > https://lucene.apache.org/solr/news.html#12-october-
> 2017-please-secure-your-apache-solr-servers-since-a-
> zero-day-exploit-has-been-reported-on-a-public-mailing-list
> >
> > Sean
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Two separate instances sharing the same zookeeper cluster

2017-09-14 Thread Mike Drob
When you specify the zk string for a solr instance, you typically include a
chroot in it. I think the default is /solr, but it doesn't have to be, so
you should be able to run with -z zk1:2181/solr-dev and /solr-prod

https://lucene.apache.org/solr/guide/6_6/setting-up-an-external-zookeeper-ensemble.html#SettingUpanExternalZooKeeperEnsemble-PointSolrattheinstance
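
On the client side the chroot rides along the same way. A sketch with a
newer SolrJ (the two-argument Builder is from the 7.x line, if I recall;
on 6.x you'd append the chroot to the zkHost string as above):

import java.util.Collections;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

// same ZooKeeper ensemble, two independent cluster states
CloudSolrClient staging = new CloudSolrClient.Builder(
    Collections.singletonList("zk1:2181"), Optional.of("/solr-dev")).build();
CloudSolrClient prod = new CloudSolrClient.Builder(
    Collections.singletonList("zk1:2181"), Optional.of("/solr-prod")).build();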

On Thu, Sep 14, 2017 at 3:01 PM, James Keeney  wrote:

> I have a staging and a production solr cluster. I'd like to have them use
> the same zookeeper cluster. It seems like it is possible if I can set a
> different directory for the second cluster. I've looked through the
> documentation though and I can't quite figure out where to set that up. As
> a result my staging cluster nodes keep trying to add themselves tot he
> production cluster.
>
> If someone could point me in the right direction?
>
> Jim K.
> --
> Jim Keeney
> President, FitterWeb
> E: j...@fitterweb.com
> M: 703-568-5887
>
> *FitterWeb Consulting*
> *Are you lean and agile enough? *
>


Re: IndexReaders cannot exceed 2 Billion

2017-08-08 Thread Mike Drob
> I have no idea whether you can successfully recover anything from that
> index now that it has broken the hard limit.

Theoretically, I think it's possible with some very surgical edits.
However, I've tried to do this in the past and abandoned it. The code to
split the index needs to be able to open it first, so we reasoned that we'd
have no way to demonstrate correctness and at that point restoring from a
backup was the best option.

Maybe somebody smarter or more determined has a better experience.
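
For the record, the per-segment counts are still readable even when the
composite reader refuses to open, which at least tells you how far over the
limit you are. A sketch, assuming a local copy of the index:

import java.nio.file.Paths;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.FSDirectory;

SegmentInfos infos = SegmentInfos.readLatestCommit(
    FSDirectory.open(Paths.get("/path/to/index")));
long maxDoc = 0, deleted = 0;
for (SegmentCommitInfo sci : infos) {
  maxDoc += sci.info.maxDoc();   // the 2147483519 check is against the sum of these
  deleted += sci.getDelCount();
}
System.out.println("maxDoc=" + maxDoc + ", live=" + (maxDoc - deleted));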

Mike

On Tue, Aug 8, 2017 at 10:21 AM, Shawn Heisey  wrote:

> On 8/7/2017 9:41 AM, Wael Kader wrote:
> > I faced an issue that is making me go crazy.
> > I am running SOLR saving data on HDFS and I have a single node setup with
> > an index that has been running fine until today.
> > I know that 2 billion documents is too much on a single node but it has
> > been running fine for my requirements and it was pretty fast.
> >
> > I restarted SOLR today and I am getting an error stating "Too many
> > documents, composite IndexReaders cannot exceed 2147483519.
> > The last backup I have is 2 weeks back and I really need the index to
> start
> > to get the data from the index.
>
> You have run into what I think might be the only *hard* limit in the
> entire Lucene ecosystem.  Other limits can usually be broken with
> careful programming, but that one is set in stone.
>
> A Lucene index uses a 32-bit Java integer to track the internal document
> ID.  In Java, numeric variables are signed.  For that reason, an integer
> cannot exceed (2^31)-1.  That number is 2147483647.  It appears that
> Lucene cuts that off at a value that's smaller by 128.  Not sure why
> that is, but it's probably to prevent problems when a small offset is
> added to the value.
>
> SolrCloud is perfectly capable of running indexes with far more than two
> billion documents, but as Yago mentioned, the collection must be sharded
> for that to happen.
>
> I have no idea whether you can successfully recover anything from that
> index now that it has broken the hard limit.
>
> Thanks,
> Shawn
>
>


Re: Solr Cloud 6.x - rollback best practice

2017-07-12 Thread Mike Drob
The two-collection approach with aliasing is a good one.

You can also use the backup and restore APIs -
https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html
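
From SolrJ that looks roughly like this - a sketch, assuming SolrJ 6.1+, a
SolrClient named client, and a backup location every node can see (the
collection and alias names are made up):

import org.apache.solr.client.solrj.request.CollectionAdminRequest;

// snapshot before the full reindex starts
CollectionAdminRequest.backupCollection("products", "before-full-reindex")
    .setLocation("/mnt/solr-backups")
    .process(client);

// if step 2 goes wrong, restore into a fresh collection...
CollectionAdminRequest.restoreCollection("products_restored", "before-full-reindex")
    .setLocation("/mnt/solr-backups")
    .process(client);

// ...and point the alias your application queries at it
CollectionAdminRequest.createAlias("products_live", "products_restored")
    .process(client);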

Mike

On Wed, Jul 12, 2017 at 10:57 AM, Vincenzo D'Amore 
wrote:

> Hi,
>
> I'm moving to Solr Cloud 6.x and I see rollback cannot be supported when it
> is in Cloud mode.
>
> In my scenario, there are basically two tasks (full indexing, partial
> indexing).
>
> Full indexing
> =
>
> This is the most important case, where I really need the possibility to
> rollback.
>
> The full reindex is basically done in 3 steps:
>
> 1. delete *:* all collection's documents
> 2. add all existing documents
> 3. commit
>
> If during step 2 something goes wrong (usually some problem with the
> source of data) I have to roll back.
>
> Partial reindexing
> =
>
> Unlike the former, this case is executed in only 2 steps (no delete)
> and the number of documents indexed usually is small (or very small).
>
> Even in this case, if step 2 goes wrong I have to roll back.
>
> Do you know if there is a common pattern, a best practice, something
> useful to handle a rollback if something goes wrong in these cases?
>
> My simplistic idea is to have two collections (active/passive), and switch
> from one to another only when all the steps are completed successfully.
>
> But, as you can understand, having two collections works well with full
> indexing, but how do I handle a partial reindexing if something goes wrong?
>
> So, I'll be grateful to whoever would spend his/her time to give me a
> suggestion.
>
> Thanks in advance and best regards,
> Vincenzo
>
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>


Re: (how) do folks use the Cloud Graph (Radial) in the Solr Admin UI?

2017-06-16 Thread Mike Drob
+solr-user

Might get a different audience on this list.

-- Forwarded message --
From: Christine Poerschke (BLOOMBERG/ LONDON) 
Date: Fri, Jun 16, 2017 at 11:43 AM
Subject: (how) do folks use the Cloud Graph (Radial) in the Solr Admin UI?
To: d...@lucene.apache.org


Any thoughts on potentially removing the radial cloud graph?

https://issues.apache.org/jira/browse/SOLR-5405 is the background for the
question and further input and views would be very welcome.

Thanks,
Christine
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


Re: Can solrcloud be running on a read-only filesystem?

2017-06-02 Thread Mike Drob
To throw out one possibility, a read-only file system has no (low?)
possibility of corruption. If you have a static index then you shouldn't
need to be doing any recovery. You would still need to run ZK with a RW
filesystem, but maybe Solr could work?

On Fri, Jun 2, 2017 at 10:15 AM, Erick Erickson 
wrote:

> As Susheel says, this is iffy, very iffy. You can disable tlogs
> entirely through solrconfig.xml, you can _probably_
> disable all of the Solr logging.
>
> You'd also have to _not_ run in SolrCloud. You say
> "some of the nodes eventually are stuck in the recovering phase"
> SolrCloud tries very hard to keep all of the replicas in sync.
> To do this it _must_ be able to copy from the leader to the follower.
> If it ever has to sync with the leader, it'll be stuck in recovery
> as you can see.
>
> You could spend a lot of time trying to make this work, but
> you haven't stated _why_ you want to. Perhaps there are
> other ways to get the functionality you want.
>
> Best,
> Erick
>
> On Fri, Jun 2, 2017 at 5:05 AM, Susheel Kumar 
> wrote:
> > I doubt it can run in readonly file system.  Even though there is no
> > ingestion etc.  Solr still needs to write to logs/tlogs for synching /
> > recovering etc
> >
> > Thnx
> >
> > On Fri, Jun 2, 2017 at 6:56 AM, Wudong Liu  wrote:
> >
> >> Hi All:
> >>
> >> We have a normal build/stage -> prod settings for our production
> pipeline.
> >> And we would build solr index in the build environment and then the
> index
> >> is copied to the prod environment.
> >>
> >> The solrcloud in prod seems working fine when the file system backing
> it is
> >> writable. However, we see many errors when the file system is readonly.
> >> Many exceptions are thrown regarding the tlog file cannot be open for
> write
> >> when the solr nodes are restarted with the new data; some of the nodes
> >> eventually are stuck in the recovering phase and never able to go back
> >> online in the cloud.
> >>
> >> Just wondering if anyone has any experience with SolrCloud running on a
> >> readonly file system? Is it possible at all?
> >>
> >> Regards,
> >> Wudong
> >>
>


Re: Solr Web Crawler - Robots.txt

2017-06-01 Thread Mike Drob
Isn't this exactly what Apache Nutch was built for?

On Thu, Jun 1, 2017 at 6:56 PM, David Choi  wrote:

> In any case after digging further I have found where it checks for
> robots.txt. Thanks!
>
> On Thu, Jun 1, 2017 at 5:34 PM Walter Underwood 
> wrote:
>
> > Which was exactly what I suggested.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> > > On Jun 1, 2017, at 3:31 PM, David Choi  wrote:
> > >
> > > In the meantime I have found that a better solution at the moment is
> > > to test on a site that allows users to crawl their site.
> > >
> > > On Thu, Jun 1, 2017 at 5:26 PM David Choi 
> > wrote:
> > >
> > >> I think you misunderstand; the argument was about stealing content.
> > >> Sorry, but I think you need to read what people write before making
> > >> bold statements.
> > >>
> > >> On Thu, Jun 1, 2017 at 5:20 PM Walter Underwood <
> wun...@wunderwood.org>
> > >> wrote:
> > >>
> > >>> Let’s not get snarky right away, especially when you are wrong.
> > >>>
> > >>> Corporations do not generally ignore robots.txt. I worked on a
> > commercial
> > >>> web spider for ten years. Occasionally, our customers did need to
> > bypass
> > >>> portions of robots.txt. That was usually because of a
> > poorly-maintained web
> > >>> server, or because our spider could safely crawl some content that
> > would
> > >>> cause problems for other crawlers.
> > >>>
> > >>> If you want to learn crawling, don’t start by breaking the
> conventions
> > of
> > >>> good web citizenship. Instead, start with sitemap.xml and crawl the
> > >>> preferred portions of a site.
> > >>>
> > >>> https://www.sitemaps.org/index.html <
> > https://www.sitemaps.org/index.html>
> > >>>
> > >>> If the site blocks you, find a different site to learn on.
> > >>>
> > >>> I like the looks of “Scrapy”, written in Python. I haven’t used it
> for
> > >>> anything big, but I’d start with that for learning.
> > >>>
> > >>> https://scrapy.org/ 
> > >>>
> > >>> If you want to learn on a site with a lot of content, try ours,
> > chegg.com
> > >>> But if your crawler gets out of hand, crawling too fast, we’ll block
> > it.
> > >>> Any other site will do the same.
> > >>>
> > >>> I would not base the crawler directly on Solr. A crawler needs a
> > >>> dedicated database to record the URLs visited, errors, duplicates,
> > etc. The
> > >>> output of the crawl goes to Solr. That is how we did it with
> Ultraseek
> > >>> (before Solr existed).
> > >>>
> > >>> wunder
> > >>> Walter Underwood
> > >>> wun...@wunderwood.org
> > >>> http://observer.wunderwood.org/  (my blog)
> > >>>
> > >>>
> >  On Jun 1, 2017, at 3:01 PM, David Choi 
> > wrote:
> > 
>  Oh well I guess it's ok if a corporation does it but not someone wanting
>  to learn more about the field. I actually have written a crawler before
>  as well as, you know, the Inverted Index of how solr works but I just
>  thought its architecture was better suited for scaling.
> > 
> >  On Thu, Jun 1, 2017 at 4:47 PM Dave 
> > >>> wrote:
> > 
> > > And I mean that in the context of stealing content from sites that
> > > explicitly declare they don't want to be crawled. Robots.txt is to
> be
> > > followed.
> > >
> > >> On Jun 1, 2017, at 5:31 PM, David Choi 
> > >>> wrote:
> > >>
> > >> Hello,
> > >>
> > >> I was wondering if anyone could guide me on how to crawl the web
> and
> > >> ignore the robots.txt since I can not index some big sites. Or if
> > >>> someone
> > >> could point how to get around it. I read somewhere about a
> > >> protocol.plugin.check.robots
> > >> but that was for nutch.
> > >>
> > >> The way I index is
> > >> bin/post -c gettingstarted https://en.wikipedia.org/
> > >>
> > >> but I can't index the site I'm guessing because of the robots.txt.
> > >> I can index with
> > >> bin/post -c gettingstarted http://lucene.apache.org/solr
> > >>
> > >> which I am guessing allows it. I was also wondering how to find
> the
> > >>> name
> > > of
> > >> the crawler bin/post uses.
> > >
> > >>>
> > >>>
> >
> >
>


Re: Performance warning: Overlapping onDeskSearchers=2 solr

2017-05-17 Thread Mike Drob
You're committing too frequently, so you have new searchers getting queued
up before the previous ones have been processed.

You have several options on how to deal with this. Can increase commit
interval, add hardware, or reduce query warming.

I don't know if uncommenting that section will help because I don't know
what your current settings are. Or if you are using manual commits.
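
If it does turn out to be explicit commits from the clients, one common fix
is commitWithin, which lets Solr coalesce them - a SolrJ sketch, assuming a
60 second freshness requirement and a SolrClient named client:

import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "42");

UpdateRequest req = new UpdateRequest();
req.add(doc);
req.setCommitWithin(60000); // commit happens within 60s, batched with other updates
req.process(client);        // note: no explicit commit() afterwards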

Mike

On Wed, May 17, 2017, 4:58 AM Srinivas Kashyap 
wrote:

> Hi All,
>
> We are using Solr 5.2.1 version and are currently experiencing below
> Warning in Solr Logging Console:
>
> Performance warning: Overlapping onDeskSearchers=2
>
> Also we encounter,
>
> org.apache.solr.common.SolrException: Error opening new searcher. exceeded
> limit of maxWarmingSearchers=2, try again later.
>
>
> The reason being, we are doing mass updates in our application and solr is
> experiencing higher loads at times. Data is being indexed using DIH (sql
> queries).
>
> In solrconfig.xml below is the code.
>
> 
>
> Should we be uncommenting the above lines and try to avoid this error?
> Please help me.
>
> Thanks and Regards,
> Srinivas Kashyap
>
> 
>
>


Re: SOLR as nosql database store

2017-05-10 Thread Mike Drob
> The searching install will be able to rebuild itself from the data
> storage install when that is required.

Is this a use case for CDCR?

Mike

On Tue, May 9, 2017 at 6:39 AM, Shawn Heisey  wrote:

> On 5/9/2017 12:58 AM, Bharath Kumar wrote:
> > Thanks Hrishikesh and Dave. We use SOLR cloud with 2 extra replicas,
> will that not serve as backup when something goes wrong? Also we use latest
> solr 6 and from the documentation of solr, the indexing performance has
> been good. The reason is that we are using MySQL as the primary data store
> and the performance might not be optimal if we write data at a very rapid
> rate. Already we index almost half the fields that are in MySQL in solr.
>
> A replica is protection against data loss in the event of hardware
> failure, but there are classes of problems that it cannot protect against.
>
> Although Solr (Lucene) does try *really* hard to never lose data that it
> hasn't been asked to delete, it is not designed to be a database.  It's
> a search engine.  Solr doesn't offer the same kinds of guarantees about
> the data it contains that software like MySQL does.
>
> I personally don't recommend trying to use Solr as a primary data store,
> but if that's what you really want to do, then I would suggest that you
> have two complete Solr installs, with multiple replicas on both.  One of
> them will be used for searching and have a configuration you're already
> familiar with, the other will be purely for data storage -- only certain
> fields like the uniqueKey will be indexed, but every other field will be
> stored only.
>
> Running with two separate Solr installs will allow you to optimize one
> for searching and the other for data storage.  The searching install
> will be able to rebuild itself from the data storage install when that
> is required.  If better performance is needed for the rebuild, you have
> the option of writing a multi-threaded or multi-process program that
> reads from one and writes to the other.
>
> Thanks,
> Shawn
>
>


Re: Both main and replica are trying to access solr_gc.log.0.current file

2017-04-29 Thread Mike Drob
It might depend somewhat on how you are starting Solr (I am less familiar
with Windows) but you will need to give each instance a separate
log4j.properties file and configure the log location in there.

Also check out the Solr Ref Guide section on Configuring Logging,
subsection Permanent Logging Settings.

https://cwiki.apache.org/confluence/display/solr/Configuring+Logging

Mike

On Sat, Apr 29, 2017, 12:24 PM Zheng Lin Edwin Yeo <edwinye...@gmail.com>
wrote:

> Yes, both Solr instances are running in the same hardware.
>
> I believe they are pointing to the same log directories/config too.
>
> How do we point them to different log directories/config?
>
> Regards,
> Edwin
>
>
> On 30 April 2017 at 00:36, Mike Drob <md...@apache.org> wrote:
>
> > Are you running both Solr instances on the same hardware and pointing
> > them at the same log directories/config?
> >
> > On Sat, Apr 29, 2017, 2:56 AM Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I'm using Solr 6.4.2 on SolrCloud, and I'm running 2 replica of Solr.
> > >
> > > When I start the replica, I will encounter this error message. It is
> > > probably due to the Solr log, as both the main and the replica are
> trying
> > > to access the same solr_gc.log.0.current file.
> > >
> > > Is there anyway to prevent this?
> > >
> > > Besides this error message, the rest of the Solr for both main and
> > replica
> > > are running normally.
> > >
> > > Exception in thread "main" java.nio.file.FileSystemException:
> > > C:\edwin\solr\server\logs\solr_gc.log.0.current ->
> > > C:\edwin\solr\server\logs\archived\solr_gc.log.0.current: The process
> > >  cannot access the file because it is being used by another process.
> > >
> > > at
> > > sun.nio.fs.WindowsException.translateToIOException(WindowsException.j
> > > ava:86)
> > > at
> > > sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.jav
> > > a:97)
> > > at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
> > > at
> > > sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.j
> > > ava:287)
> > > at java.nio.file.Files.move(Files.java:1395)
> > > at
> > > org.apache.solr.util.SolrCLI$UtilsTool.archiveGcLogs(SolrCLI.java:357
> > > 9)
> > > at
> > > org.apache.solr.util.SolrCLI$UtilsTool.runTool(SolrCLI.java:3548)
> > > at org.apache.solr.util.SolrCLI.main(SolrCLI.java:250)
> > > "Failed archiving old GC logs"
> > > Exception in thread "main" java.nio.file.FileSystemException:
> > > C:\edwin\solr\server\logs\solr-8983-console.log ->
> > > C:\edwin\solr\server\logs\archived\solr-8983-console.log: The process
> > >  cannot access the file because it is being used by another process.
> > >
> > > at
> > > sun.nio.fs.WindowsException.translateToIOException(WindowsException.j
> > > ava:86)
> > > at
> > > sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.jav
> > > a:97)
> > > at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
> > > at
> > > sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.j
> > > ava:287)
> > > at java.nio.file.Files.move(Files.java:1395)
> > > at
> > > org.apache.solr.util.SolrCLI$UtilsTool.archiveConsoleLogs(SolrCLI.jav
> > > a:3608)
> > > at
> > > org.apache.solr.util.SolrCLI$UtilsTool.runTool(SolrCLI.java:3551)
> > > at org.apache.solr.util.SolrCLI.main(SolrCLI.java:250)
> > > "Failed archiving old console logs"
> > > Exception in thread "main" java.nio.file.FileSystemException:
> > > C:\edwin\solr\server\logs\solr.log -> C:\edwin\solr\server\logs\
> > solr.log.1:
> > > The process cannot access the file because i
> > > t is being used by another process.
> > >
> > > at
> > > sun.nio.fs.WindowsException.translateToIOException(WindowsException.j
> > > ava:86)
> > > at
> > > sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.jav
> > > a:97)
> > > at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
> > > at
> > > sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.j
> > > ava:287)
> > > at java.nio.file.Files.move(Files.java:1395)
> > > at
> > > org.apache.solr.util.SolrCLI$UtilsTool.rotateSolrLogs(SolrCLI.java:36
> > > 51)
> > > at
> > > org.apache.solr.util.SolrCLI$UtilsTool.runTool(SolrCLI.java:3545)
> > > at org.apache.solr.util.SolrCLI.main(SolrCLI.java:250)
> > > "Failed rotating old Solr logs"
> > > Waiting up to 30 to see Solr running on port 8984
> > >
> > >
> > > Regards,
> > > Edwin
> > >
> >
>


Re: Both main and replica are trying to access solr_gc.log.0.current file

2017-04-29 Thread Mike Drob
Are you running both Solr instances in the same hardware and pointing them
at the same log directories/config?

On Sat, Apr 29, 2017, 2:56 AM Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> I'm using Solr 6.4.2 on SolrCloud, and I'm running 2 replica of Solr.
>
> When I start the replica, I will encounter this error message. It is
> probably due to the Solr log, as both the main and the replica are trying
> to access the same solr_gc.log.0.current file.
>
> Is there anyway to prevent this?
>
> Besides this error message, the rest of the Solr for both main and replica
> are running normally.
>
> Exception in thread "main" java.nio.file.FileSystemException:
> C:\edwin\solr\server\logs\solr_gc.log.0.current ->
> C:\edwin\solr\server\logs\archived\solr_gc.log.0.current: The process
>  cannot access the file because it is being used by another process.
>
> at
> sun.nio.fs.WindowsException.translateToIOException(WindowsException.j
> ava:86)
> at
> sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.jav
> a:97)
> at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
> at
> sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.j
> ava:287)
> at java.nio.file.Files.move(Files.java:1395)
> at
> org.apache.solr.util.SolrCLI$UtilsTool.archiveGcLogs(SolrCLI.java:357
> 9)
> at
> org.apache.solr.util.SolrCLI$UtilsTool.runTool(SolrCLI.java:3548)
> at org.apache.solr.util.SolrCLI.main(SolrCLI.java:250)
> "Failed archiving old GC logs"
> Exception in thread "main" java.nio.file.FileSystemException:
> C:\edwin\solr\server\logs\solr-8983-console.log ->
> C:\edwin\solr\server\logs\archived\solr-8983-console.log: The process
>  cannot access the file because it is being used by another process.
>
> at
> sun.nio.fs.WindowsException.translateToIOException(WindowsException.j
> ava:86)
> at
> sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.jav
> a:97)
> at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
> at
> sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.j
> ava:287)
> at java.nio.file.Files.move(Files.java:1395)
> at
> org.apache.solr.util.SolrCLI$UtilsTool.archiveConsoleLogs(SolrCLI.jav
> a:3608)
> at
> org.apache.solr.util.SolrCLI$UtilsTool.runTool(SolrCLI.java:3551)
> at org.apache.solr.util.SolrCLI.main(SolrCLI.java:250)
> "Failed archiving old console logs"
> Exception in thread "main" java.nio.file.FileSystemException:
> C:\edwin\solr\server\logs\solr.log -> C:\edwin\solr\server\logs\solr.log.1:
> The process cannot access the file because i
> t is being used by another process.
>
> at
> sun.nio.fs.WindowsException.translateToIOException(WindowsException.j
> ava:86)
> at
> sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.jav
> a:97)
> at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
> at
> sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.j
> ava:287)
> at java.nio.file.Files.move(Files.java:1395)
> at
> org.apache.solr.util.SolrCLI$UtilsTool.rotateSolrLogs(SolrCLI.java:36
> 51)
> at
> org.apache.solr.util.SolrCLI$UtilsTool.runTool(SolrCLI.java:3545)
> at org.apache.solr.util.SolrCLI.main(SolrCLI.java:250)
> "Failed rotating old Solr logs"
> Waiting up to 30 to see Solr running on port 8984
>
>
> Regards,
> Edwin
>


Re: Too many Soft commits and opening searchers realtime

2015-07-07 Thread Mike Drob
Are the clients that are posting updates requesting commits?
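
An explicit soft commit from a client looks like this in SolrJ and will show
up in the logs as softCommit=true even with autoSoftCommit disabled - a
sketch, assuming a SolrClient named client:

// args: waitFlush, waitSearcher, softCommit - the third flag is the giveaway
client.commit(true, true, true);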

On Tue, Jul 7, 2015 at 4:29 PM, Summer Shire shiresum...@gmail.com wrote:

 HI All,

 Can someone help me understand the following behavior.
 I have the following maxTimes on hard and soft commits

 yet I see a lot of Opening Searchers in the log
 org.apache.solr.search.SolrIndexSearcher- Opening Searcher@1656a258[main]
 realtime
 also I see a soft commit happening almost every 30 secs
 org.apache.solr.update.UpdateHandler - start
 commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
 <autoCommit>
 <maxTime>48</maxTime>
 <openSearcher>false</openSearcher>
 </autoCommit>

 <autoSoftCommit>
 <maxTime>18</maxTime>
 </autoSoftCommit>
 I tried disabling softCommit by setting maxTime to -1.
 On startup solrCore recognized it and logged Soft AutoCommit: disabled
 but I could still see softCommit=true
 org.apache.solr.update.UpdateHandler - start
 commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
 <autoSoftCommit>
 <maxTime>-1</maxTime>
 </autoSoftCommit>

 Thanks,
 Summer


Pretty Print segments_N

2015-06-12 Thread Mike Drob
I'm doing some debugging work on a solr core, and would find it useful to
be able to pretty print the contents of the segments_N file in the index.
Is there already good functionality for this, or will I need to write up my
own utility using SegmentInfos?
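
Roughly what I have in mind, in case someone can point me at an existing
tool that already does it (a sketch; on older releases sci.info.maxDoc()
was getDocCount(), if I remember right):

import java.nio.file.Paths;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.FSDirectory;

SegmentInfos infos = SegmentInfos.readLatestCommit(
    FSDirectory.open(Paths.get("/var/solr/core1/data/index")));
System.out.println("generation=" + infos.getGeneration()
    + ", segments=" + infos.size());
for (SegmentCommitInfo sci : infos) {
  System.out.println(sci.info.name
      + " maxDoc=" + sci.info.maxDoc()
      + " delCount=" + sci.getDelCount()
      + " codec=" + sci.info.getCodec().getName());
}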

Thanks,
Mike


Re: indexing java byte code in classes / jars

2015-05-08 Thread Mike Drob
What do the various Java IDEs use for indexing classes for
field/type/variable/method usage search? I imagine it's got to be bytecode.
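Presumably they parse the class files directly rather than loading them. If
you roll your own, something like ASM makes it easy to turn bytecode into
indexable tokens - a sketch, with made-up Solr field names:

import java.nio.file.Files;
import java.nio.file.Paths;
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

ClassReader cr = new ClassReader(Files.readAllBytes(Paths.get("Foo.class")));
cr.accept(new ClassVisitor(Opcodes.ASM5) {
  @Override
  public void visit(int version, int access, String name, String sig,
                    String superName, String[] interfaces) {
    System.out.println("class_s:" + name); // feed these into Solr fields
  }
  @Override
  public MethodVisitor visitMethod(int access, String name, String desc,
                                   String sig, String[] exceptions) {
    System.out.println("method_ss:" + name + desc);
    return null; // skip method bodies
  }
}, ClassReader.SKIP_CODE);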

On Fri, May 8, 2015 at 2:40 PM, Tomasz Borek tomasz.bo...@gmail.com wrote:

 Out of curiosity: why bytecode?

 regards,
 LAFK

 2015-05-08 21:31 GMT+02:00 Mark javam...@gmail.com:

  I'm looking to use Solr search over the byte code in Classes and Jars.
 
  Does anyone know or have experience of Analyzers, Tokenizers, and Token
  Filters for such a task?
 
  Regards
 
  Mark
 



Re: ApacheCon 2015 at Austin, TX

2015-04-12 Thread Mike Drob
ApacheCon is starting tomorrow, so I'm seeing if pulling up this thread
yields any new replies this time. I'm hanging out in Austin, looking
forward to some good conversations and sessions!

On Wed, Feb 18, 2015 at 9:14 PM, CP Mishra mishr...@gmail.com wrote:

 Dmitry, that would be great.

 CP

 On Thu, Feb 12, 2015 at 5:35 AM, Dmitry Kan solrexp...@gmail.com wrote:

  Hi,
 
  Looks like I'll be there. So if you want to discuss luke / lucene / solr,
  will be happy to de-virtualize.
 
  Dmitry
 
  On Mon, Jan 12, 2015 at 6:32 PM, CP Mishra mishr...@gmail.com wrote:
 
   Hi,
  
   I am planning to attend ApacheCon 2015 at Austin, TX (Apr 13-16th) and
   wondering if there will be lucene/solr sessions in it.
  
   Anyone else planning to attend?
  
   Thanks,
   CP
  
 
 
 
  --
  Dmitry Kan
  Luke Toolbox: http://github.com/DmitryKey/luke
  Blog: http://dmitrykan.blogspot.com
  Twitter: http://twitter.com/dmitrykan
  SemanticAnalyzer: www.semanticanalyzer.info
 



Re: Which one is it cs or cz for Czech language?

2015-03-17 Thread Mike Drob
Probably a historical artifact.

cz is the country code for the Czech Republic, cs is the language code for
Czech. Once, cs was also the country code for Czechosolvakia, leading some
folks to accidentally conflate the two.

On Tue, Mar 17, 2015 at 12:35 PM, Eduard Moraru enygma2...@gmail.com
wrote:

 Hi,

 First of all, a bit of a disclaimer: I am not a Czech language speaker, at
 all.

 We are using Solr's dynamic fields in our project (XWiki), and we have
 recently noticed a problem [1] with the Czech language.

 Basically, our mapping says something like this:

 <dynamicField name="*_cz" type="text_cz" indexed="true" stored="true"
 multiValued="true" />

 ...but at runtime, we ask for the language code cs (which is the ISO
 language code for Czech [2]) and it obviously fails (due to the mapping).

 Now, we can easily fix this on our end by fixing the mapping to
 name=*_cs,
 but what we are really wondering now is why does Lucene/Solr use cz
 (country code) instead of cs (language code) in both its text_cz field
 and its stopwords_cz.txt file?

 Is that a mistake on the Solr/Lucene side? Is it some kind of convention?
 Is it going to be fixed?

 Thanks,
 Eduard

 --
 [1] http://jira.xwiki.org/browse/XWIKI-11897
 [2] http://en.wikipedia.org/wiki/Czech_language



Re: Checkout the source Code to the Release Version of Solr?

2015-02-17 Thread Mike Drob
The SVN source is under tags, not branches.

http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_10_3/

On Tue, Feb 17, 2015 at 4:39 PM, O. Olson olson_...@yahoo.it wrote:

 Thank you Hrishikesh. Funny how GitHub is not mentioned  on
 http://lucene.apache.org/solr/resources.html

 I think common-build.xml is what I was looking for. Thank you



 Hrishikesh Gadre-3 wrote
  Also the version number is encoded (at least) in the build file
 
 
 https://github.com/apache/lucene-solr/blob/817303840fce547a1557e330e93e5a8ac0618f34/lucene/common-build.xml#L32
 
  Hope this helps.
 
  Thanks
  Hrishikesh


 Hrishikesh Gadre-3 wrote
  Hi,
 
  You can get the released code base here
 
  https://github.com/apache/lucene-solr/releases
 
  Thanks
  Hrishikesh








Re: How to exclude selected filter (facet) from search result?

2015-02-02 Thread Mike Drob
Umang,

I believe this mailing list strips images. You might have better luck
uploading your image to a 3rd party hosting site and providing a link.

Thanks,
Mike

On Mon, Feb 2, 2015 at 12:35 PM, Umang Agrawal umang.i...@gmail.com wrote:

 Hi

 Could you please suggest how to exclude selected filter from solr search
 result.

 For example, in the screenshot below I have selected the filter camera, but
 camera (1) is still returned in the search response. How can I request solr
 to remove the selected filter from the search result?

 Thanks in advance.


 [image: Inline image 1]


 --
 Thanx  Regards
 Umang Agrawal



Re: Connection Reset Errors with Solr 4.4

2015-01-23 Thread Mike Drob
I'm not sure what a reasonable workaround would be. Perhaps somebody else
can brainstorm and make a suggestion, sorry.

On Tue, Jan 20, 2015 at 12:56 PM, Nishanth S nishanth.2...@gmail.com
wrote:

 Thank you Mike. Sure enough, we are running into the same issue you
 mentioned. Is there a quick fix for this other than the patch? I do not see
 the tlogs getting replayed at all. It is doing a full index recovery from
 the leader and our index size is around 200G. Would lowering the autocommit
 settings help (where the replica would go for a tlog replay, as the tlogs I
 see are not huge)?

 Thanks,
 Nishanth

 On Tue, Jan 20, 2015 at 10:46 AM, Mike Drob md...@apache.org wrote:

  Are we sure this isn't SOLR-6931?
 
  On Tue, Jan 20, 2015 at 11:39 AM, Nishanth S nishanth.2...@gmail.com
  wrote:
 
   Hello All,
  
   We are running solr cloud 4.4 with 30 shards and 3 replicas with real time
   indexing on rhel 6.5. The indexing rate is 3K Tps now. We are running into
   an issue with replicas going into recovery mode due to connection reset
   errors. Soft commit time is 2 min and auto commit is set as 5 minutes. I
   have seen that replicas do a full index recovery which takes a long
   time (days). Below is the error trace that I see. I would really
   appreciate any help in this case.
  
   g.apache.solr.client.solrj.SolrServerException: IOException occured
 when
   talking to server at: http://xxx:8083/solr/log_pn_shard20_replica2
   at
  
  
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:435)
   at
  
  
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
   at
  
  
 
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
   at
  
  
 
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at
   java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at
  
  
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at
  
  
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
   Caused by: java.net.SocketException: Connection reset
   at java.net.SocketInputStream.read(SocketInputStream.java:196)
   at java.net.SocketInputStream.read(SocketInputStream.java:122)
   at
  
  
 
 org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
   at
  
  
 
 org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
   at
  
  
 
 org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
   at
  
  
 
 org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
   at
  
  
 
 org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
   at
  
  
 
 org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
   at
  
  
 
 org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
   at
  
  
 
 org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
   at
  
  
 
 org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
   at
  
  
 
 org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
   at
  
  
 
 org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
   at
  
  
 
 org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:717)
   at
  
  
 
 org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:522)
   at
  
  
 
 org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
   at
  
  
 
 org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
   at
  
  
 
 org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
   at
  
  
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
   ... 9 more
  
  
   Thanks,
   Nishanth
  
 



Re: Connection Reset Errors with Solr 4.4

2015-01-20 Thread Mike Drob
Are we sure this isn't SOLR-6931?

On Tue, Jan 20, 2015 at 11:39 AM, Nishanth S nishanth.2...@gmail.com
wrote:

 Hello All,

 We are running solr cloud 4.4 with 30 shards and 3 replicas with real time
 indexing on rhel 6.5. The indexing rate is 3K Tps now. We are running into an
 issue with replicas going into recovery mode due to connection reset
 errors. Soft commit time is 2 min and auto commit is set as 5 minutes. I have
 seen that replicas do a full index recovery which takes a long
 time (days). Below is the error trace that I see. I would really appreciate
 any help in this case.

 g.apache.solr.client.solrj.SolrServerException: IOException occured when
 talking to server at: http://xxx:8083/solr/log_pn_shard20_replica2
 at

 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:435)
 at

 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
 at

 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
 at

 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at

 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.net.SocketException: Connection reset
 at java.net.SocketInputStream.read(SocketInputStream.java:196)
 at java.net.SocketInputStream.read(SocketInputStream.java:122)
 at

 org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
 at

 org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
 at

 org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
 at

 org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
 at

 org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
 at

 org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
 at

 org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
 at

 org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
 at

 org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
 at

 org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
 at

 org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
 at

 org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:717)
 at

 org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:522)
 at

 org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
 at

 org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
 at

 org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
 at

 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
 ... 9 more


 Thanks,
 Nishanth