date:20190909

[
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926158#comment-16926158
]

Megan Carey edited comment on SOLR-13101 at 9/9/19 11:10 PM:
-

h2. SolrCloud + Blobstore

I've attached a PR containing all of the existing blobstore code to this Jira
([https://github.com/apache/lucene-solr/pull/864]). The following description
can be found in that PR description, and in my branch under
lucene-solr/solr/README-BLOB.md. I'm posting it here as well for extra
visibility.
h3. Overview

This repo introduces a new framework which allows SolrCloud to integrate with
an external (typically cloud-based) blobstore. Instead of maintaining a copy of
the index on each Solr host, replicating updates to peers, and using a
transaction log to maintain consistent ordered updates, Solr hosts will push
and pull cores to/from this external store.

TL;DR: For now, SolrCloud can be configured to use blobstore at a collection
level. Collections backed by blobstore use a new SHARED replica type. When a
Solr node makes an update request to a shared shard, it indexes locally and
then pushes the change through to a shared blobstore. Zookeeper manages index
versioning and provides a source of truth in the case of concurrent writes.
Solr nodes in a cluster will no longer use peer-to-peer replication, and
instead will pull updates directly from the shared blobstore.

Please note that this project is a work in progress, and is by no means
production-ready. This code is being published early get feedback, which we
will incorporate in future work.

In order to modularize these changes and maintain existing functionality, most
of the blobstore-related code is isolated to the
_solr/core/src/java/org/apache/solr/store/blob directory_. However, there some
key integration touchpoints in _HttpSolrCall#init_,
_DistributedZkUpdateProcessor_, and _CoreContainer#load_. These classes all
have special handling for blobstore-based shards.
h3. Pulling from Blobstore

Core pulls are, for the most part, asynchronous. When a replica is queried, it
enqueues a pull from blobstore but doesn’t wait for the pull to complete before
it executes the query, unless the replica is missing a copy of that core
altogether. If your operation requires that local cores are in-sync with
blobstore, use the method _BlobStoreUtils#syncLocalCoreWithSharedStore_.

A more in-depth walkthrough of the pull code:
* _BlobCoreSyncer_: manages threads that sync between local and blob store, so
that if a pull is in progress, we do not create duplicate work.
* Calls into _CorePullTracker_: creates _PullCoreInfo_ object containing data
about the core to be pulled and adds to a deduplicated list.
* This queue of pull objects is polled by the _CorePullerFeeder_, which uses
threads from its dedicated thread pool to execute CorePullTasks.
* _CorePullTask_: checks if a pull is already underway for this core; if not,
executes a pull from blob store. Resolves differences between blob’s version of
the core and local version, and stores the updated core

h3. Pushing to Blobstore

This happens synchronously. On every local commit, we push to blobstore and
only ack that the update was successful when it is committed both locally and
in the shared store.

A more in-depth walkthrough of the push code:
* _DistributedZkUpdateProcessor_: once a commit is complete for a _SHARED_
replica (_onFinish_), we _writeToShareStore_.
* This calls into _CoreUpdateTracker_, which creates a _PushPullData_ object
containing data about the collection, core, and most recently pulled version of
the core on this replica.
* _CorePusher_: resolves the differences between blob’s version of the core
and local version, and pushes the updated version to blob store

h3. Resolving Local and Blobstore

The _SharedStoreResolutionUtil_ handles resolving diffs between the Solr node’s
local copy of a core and the copy in blobstore. It does so by pulling the
metadata for the core from blobstore (_BlobCoreMetadata_), comparing against
the local metadata (_ServerSideMetadata_), and creating a list of segments to
push or pull.
h3. Version Management

Only the leader node can push updates to blobstore. Because a new leader can be
elected at any time, there is still a possibility for race conditions on writes
to blobstore. In order to maintain a consistent global view of the latest
version of a core, we keep version data in Zookeeper.

Zookeeper stores this version data as a random string called _metadataSuffix_.
When a SolrCloud node makes an update request, it first pushes the files to
blobstore and then makes a conditional update to the metadataSuffix variable.
If Zookeeper rejects the conditional update, the update request fails, and the
failure is propagated back to the client.

This communication with Zookeeper is coordinated in the

[jira] [Comment Edited] (SOLR-13101) Shared storage support in SolrCloud

[
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926158#comment-16926158
]

Megan Carey edited comment on SOLR-13101 at 9/9/19 10:38 PM:
-

h2. SolrCloud + Blobstore

I've attached a PR containing all of the existing blobstore code to this Jira
([here|[https://github.com/apache/lucene-solr/pull/864]]). The following
description can be found in that PR under lucene-solr/solr/README-BLOB.md. I'm
posting it here as well for extra visibility.
h3. Overview

Please note that this project is a work in progress, and is by no means
production-ready. This code is being published early get feedback, which we
will incorporate in future work.

h3. Pushing to Blobstore

This happens synchronously. On every local commit, we push to blobstore and
only ack that the update was successful when it is committed both locally and
in the shared store.

h3. Resolving Local and Blobstore

This communication with Zookeeper is coordinated in the
_SharedShardMetadataController_. The

[jira] [Comment Edited] (SOLR-13101) Shared storage support in SolrCloud

[
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926158#comment-16926158
]

Megan Carey edited comment on SOLR-13101 at 9/9/19 10:38 PM:
-

h2. SolrCloud + Blobstore

I've attached a PR containing all of the existing blobstore code to this Jira
([https://github.com/apache/lucene-solr/pull/864]). The following description
can be found in that PR under lucene-solr/solr/README-BLOB.md. I'm posting it
here as well for extra visibility.
h3. Overview

Please note that this project is a work in progress, and is by no means
production-ready. This code is being published early get feedback, which we
will incorporate in future work.

h3. Pushing to Blobstore

This happens synchronously. On every local commit, we push to blobstore and
only ack that the update was successful when it is committed both locally and
in the shared store.

h3. Resolving Local and Blobstore

This communication with Zookeeper is coordinated in the
_SharedShardMetadataController_. The

[jira] [Comment Edited] (SOLR-13101) Shared storage support in SolrCloud

[
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926158#comment-16926158
]

Megan Carey edited comment on SOLR-13101 at 9/9/19 10:38 PM:
-

h2. SolrCloud + Blobstore

Please note that this project is a work in progress, and is by no means
production-ready. This code is being published early get feedback, which we
will incorporate in future work.

h3. Pushing to Blobstore

This happens synchronously. On every local commit, we push to blobstore and
only ack that the update was successful when it is committed both locally and
in the shared store.

h3. Resolving Local and Blobstore

This communication with Zookeeper is coordinated in the
_SharedShardMetadataController_. The

[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

[
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926158#comment-16926158
]

Megan Carey commented on SOLR-13101:

h2. SolrCloud + Blobstore
h3. Overview

Please note that this project is a work in progress, and is by no means
production-ready. This code is being published early get feedback, which we
will incorporate in future work.

A more in-depth walkthrough of the pull code:
* _BlobCoreSyncer_: manages threads that sync between local and blob store, so
that if a pull is in progress, we do not create duplicate work.
* Calls into _CorePullTracker_: creates _PullCoreInfo_ object containing data
about the core to be pulled and adds to a deduplicated list.
* This queue of pull objects is polled by the _CorePullerFeeder_, which uses
threads from its dedicated thread pool to execute _CorePullTask_s.
* _CorePullTask_: checks if a pull is already underway for this core; if not,
executes a pull from blob store. Resolves differences between blob’s version of
the core and local version, and stores the updated core

h3. Pushing to Blobstore

This happens synchronously. On every local commit, we push to blobstore and
only ack that the update was successful when it is committed both locally and
in the shared store.

h3. Resolving Local and Blobstore

This communication with Zookeeper is coordinated in the
_SharedShardMetadataController_. The SharedShardMetadataController belongs to
the _Overseer_ (i.e. the leader replica).
h3. Try it yourself

If you want to try this out locally, you can start up SolrCloud with the given
blobstore code. The code will default to using the local blobstore client, with
"/tmp/BlobStoreLocal" as the blobstore directory (see

[jira] [Resolved] (SOLR-13750) [CVE-2019-12401] XML Bomb in Apache Solr versions prior to 5.0.0



 [ 
https://issues.apache.org/jira/browse/SOLR-13750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomás Fernández Löbbe resolved SOLR-13750.
--
Resolution: Information Provided

> [CVE-2019-12401] XML Bomb in Apache Solr versions prior to 5.0.0
> 
>
> Key: SOLR-13750
> URL: https://issues.apache.org/jira/browse/SOLR-13750
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 1.3, 1.4, 1.4.1, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.6.1, 
> 3.6.2, 4.0, 4.1, 4.2, 4.2.1, 4.3, 4.3.1, 4.4, 4.5, 4.5.1, 4.6, 4.6.1, 4.7, 
> 4.7.1, 4.7.2, 4.8, 4.8.1, 4.9, 4.9.1, 4.10, 4.10.1, 4.10.2, 4.10.3, 4.10.4
>Reporter: Tomás Fernández Löbbe
>Priority: Major
> Fix For: 5.0
>
>
> Severity: Medium
> Vendor: The Apache Software Foundation
> Versions Affected:
>  1.3.0 to 1.4.1
>  3.1.0 to 3.6.2
>  4.0.0 to 4.10.4
> Description:
>  Solr versions prior to 5.0.0 are vulnerable to an XML resource consumption 
> attack (a.k.a. Lol Bomb) via it’s update handler.
>  By leveraging XML DOCTYPE and ENTITY type elements, the attacker can create 
> a pattern that will expand when the server parses the XML causing OOMs.
> Mitigation:
>  * Upgrade to Apache Solr 5.0 or later.
>  * Ensure your network settings are configured so that only trusted traffic 
> is allowed to post documents to the running Solr instances.
> Credit:
>  Matei "Mal" Badanoiu



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-13750) [CVE-2019-12401] XML Bomb in Apache Solr versions prior to 5.0.0



 [ 
https://issues.apache.org/jira/browse/SOLR-13750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomás Fernández Löbbe updated SOLR-13750:
-
Security: Public  (was: Private (Security Issue))

> [CVE-2019-12401] XML Bomb in Apache Solr versions prior to 5.0.0
> 
>
> Key: SOLR-13750
> URL: https://issues.apache.org/jira/browse/SOLR-13750
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 1.3, 1.4, 1.4.1, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.6.1, 
> 3.6.2, 4.0, 4.1, 4.2, 4.2.1, 4.3, 4.3.1, 4.4, 4.5, 4.5.1, 4.6, 4.6.1, 4.7, 
> 4.7.1, 4.7.2, 4.8, 4.8.1, 4.9, 4.9.1, 4.10, 4.10.1, 4.10.2, 4.10.3, 4.10.4
>Reporter: Tomás Fernández Löbbe
>Priority: Major
> Fix For: 5.0
>
>
> Severity: Medium
> Vendor: The Apache Software Foundation
> Versions Affected:
>  1.3.0 to 1.4.1
>  3.1.0 to 3.6.2
>  4.0.0 to 4.10.4
> Description:
>  Solr versions prior to 5.0.0 are vulnerable to an XML resource consumption 
> attack (a.k.a. Lol Bomb) via it’s update handler.
>  By leveraging XML DOCTYPE and ENTITY type elements, the attacker can create 
> a pattern that will expand when the server parses the XML causing OOMs.
> Mitigation:
>  * Upgrade to Apache Solr 5.0 or later.
>  * Ensure your network settings are configured so that only trusted traffic 
> is allowed to post documents to the running Solr instances.
> Credit:
>  Matei "Mal" Badanoiu



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[SECURITY] CVE-2019-12401: XML Bomb in Apache Solr versions prior to 5.0

2019-09-09 Thread Tomas Fernandez Lobbe

Severity: Medium

Vendor: The Apache Software Foundation

Versions Affected:
1.3.0 to 1.4.1
3.1.0 to 3.6.2
4.0.0 to 4.10.4

Description: Solr versions prior to 5.0.0 are vulnerable to an XML resource
consumption attack (a.k.a. Lol Bomb) via it’s update handler. By leveraging
XML DOCTYPE and ENTITY type elements, the attacker can create a pattern
that will expand when the server parses the XML causing OOMs

Mitigation:
* Upgrade to Apache Solr 5.0 or later.
* Ensure your network settings are configured so that only trusted traffic
is allowed to post documents to the running Solr instances.

Credit: Matei "Mal" Badanoiu

References:
[1] https://issues.apache.org/jira/browse/SOLR-13750
[2] https://wiki.apache.org/solr/SolrSecurity

[JENKINS] Lucene-Solr-NightlyTests-8.x - Build # 207 - Still Failing

2019-09-09 Thread Apache Jenkins Server

Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-8.x/207/

No tests ran.

Build Log:
[...truncated 25 lines...]
ERROR: Failed to check out http://svn.apache.org/repos/asf/lucene/test-data
org.tmatesoft.svn.core.SVNException: svn: E175002: connection refused by the 
server
svn: E175002: OPTIONS request failed on '/repos/asf/lucene/test-data'
at 
org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:112)
at 
org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:96)
at 
org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:765)
at 
org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:352)
at 
org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:340)
at 
org.tmatesoft.svn.core.internal.io.dav.DAVConnection.performHttpRequest(DAVConnection.java:910)
at 
org.tmatesoft.svn.core.internal.io.dav.DAVConnection.exchangeCapabilities(DAVConnection.java:702)
at 
org.tmatesoft.svn.core.internal.io.dav.DAVConnection.open(DAVConnection.java:113)
at 
org.tmatesoft.svn.core.internal.io.dav.DAVRepository.openConnection(DAVRepository.java:1035)
at 
org.tmatesoft.svn.core.internal.io.dav.DAVRepository.getLatestRevision(DAVRepository.java:164)
at 
org.tmatesoft.svn.core.internal.wc2.ng.SvnNgRepositoryAccess.getRevisionNumber(SvnNgRepositoryAccess.java:119)
at 
org.tmatesoft.svn.core.internal.wc2.SvnRepositoryAccess.getLocations(SvnRepositoryAccess.java:178)
at 
org.tmatesoft.svn.core.internal.wc2.ng.SvnNgRepositoryAccess.createRepositoryFor(SvnNgRepositoryAccess.java:43)
at 
org.tmatesoft.svn.core.internal.wc2.ng.SvnNgAbstractUpdate.checkout(SvnNgAbstractUpdate.java:831)
at 
org.tmatesoft.svn.core.internal.wc2.ng.SvnNgCheckout.run(SvnNgCheckout.java:26)
at 
org.tmatesoft.svn.core.internal.wc2.ng.SvnNgCheckout.run(SvnNgCheckout.java:11)
at 
org.tmatesoft.svn.core.internal.wc2.ng.SvnNgOperationRunner.run(SvnNgOperationRunner.java:20)
at 
org.tmatesoft.svn.core.internal.wc2.SvnOperationRunner.run(SvnOperationRunner.java:21)
at 
org.tmatesoft.svn.core.wc2.SvnOperationFactory.run(SvnOperationFactory.java:1239)
at org.tmatesoft.svn.core.wc2.SvnOperation.run(SvnOperation.java:294)
at 
hudson.scm.subversion.CheckoutUpdater$SubversionUpdateTask.perform(CheckoutUpdater.java:133)
at 
hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:168)
at 
hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:176)
at 
hudson.scm.subversion.UpdateUpdater$TaskImpl.perform(UpdateUpdater.java:134)
at 
hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:168)
at 
hudson.scm.SubversionSCM$CheckOutTask.perform(SubversionSCM.java:1041)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:1017)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:990)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:3086)
at hudson.remoting.UserRequest.perform(UserRequest.java:212)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:369)
at 
hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at 
org.tmatesoft.svn.core.internal.util.SVNSocketConnection.run(SVNSocketConnection.java:57)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
... 4 more
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)

[GitHub] [lucene-solr] megancarey opened a new pull request #864: SOLR-13101 : Shared storage support in SolrCloud

megancarey opened a new pull request #864: SOLR-13101 : Shared storage support 
in SolrCloud
URL: https://github.com/apache/lucene-solr/pull/864
 
 
   
   
   
   # Description
   
   This PR is being opened to expose the code for integrating SolrCloud with a 
shared blobstore. For more details, see the attached JIRA or look at 
`lucene-solr/solr/README-BLOB.md`.
   
   # Solution
   
   N/A
   
   # Tests
   
   Find tests in the test directory 
(`solr/core/src/test/org/apache/solr/store/blob/`).
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I am authorized to contribute this code to the ASF and have removed 
any code I do not have a license to distribute.
   - [ ] I have developed this patch against the `master` branch.
   - [x] I have run `ant precommit` and the appropriate test suite.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[GitHub] [lucene-solr] jpountz commented on a change in pull request #857: LUCENE-8968: Improve performance of WITHIN and DISJOINT queries for Shape queries

jpountz commented on a change in pull request #857: LUCENE-8968: Improve 
performance of WITHIN and DISJOINT queries for Shape queries
URL: https://github.com/apache/lucene-solr/pull/857#discussion_r322441239
 
 

 ##
 File path: lucene/sandbox/src/java/org/apache/lucene/document/ShapeQuery.java
 ##
 @@ -373,49 +235,265 @@ protected Scorer getIntersectsScorer(ShapeQuery query, 
LeafReader reader, Weight
 // by computing the set of documents that do NOT match the query
 final FixedBitSet result = new FixedBitSet(reader.maxDoc());
 result.set(0, reader.maxDoc());
-int[] cost = new int[]{reader.maxDoc()};
-values.intersect(getInverseIntersectVisitor(query, result, cost));
+final long[] cost = new long[]{reader.maxDoc()};
+values.intersect(getInverseDenseVisitor(query, result, cost));
 final DocIdSetIterator iterator = new BitSetIterator(result, cost[0]);
 return new ConstantScoreScorer(weight, boost, scoreMode, iterator);
   }
-
-  values.intersect(visitor);
-  DocIdSetIterator iterator = docIdSetBuilder.build().iterator();
+  final DocIdSetBuilder docIdSetBuilder = new 
DocIdSetBuilder(reader.maxDoc(), values, query.getField());
+  values.intersect(getSparseVisitor(query, docIdSetBuilder));
+  final DocIdSetIterator iterator = docIdSetBuilder.build().iterator();
   return new ConstantScoreScorer(weight, boost, scoreMode, iterator);
 }
 
-/** returns a Scorer for all other (non INTERSECT) queries */
-protected Scorer getScorer(ShapeQuery query, Weight weight,
-   FixedBitSet intersect, FixedBitSet disjoint, 
final float boost, ScoreMode scoreMode) throws IOException {
-  values.intersect(visitor);
-  if (disjointVisitor != null) {
-values.intersect(disjointVisitor);
-  }
-  DocIdSetIterator iterator;
-  if (query.queryRelation == ShapeField.QueryRelation.DISJOINT) {
-disjoint.andNot(intersect);
-iterator = new BitSetIterator(disjoint, cost());
-  } else if (query.queryRelation == ShapeField.QueryRelation.WITHIN) {
-intersect.andNot(disjoint);
-iterator = new BitSetIterator(intersect, cost());
+/** Scorer used for WITHIN and DISJOINT **/
+private Scorer getDenseScorer(LeafReader reader, Weight weight, final 
float boost, ScoreMode scoreMode) throws IOException {
+  final FixedBitSet result = new FixedBitSet(reader.maxDoc());
+  final long[] cost;
+  if (values.getDocCount() == reader.maxDoc()) {
+// First we check if we have any hits so we are fast in the 
adversarial case where
+// the shape does not match any documents
+if (hasAnyHits(query, values) == false) {
+  // no hits so we can return
+  return new ConstantScoreScorer(weight, boost, scoreMode, 
DocIdSetIterator.empty());
+}
 
 Review comment:
   it'd be slightly better to handle this case in the scorer supplier to return 
a null scorer


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: IndexWriter Closed Exception on small concurrency

2019-09-09 Thread Aravind S (User Intent)

Hi,

I'm seeing on increasing parallel writes on embedded Solr running with HDFS
Lucene directory with doc size around 800Kb. I'm seeing exceptions of
IndexWriter closed with stack trace originating from
https://github.com/apache/lucene-solr/blob/a288710a64acdde6abc8ce96a0d3b3e18739ac32/solr/core/src/java/org/apache/solr/store/hdfs/HdfsDirectory.java#L118

Please let me know how do I solve this issue?

   - The concurrency of writes is around 10.
   - Merge policy set is TieredMergePolicy with sorting enabled on a
   numeric field with default merges per segment to be set as 10.

On Tue, Sep 10, 2019 at 1:34 AM Aravind S (User Intent) <
s.arav...@flipkart.com> wrote:

> Hi,
>
> I'm seeing on increasing parallel writes on embedded Solr running with
> HDFS Lucene directory with doc size around 800Kb. I'm seeing exceptions of
> IndexWriter closed with stack trace originating from
> https://github.com/apache/lucene-solr/blob/a288710a64acdde6abc8ce96a0d3b3e18739ac32/solr/core/src/java/org/apache/solr/store/hdfs/HdfsDirectory.java#L118
>
> Please let me know how do I solve this issue?
> The concurrency of writes is around 10.
>
> On Tue, Sep 10, 2019 at 1:34 AM Aravind S (User Intent) <
> s.arav...@flipkart.com> wrote:
>
>> Hi,
>>
>> I'm seeing on increasing parallel writes on embedded Solr running with
>> HDFS Lucene directory with doc size around 800Kb. I'm seeing exceptions of
>> IndexWriter closed with stack trace originating from
>> https://github.com/apache/lucene-solr/blob/a288710a64acdde6abc8ce96a0d3b3e18739ac32/solr/core/src/java/org/apache/solr/store/hdfs/HdfsDirectory.java#L118
>>
>> Please let me know how do I solve this issue?
>>
>> Regards,
>> Sravind S
>>
>>
>>
>>

-- 

*-*

*This email and any files transmitted with it are confidential and 
intended solely for the use of the individual or entity to whom they are 
addressed. If you have received this email in error, please notify the 
system manager. This message contains confidential information and is 
intended only for the individual named. If you are not the named addressee, 
you should not disseminate, distribute or copy this email. Please notify 
the sender immediately by email if you have received this email by mistake 
and delete this email from your system. If you are not the intended 
recipient, you are notified that disclosing, copying, distributing or 
taking any action in reliance on the contents of this information is 
strictly prohibited.*

*Any views or opinions presented in this 
email are solely those of the author and do not necessarily represent those 
of the organization. Any information on shares, debentures or similar 
instruments, recommended product pricing, valuations and the like are for 
information purposes only. It is not meant to be an instruction or 
recommendation, as the case may be, to buy or to sell securities, products, 
services nor an offer to buy or sell securities, products or services 
unless specifically stated to be so on behalf of the Flipkart group. 
Employees of the Flipkart group of companies are expressly required not to 
make defamatory statements and not to infringe or authorise any 
infringement of copyright or any other legal right by email communications. 
Any such communication is contrary to organizational policy and outside the 
scope of the employment of the individual concerned. The organization will 
not accept any liability in respect of such communication, and the employee 
responsible will be personally liable for any damages or other liability 
arising.*

*Our organization accepts no liability for the 
content of this email, or for the consequences of any actions taken on the 
basis of the information *provided,* unless that information is 
subsequently confirmed in writing. If you are not the intended recipient, 
you are notified that disclosing, copying, distributing or taking any 
action in reliance on the contents of this information is strictly 
prohibited.*

_-_

[jira] [Commented] (LUCENE-8961) CheckIndex: pre-exorcise document id salvage

2019-09-09 Thread Adrien Grand (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926078#comment-16926078
 ] 

Adrien Grand commented on LUCENE-8961:
--

Agreed it is awkward. When I said "on top of CheckIndex", I was rather thinking 
of running CheckIndex programmatically and then looking at the return value to 
understand what segments might need salvaging. A separate stand-alone tool 
sounds good to me too.

> CheckIndex: pre-exorcise document id salvage
> 
>
> Key: LUCENE-8961
> URL: https://issues.apache.org/jira/browse/LUCENE-8961
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Christine Poerschke
>Priority: Minor
> Attachments: LUCENE-8961.patch, LUCENE-8961.patch
>
>
> The 
> [CheckIndex|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.2.0/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java]
>  tool supports the exorcising of corrupt segments from an index.
> This ticket proposes to add an extra option which could first be used to 
> potentially salvage the document ids of the segment(s) about to be exorcised. 
> Re-ingestion for those documents could then be arranged so as to repair the 
> data damage caused by the exorcising.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: IndexWriter Closed Exception on small concurrency

2019-09-09 Thread Aravind S (User Intent)

Hi,

I'm seeing on increasing parallel writes on embedded Solr running with HDFS
Lucene directory with doc size around 800Kb. I'm seeing exceptions of
IndexWriter closed with stack trace originating from
https://github.com/apache/lucene-solr/blob/a288710a64acdde6abc8ce96a0d3b3e18739ac32/solr/core/src/java/org/apache/solr/store/hdfs/HdfsDirectory.java#L118

Please let me know how do I solve this issue?
The concurrency of writes is around 10.

On Tue, Sep 10, 2019 at 1:34 AM Aravind S (User Intent) <
s.arav...@flipkart.com> wrote:

> Hi,
>
> I'm seeing on increasing parallel writes on embedded Solr running with
> HDFS Lucene directory with doc size around 800Kb. I'm seeing exceptions of
> IndexWriter closed with stack trace originating from
> https://github.com/apache/lucene-solr/blob/a288710a64acdde6abc8ce96a0d3b3e18739ac32/solr/core/src/java/org/apache/solr/store/hdfs/HdfsDirectory.java#L118
>
> Please let me know how do I solve this issue?
>
> Regards,
> Sravind S
>
>
>
>

-- 

*-*

*This email and any files transmitted with it are confidential and 
intended solely for the use of the individual or entity to whom they are 
addressed. If you have received this email in error, please notify the 
system manager. This message contains confidential information and is 
intended only for the individual named. If you are not the named addressee, 
you should not disseminate, distribute or copy this email. Please notify 
the sender immediately by email if you have received this email by mistake 
and delete this email from your system. If you are not the intended 
recipient, you are notified that disclosing, copying, distributing or 
taking any action in reliance on the contents of this information is 
strictly prohibited.*

*Any views or opinions presented in this 
email are solely those of the author and do not necessarily represent those 
of the organization. Any information on shares, debentures or similar 
instruments, recommended product pricing, valuations and the like are for 
information purposes only. It is not meant to be an instruction or 
recommendation, as the case may be, to buy or to sell securities, products, 
services nor an offer to buy or sell securities, products or services 
unless specifically stated to be so on behalf of the Flipkart group. 
Employees of the Flipkart group of companies are expressly required not to 
make defamatory statements and not to infringe or authorise any 
infringement of copyright or any other legal right by email communications. 
Any such communication is contrary to organizational policy and outside the 
scope of the employment of the individual concerned. The organization will 
not accept any liability in respect of such communication, and the employee 
responsible will be personally liable for any damages or other liability 
arising.*

*Our organization accepts no liability for the 
content of this email, or for the consequences of any actions taken on the 
basis of the information *provided,* unless that information is 
subsequently confirmed in writing. If you are not the intended recipient, 
you are notified that disclosing, copying, distributing or taking any 
action in reliance on the contents of this information is strictly 
prohibited.*

_-_

IndexWriter Closed Exception on small concurrency

2019-09-09 Thread Aravind S (User Intent)

Hi,

I'm seeing on increasing parallel writes on embedded Solr running with HDFS
Lucene directory with doc size around 800Kb. I'm seeing exceptions of
IndexWriter closed with stack trace originating from
https://github.com/apache/lucene-solr/blob/a288710a64acdde6abc8ce96a0d3b3e18739ac32/solr/core/src/java/org/apache/solr/store/hdfs/HdfsDirectory.java#L118

Please let me know how do I solve this issue?

Regards,
Sravind S

-- 



*-*


*This email and any files transmitted with it are confidential and 
intended solely for the use of the individual or entity to whom they are 
addressed. If you have received this email in error, please notify the 
system manager. This message contains confidential information and is 
intended only for the individual named. If you are not the named addressee, 
you should not disseminate, distribute or copy this email. Please notify 
the sender immediately by email if you have received this email by mistake 
and delete this email from your system. If you are not the intended 
recipient, you are notified that disclosing, copying, distributing or 
taking any action in reliance on the contents of this information is 
strictly prohibited.*

 

*Any views or opinions presented in this 
email are solely those of the author and do not necessarily represent those 
of the organization. Any information on shares, debentures or similar 
instruments, recommended product pricing, valuations and the like are for 
information purposes only. It is not meant to be an instruction or 
recommendation, as the case may be, to buy or to sell securities, products, 
services nor an offer to buy or sell securities, products or services 
unless specifically stated to be so on behalf of the Flipkart group. 
Employees of the Flipkart group of companies are expressly required not to 
make defamatory statements and not to infringe or authorise any 
infringement of copyright or any other legal right by email communications. 
Any such communication is contrary to organizational policy and outside the 
scope of the employment of the individual concerned. The organization will 
not accept any liability in respect of such communication, and the employee 
responsible will be personally liable for any damages or other liability 
arising.*

 

*Our organization accepts no liability for the 
content of this email, or for the consequences of any actions taken on the 
basis of the information *provided,* unless that information is 
subsequently confirmed in writing. If you are not the intended recipient, 
you are notified that disclosing, copying, distributing or taking any 
action in reliance on the contents of this information is strictly 
prohibited.*


_-_

[jira] [Commented] (SOLR-13745) Test should close resources: AtomicUpdateProcessorFactoryTest

2019-09-09 Thread David Smiley (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-13745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926002#comment-16926002
 ] 

David Smiley commented on SOLR-13745:
-

Aha; this ObjectReleaseTracker looks super easy to use.  Activated when 
assertions are enabled.  Cool; maybe I'll file an issue for it.

> Test should close resources: AtomicUpdateProcessorFactoryTest 
> --
>
> Key: SOLR-13745
> URL: https://issues.apache.org/jira/browse/SOLR-13745
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Fix For: 8.3
>
>
> This tests hangs after the test runs because there are directory or request 
> resources (not sure yet) that are not closed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Request For Addition to Wiki

2019-09-09 Thread Jan Høydahl

I added you as "Individual user" under 
https://cwiki.apache.org/confluence/spaces/spacepermissions.action?key=SOLR
Please try now

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 9. sep. 2019 kl. 19:19 skrev Atri Sharma :
> 
> On Mon, Sep 9, 2019 at 10:44 PM Anshum Gupta  wrote:
>> 
>> Just to be clear, you were asking for access to Lucene confluence space, 
>> right?
> 
> Yep, thats correct.
> 
> I am able to login to Confluence with my ASF credentials, but not able
> to create pages under the Lucene Java space.
> 
> Atri
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

[jira] [Commented] (SOLR-13746) Apache jenkins needs JVM 11 upgraded to at least 11.0.3 (SSL bugs)

2019-09-09 Thread Hoss Man (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-13746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925947#comment-16925947
 ] 

Hoss Man commented on SOLR-13746:
-

bq. ... No idea, there is an issue / mail thread already at ASF about 
AdoptOpenJDK. ...
bq. ... I think we should get Infra involved, at a minimum to ask if we should 
be managing JDKs on a self-serve basis. ...

So i'm not really clear on where we stand now...

IIUC in order to upgrade past 11.0.1 (which is broken) we need to use 
(Adopt)OpenJDK because oracle hasn't made 11.0.(2|3|4) builds available? -- and 
 It sounds like is there an INFRA issue or mail archive thread somewhere about 
being able to use OpenJDK ... can someone post a link to that?  is that a 
discussion that's being had in public or in private? (even if it's private, can 
someone post a link to it so folks w/request karma can access it)

Is the infra conversation something happening "in the abstract" of "if/when 
OpenJDK builds can/should be used", or is it concretely about the need for 
specific projects to switch? ... ie:  Has the fact that 11.0.1 is broken and 
effectively unusable for a lot of Solr testing been mentioned in the context of 
that discussion?   Can/should we be filing an INFRA jira explicitly requesting 
upgraded JDKs so it's clear there is a demonstrable need? (does such an issue 
already exist already? can someone please link it here?)

Finally: is docker available on the jenkins build slaves, because worst case 
scenario we could tweak our apache jenkins jobs to run inside docker containers 
that always use the latest AdoptOpenJDK base images, ala...

https://github.com/hossman/solr-jenkins-docker-tester



bq. Should we also add this note to the JVM bugs page: 
https://cwiki.apache.org/confluence/display/lucene/JavaBugs#JavaBugs-OracleJava/SunJava/OpenJDKBugs

I thought someone already did this in response to an email thread about this 
general topci a ffew months ago -- but maybe not?

the list of known JVM SSL bugs is well documented in SOLR-12988 -- anyone who 
wants to take a stabe at summarizing that info in the wiki or release notes of 
Solr is welcome to take a stab at it (my focus has been on tests themselves and 
trying to figure out if there are any other SSL bugs we're overlooked ... 
something i'm now freaking out about more as i realized none of the apache 
jenkins jobs have actaully been testing SSL)

> Apache jenkins needs JVM 11 upgraded to at least 11.0.3 (SSL bugs)
> --
>
> Key: SOLR-13746
> URL: https://issues.apache.org/jira/browse/SOLR-13746
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
>Priority: Major
>
> I just realized that back in June, there was a misscommunication between 
> myself & Uwe (and a lack of double checking on my part!) regarding upgrading 
> the JVM versions on our jenkins machines...
>  * 
> [http://mail-archives.apache.org/mod_mbox/lucene-dev/201906.mbox/%3calpine.DEB.2.11.1906181434350.23523@tray%3e]
>  * 
> [http://mail-archives.apache.org/mod_mbox/lucene-dev/201906.mbox/%3C00b301d52918$d27b2f60$77718e20$@thetaphi.de%3E]
> ...Uwe only updated the JVMs on _his_ policeman jenkins machines - the JVM 
> used on the _*apache*_  jenkins nodes is still (as of 2019-09-06)  
> "11.0.1+13-LTS" ...
> [https://builds.apache.org/view/L/view/Lucene/job/Lucene-Solr-Tests-master/3689/consoleText]
> {noformat}
> ...
> [java-info] java version "11.0.1"
> [java-info] Java(TM) SE Runtime Environment (11.0.1+13-LTS, Oracle 
> Corporation)
> [java-info] Java HotSpot(TM) 64-Bit Server VM (11.0.1+13-LTS, Oracle 
> Corporation)
> ...
> {noformat}
> This means that even after the changes made in SOLR-12988 to re-enable SSL 
> testing on java11, all Apache jenkins 'master' builds, (including, AFAICT the 
> yetus / 'Patch Review' builds) are still SKIPping thousands of tests that use 
> SSL (either explicitly, or due to randomization) becauseof the logic in 
> SSLTestConfig to detects  bad JVM versions an prevent confusion/spurious 
> failures.
> We really need to get the jenkins nodes updated to openjdk 11.0.3 or 11.0.4 
> ASAP.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Request For Addition to Wiki

2019-09-09 Thread Atri Sharma

On Mon, Sep 9, 2019 at 10:44 PM Anshum Gupta  wrote:
>
> Just to be clear, you were asking for access to Lucene confluence space, 
> right?

Yep, thats correct.

I am able to login to Confluence with my ASF credentials, but not able
to create pages under the Lucene Java space.

Atri

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Request For Addition to Wiki

2019-09-09 Thread Anshum Gupta

Just to be clear, you were asking for access to Lucene confluence space,
right?

It's been a while since I did this, and with moinmoin gone, as per my
understanding the confluence stuff is controlled by the ASF LDAP so not
sure how to grant access. Hopefully someone else can help you with this.

Anshum

On Mon, Sep 9, 2019 at 8:34 AM Atri Sharma  wrote:

> Please add my login (atris) to Contributor list so that I can update the
> wiki.
>
> Regards,
>
> Atri
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
Anshum Gupta

[jira] [Commented] (SOLR-13745) Test should close resources: AtomicUpdateProcessorFactoryTest

2019-09-09 Thread Hoss Man (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-13745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925923#comment-16925923
 ] 

Hoss Man commented on SOLR-13745:
-

bq.  ... It'd be nice if failing to close a SolrQueryRequest might be enforced 
in tests ...

I haven't dug into how/where exactly the ObjectTrracking logic helps enforce 
that we're closing things like SolrIndexSearcher, but in theory there isn't any 
reason it couldn't also enforce that we're closing (Local)SolrQueryRequest 
objects? ... i think?


> Test should close resources: AtomicUpdateProcessorFactoryTest 
> --
>
> Key: SOLR-13745
> URL: https://issues.apache.org/jira/browse/SOLR-13745
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Fix For: 8.3
>
>
> This tests hangs after the test runs because there are directory or request 
> resources (not sure yet) that are not closed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-8972) CharFilter version of ICUTransformFilter, to better support dictionary-based tokenization

2019-09-09 Thread Michael Gibney (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925841#comment-16925841
 ] 

Michael Gibney commented on LUCENE-8972:


For consideration, I believe this issue has already been tackled by [~cbeer] 
and [~mejackreed]; the resulting implementation can be found 
[here|https://github.com/sul-dlss/CJKFilterUtils/blob/master/src/main/java/edu/stanford/lucene/analysis/ICUTransformCharFilter.java].

> CharFilter version of ICUTransformFilter, to better support dictionary-based 
> tokenization
> -
>
> Key: LUCENE-8972
> URL: https://issues.apache.org/jira/browse/LUCENE-8972
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: master (9.0), 8.2
>Reporter: Michael Gibney
>Priority: Minor
>
> The ICU Transliteration API is currently exposed through Lucene only 
> post-tokinzer, via ICUTransformFilter. Some tokenizers (particularly 
> dictionary-based) may assume pre-normalized input (e.g., for Chinese 
> characters, there may be an assumption of traditional-only or simplified-only 
> input characters, at the level of either all input, or 
> per-dictionary-defined-token).
> The potential usefulness of a CharFilter that exposes the ICU Transliteration 
> API was suggested in a [thread on the Solr mailing 
> list|https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201807.mbox/%3C4DAB7BA7-42A8-4009-8B49-60822B00DE7D%40wunderwood.org%3E],
>  and my hope is that this issue can facilitate more detailed discussion of 
> the proposed addition.
> A concrete example of mixed traditional/simplified characters that are 
> currently tokenized differently by the ICUTokenizer are:
>  * 红楼梦 (SSS)
>  * 紅樓夢 (TTT)
>  * 紅楼夢 (TST)
> The first two tokens (simplified-only and traditional-only, respectively) are 
> included in the [CJ dictionary that backs 
> ICUTokenizer|https://raw.githubusercontent.com/unicode-org/icu/release-62-1/icu4c/source/data/brkitr/dictionaries/cjdict.txt],
>  but the last (a mixture of traditional and simplified characters) is not, 
> and is not recognized as a token. Even _if_ we assume this to be an 
> intentional omission from the dictionary that results in behavior that could 
> be desirable for some use cases, there are surely some use cases that would 
> benefit from a more permissive dictionary-based tokenization strategy (such 
> as could be supported by pre-tokenizer transliteration).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-8972) CharFilter version of ICUTransformFilter, to better support dictionary-based tokenization

2019-09-09 Thread Michael Gibney (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gibney updated LUCENE-8972:
---
Summary: CharFilter version of ICUTransformFilter, to better support 
dictionary-based tokenization  (was: CharFilter version ICUTransformFilter, to 
better support dictionary-based tokenization)

> CharFilter version of ICUTransformFilter, to better support dictionary-based 
> tokenization
> -
>
> Key: LUCENE-8972
> URL: https://issues.apache.org/jira/browse/LUCENE-8972
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: master (9.0), 8.2
>Reporter: Michael Gibney
>Priority: Minor
>
> The ICU Transliteration API is currently exposed through Lucene only 
> post-tokinzer, via ICUTransformFilter. Some tokenizers (particularly 
> dictionary-based) may assume pre-normalized input (e.g., for Chinese 
> characters, there may be an assumption of traditional-only or simplified-only 
> input characters, at the level of either all input, or 
> per-dictionary-defined-token).
> The potential usefulness of a CharFilter that exposes the ICU Transliteration 
> API was suggested in a [thread on the Solr mailing 
> list|https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201807.mbox/%3C4DAB7BA7-42A8-4009-8B49-60822B00DE7D%40wunderwood.org%3E],
>  and my hope is that this issue can facilitate more detailed discussion of 
> the proposed addition.
> A concrete example of mixed traditional/simplified characters that are 
> currently tokenized differently by the ICUTokenizer are:
>  * 红楼梦 (SSS)
>  * 紅樓夢 (TTT)
>  * 紅楼夢 (TST)
> The first two tokens (simplified-only and traditional-only, respectively) are 
> included in the [CJ dictionary that backs 
> ICUTokenizer|https://raw.githubusercontent.com/unicode-org/icu/release-62-1/icu4c/source/data/brkitr/dictionaries/cjdict.txt],
>  but the last (a mixture of traditional and simplified characters) is not, 
> and is not recognized as a token. Even _if_ we assume this to be an 
> intentional omission from the dictionary that results in behavior that could 
> be desirable for some use cases, there are surely some use cases that would 
> benefit from a more permissive dictionary-based tokenization strategy (such 
> as could be supported by pre-tokenizer transliteration).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (LUCENE-8972) CharFilter version ICUTransformFilter, to better support dictionary-based tokenization

2019-09-09 Thread Michael Gibney (Jira)

Michael Gibney created LUCENE-8972:
--

 Summary: CharFilter version ICUTransformFilter, to better support 
dictionary-based tokenization
 Key: LUCENE-8972
 URL: https://issues.apache.org/jira/browse/LUCENE-8972
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 8.2, master (9.0)
Reporter: Michael Gibney


The ICU Transliteration API is currently exposed through Lucene only 
post-tokinzer, via ICUTransformFilter. Some tokenizers (particularly 
dictionary-based) may assume pre-normalized input (e.g., for Chinese 
characters, there may be an assumption of traditional-only or simplified-only 
input characters, at the level of either all input, or 
per-dictionary-defined-token).

The potential usefulness of a CharFilter that exposes the ICU Transliteration 
API was suggested in a [thread on the Solr mailing 
list|https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201807.mbox/%3C4DAB7BA7-42A8-4009-8B49-60822B00DE7D%40wunderwood.org%3E],
 and my hope is that this issue can facilitate more detailed discussion of the 
proposed addition.

A concrete example of mixed traditional/simplified characters that are 
currently tokenized differently by the ICUTokenizer are:
 * 红楼梦 (SSS)
 * 紅樓夢 (TTT)
 * 紅楼夢 (TST)

The first two tokens (simplified-only and traditional-only, respectively) are 
included in the [CJ dictionary that backs 
ICUTokenizer|https://raw.githubusercontent.com/unicode-org/icu/release-62-1/icu4c/source/data/brkitr/dictionaries/cjdict.txt],
 but the last (a mixture of traditional and simplified characters) is not, and 
is not recognized as a token. Even _if_ we assume this to be an intentional 
omission from the dictionary that results in behavior that could be desirable 
for some use cases, there are surely some use cases that would benefit from a 
more permissive dictionary-based tokenization strategy (such as could be 
supported by pre-tokenizer transliteration).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Request For Addition to Wiki

2019-09-09 Thread Atri Sharma

Please add my login (atris) to Contributor list so that I can update the wiki.

Regards,

Atri

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )

2019-09-09 Thread ASF subversion and git services (Jira)



 [ 
https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Watters updated SOLR-13749:
-
Description: 
This ticket includes 2 query parsers.

The first one is the "Cross collection join filter"  (XCJF) parser. This is the 
"Cross-collection join filter" query parser. It can do a call out to a remote 
collection to get a set of join keys to be used as a filter against the local 
collection.

The second one is the Hash Range query parser that you can specify a field name 
and a hash range, the result is that only the documents that would have hashed 
to that range will be returned.

This query parser will do an intersection based on join keys between 2 
collections.

The local collection is the collection that you are searching against.

The remote collection is the collection that contains the join keys that you 
want to use as a filter.

Each shard participating in the distributed request will execute a query 
against the remote collection.  If the local collection is setup with the 
compositeId router to be routed on the join key field, a hash range query is 
applied to the remote collection query to only match the documents that contain 
a potential match for the documents that are in the local shard/core.  

 

Here's some vocab to help with the descriptions of the various parameters.
||Term||Description||
|Local Collection|This is the main collection that is being queried.|
|Remote Collection|This is the collection that the XCJFQuery will query to 
resolve the join keys.|
|XCJFQuery|The lucene query that executes a search to get back a set of join 
keys from a remote collection|
|HashRangeQuery|The lucene query that matches only the documents whose hash 
code on a field falls within a specified range.|

 

 
||Param ||Required ||Description||
|collection|Required|The name of the external Solr collection to be queried to 
retrieve the set of join key values ( required )|
|zkHost|Optional|The connection string to be used to connect to Zookeeper.  
zkHost and solrUrl are both optional parameters, and at most one of them should 
be specified.  
If neither of zkHost or solrUrl are specified, the local Zookeeper cluster will 
be used. ( optional )|
|solrUrl|Optional|The URL of the external Solr node to be queried ( optional )|
|from|Required|The join key field name in the external collection ( required )|
|to|Required|The join key field name in the local collection|
|v|See Note|The query to be executed against the external Solr collection to 
retrieve the set of join key values.  
Note:  The original query can be passed at the end of the string or as the "v" 
parameter.  
It's recommended to use query parameter substitution with the "v" parameter 
to ensure no issues arise with the default query parsers.|
|routed| |true / false.  If true, the XCJF query will use each shard's hash 
range to determine the set of join keys to retrieve for that shard.
This parameter improves the performance of the cross-collection join, but 
it depends on the local collection being routed by the toField.  If this 
parameter is not specified, 
the XCJF query will try to determine the correct value automatically.|
|ttl| |The length of time that an XCJF query in the cache will be considered 
valid, in seconds.  Defaults to 3600 (one hour).  
The XCJF query will not be aware of changes to the remote collection, so 
if the remote collection is updated, cached XCJF queries may give inaccurate 
results.  
After the ttl period has expired, the XCJF query will re-execute the join 
against the remote collection.|
|_All others_| |Any normal Solr parameter can also be specified as a local 
param.|

 

Example Solr Config.xml changes:

 
 {{<}}{{cache}} {{name}}{{=}}{{"hash_vin"}}
 {{   }}{{class}}{{=}}{{"solr.LRUCache"}}
 {{   }}{{size}}{{=}}{{"128"}}
 {{   }}{{initialSize}}{{=}}{{"0"}}
 {{   }}{{regenerator}}{{=}}{{"solr.NoOpRegenerator"}}{{/>}}
  
 {{<}}{{queryParser}} {{name}}{{=}}{{"xcjf"}} 
{{class}}{{=}}{{"org.apache.solr.search.join.XCJFQueryParserPlugin"}}{{>}}
 {{  }}{{<}}{{str}} {{name}}{{=}}{{"routerField"}}{{>vin}}
 {{}}
  
 {{<}}{{queryParser}} {{name}}{{=}}{{"hash_range"}} 
{{class}}{{=}}{{"org.apache.solr.search.join.HashRangeQueryParserPlugin"}} 
{{/>}}
  

Example Usage:

{{{!xcjf collection=}}{{"otherCollection"}} {{from=}}{{"fromField"}} 
{{to=}}{{"toField"}} {{v=}}{{"**:**"}}{{}}}
  
  

 

 

 

  was:
This ticket includes 2 query parsers.

The first one is the "Cross collection join filter"  (XCJF) parser. This is the 
"Cross-collection join filter" query parser. It can do a call out to a remote 
collection to get a set of join keys to be used as a filter against the local 
collection.

The second one is the Hash Range query parser that you can specify a field name 
and a hash range, the result is that only the documents that would have hashed 
to that range will be returned.

This query

[jira] [Commented] (SOLR-13677) All Metrics Gauges should be unregistered by the objects that registered them



[ 
https://issues.apache.org/jira/browse/SOLR-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925756#comment-16925756
 ] 

ASF subversion and git services commented on SOLR-13677:


Commit b1bccf7cace424cb895ca6d05b30926697bfe86b in lucene-solr's branch 
refs/heads/branch_8x from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b1bccf7 ]

SOLR-13677: reverting the last commit


> All Metrics Gauges should be unregistered by the objects that registered them
> -
>
> Key: SOLR-13677
> URL: https://issues.apache.org/jira/browse/SOLR-13677
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Noble Paul
>Priority: Blocker
> Fix For: 8.3
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The life cycle of Metrics producers are managed by the core (mostly). So, if 
> the lifecycle of the object is different from that of the core itself, these 
> objects will never be unregistered from the metrics registry. This will lead 
> to memory leaks



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )



 [ 
https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Watters updated SOLR-13749:
-
Description: 
This ticket includes 2 query parsers.

The first one is the "Cross collection join filter"  (XCJF) parser. This is the 
"Cross-collection join filter" query parser. It can do a call out to a remote 
collection to get a set of join keys to be used as a filter against the local 
collection.

The second one is the Hash Range query parser that you can specify a field name 
and a hash range, the result is that only the documents that would have hashed 
to that range will be returned.

This query parser will do an intersection based on join keys between 2 
collections.

The local collection is the collection that you are searching against.

The remote collection is the collection that contains the join keys that you 
want to use as a filter.

Each shard participating in the distributed request will execute a query 
against the remote collection.  If the local collection is setup with the 
compositeId router to be routed on the join key field, a hash range query is 
applied to the remote collection query to only match the documents that contain 
a potential match for the documents that are in the local shard/core.  

 

Here's some vocab to help with the descriptions of the various parameters.
||Term||Description||
|Local Collection|This is the main collection that is being queried.|
|Remote Collection|This is the collection that the XCJFQuery will query to 
resolve the join keys.|
|XCJFQuery|The lucene query that executes a search to get back a set of join 
keys from a remote collection|
|HashRangeQuery|The lucene query that matches only the documents whose hash 
code on a field falls within a specified range.|

 

 
||Param ||Default ||Required ||Description ||
|collection| |Required|The name of the external Solr collection to be queried 
to retrieve the set of join key values ( required )|
|zkHost| |Optional|The connection string to be used to connect to Zookeeper.  
zkHost and solrUrl are both optional parameters, and at most one of them should 
be specified.  If neither of zkHost or solrUrl are specified, the local 
Zookeeper cluster will be used. ( optional )|
|solrUrl| |Optional|The URL of the external Solr node to be queried ( optional 
)|
|from| |Required|The join key field name in the external collection ( required 
)|
|to| |Required|The join key field name in the local collection|
|v| |See Note|The query to be executed against the external Solr collection to 
retrieve the set of join key values.  Note:  The original query can be passed 
at the end of the string or as the "v" parameter.  It's recommended to use 
query parameter substitution with the "v" parameter to ensure no issues arise 
with the default query parsers.|
|routed|See Notes| |true / false.  If true, the XCJF query will use each 
shard's hash range to determine the set of join keys to retrieve for that 
shard.  This parameter improves the performance of the cross-collection join, 
but it depends on the local collection being routed by the toField.  If this 
parameter is not specified, the XCJF query will try to determine the correct 
value automatically.|
|ttl|3600| |The length of time that an XCJF query in the cache will be 
considered valid, in seconds.  Defaults to 3600 (one hour).  The XCJF query 
will not be aware of changes to the remote collection, so if the remote 
collection is updated, cached XCJF queries may give inaccurate results.  After 
the ttl period has expired, the XCJF query will re-execute the join against the 
remote collection.|
|_All others_| | |Any normal Solr parameter can also be specified as a local 
param.|

 

Example Solr Config.xml changes:

 
 {{<}}{{cache}} {{name}}{{=}}{{"hash_vin"}}
 {{   }}{{class}}{{=}}{{"solr.LRUCache"}}
 {{   }}{{size}}{{=}}{{"128"}}
 {{   }}{{initialSize}}{{=}}{{"0"}}
 {{   }}{{regenerator}}{{=}}{{"solr.NoOpRegenerator"}}{{/>}}
  
 {{<}}{{queryParser}} {{name}}{{=}}{{"xcjf"}} 
{{class}}{{=}}{{"org.apache.solr.search.join.XCJFQueryParserPlugin"}}{{>}}
 {{  }}{{<}}{{str}} {{name}}{{=}}{{"routerField"}}{{>vin}}
 {{}}
  
 {{<}}{{queryParser}} {{name}}{{=}}{{"hash_range"}} 
{{class}}{{=}}{{"org.apache.solr.search.join.HashRangeQueryParserPlugin"}} 
{{/>}}
  

Example Usage:

{{{!xcjf collection=}}{{"otherCollection"}} {{from=}}{{"fromField"}} 
{{to=}}{{"toField"}} {{v=}}{{"\**:\**"}}{{}}}
  
  

 

 

 

  was:
This ticket includes 2 query parsers.


 The first one is the "Cross collection join filter"  (XCJF) parser. This is 
the "Cross-collection join filter" query parser. It can do a call out to a 
remote collection to get a set of join keys to be used as a filter against the 
local collection.

The second one is the Hash Range query parser that you can specify a field name 
and a hash range, the result is that only the documents that would have hashed 
to

[jira] [Updated] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )



 [ 
https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Watters updated SOLR-13749:
-
Description: 
This ticket includes 2 query parsers.

The first one is the "Cross collection join filter"  (XCJF) parser. This is the 
"Cross-collection join filter" query parser. It can do a call out to a remote 
collection to get a set of join keys to be used as a filter against the local 
collection.

The second one is the Hash Range query parser that you can specify a field name 
and a hash range, the result is that only the documents that would have hashed 
to that range will be returned.

This query parser will do an intersection based on join keys between 2 
collections.

The local collection is the collection that you are searching against.

The remote collection is the collection that contains the join keys that you 
want to use as a filter.

Each shard participating in the distributed request will execute a query 
against the remote collection.  If the local collection is setup with the 
compositeId router to be routed on the join key field, a hash range query is 
applied to the remote collection query to only match the documents that contain 
a potential match for the documents that are in the local shard/core.  

 

Here's some vocab to help with the descriptions of the various parameters.
||Term||Description||
|Local Collection|This is the main collection that is being queried.|
|Remote Collection|This is the collection that the XCJFQuery will query to 
resolve the join keys.|
|XCJFQuery|The lucene query that executes a search to get back a set of join 
keys from a remote collection|
|HashRangeQuery|The lucene query that matches only the documents whose hash 
code on a field falls within a specified range.|

 

 
||Param ||Required ||Description ||
|collection|Required|The name of the external Solr collection to be queried to 
retrieve the set of join key values ( required )|
|zkHost|Optional|The connection string to be used to connect to Zookeeper.  
zkHost and solrUrl are both optional parameters, and at most one of them should 
be specified.  If neither of zkHost or solrUrl are specified, the local 
Zookeeper cluster will be used. ( optional )|
|solrUrl|Optional|The URL of the external Solr node to be queried ( optional )|
|from|Required|The join key field name in the external collection ( required )|
|to|Required|The join key field name in the local collection|
|v|See Note|The query to be executed against the external Solr collection to 
retrieve the set of join key values.  Note:  The original query can be passed 
at the end of the string or as the "v" parameter.  It's recommended to use 
query parameter substitution with the "v" parameter to ensure no issues arise 
with the default query parsers.|
|routed| |true / false.  If true, the XCJF query will use each shard's hash 
range to determine the set of join keys to retrieve for that shard.  This 
parameter improves the performance of the cross-collection join, but it depends 
on the local collection being routed by the toField.  If this parameter is not 
specified, the XCJF query will try to determine the correct value 
automatically.|
|ttl| |The length of time that an XCJF query in the cache will be considered 
valid, in seconds.  Defaults to 3600 (one hour).  The XCJF query will not be 
aware of changes to the remote collection, so if the remote collection is 
updated, cached XCJF queries may give inaccurate results.  After the ttl period 
has expired, the XCJF query will re-execute the join against the remote 
collection.|
|_All others_| |Any normal Solr parameter can also be specified as a local 
param.|

 

Example Solr Config.xml changes:

 
 {{<}}{{cache}} {{name}}{{=}}{{"hash_vin"}}
 {{   }}{{class}}{{=}}{{"solr.LRUCache"}}
 {{   }}{{size}}{{=}}{{"128"}}
 {{   }}{{initialSize}}{{=}}{{"0"}}
 {{   }}{{regenerator}}{{=}}{{"solr.NoOpRegenerator"}}{{/>}}
  
 {{<}}{{queryParser}} {{name}}{{=}}{{"xcjf"}} 
{{class}}{{=}}{{"org.apache.solr.search.join.XCJFQueryParserPlugin"}}{{>}}
 {{  }}{{<}}{{str}} {{name}}{{=}}{{"routerField"}}{{>vin}}
 {{}}
  
 {{<}}{{queryParser}} {{name}}{{=}}{{"hash_range"}} 
{{class}}{{=}}{{"org.apache.solr.search.join.HashRangeQueryParserPlugin"}} 
{{/>}}
  

Example Usage:

{{{!xcjf collection=}}{{"otherCollection"}} {{from=}}{{"fromField"}} 
{{to=}}{{"toField"}} {{v=}}{{"**:**"}}{{}}}
  
  

 

 

 

  was:
This ticket includes 2 query parsers.

The first one is the "Cross collection join filter"  (XCJF) parser. This is the 
"Cross-collection join filter" query parser. It can do a call out to a remote 
collection to get a set of join keys to be used as a filter against the local 
collection.

The second one is the Hash Range query parser that you can specify a field name 
and a hash range, the result is that only the documents that would have hashed 
to that range will be returned.

This query

[jira] [Updated] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )



 [ 
https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Watters updated SOLR-13749:
-
Description: 
This ticket includes 2 query parsers.


 The first one is the "Cross collection join filter"  (XCJF) parser. This is 
the "Cross-collection join filter" query parser. It can do a call out to a 
remote collection to get a set of join keys to be used as a filter against the 
local collection.

The second one is the Hash Range query parser that you can specify a field name 
and a hash range, the result is that only the documents that would have hashed 
to that range will be returned.

This query parser will do an intersection based on join keys between 2 
collections.

The local collection is the collection that you are searching against.

The remote collection is the collection that contains the join keys that you 
want to use as a filter.

Each shard participating in the distributed request will execute a query 
against the remote collection.  If the local collection is setup with the 
compositeId router to be routed on the join key field, a hash range query is 
applied to the remote collection query to only match the documents that contain 
a potential match for the documents that are in the local shard/core.  

 

Here's some vocab to help with the descriptions of the various parameters.
||Term||Description||
|Local Collection|This is the main collection that is being queried.|
|Remote Collection|This is the collection that the XCJFQuery will query to 
resolve the join keys.|
|XCJFQuery|The lucene query that executes a search to get back a set of join 
keys from a remote collection|
|HashRangeQuery|The lucene query that matches only the documents whose hash 
code on a field falls within a specified range.|

 

 
||Param||Default||Required||Description||
|collection| |Required|The name of the external Solr collection to be queried 
to retrieve the set of join key values ( required )|
|zkHost| |Optional|The connection string to be used to connect to Zookeeper.  
zkHost and solrUrl are both optional parameters, and at most one of them should 
be specified.  If neither of zkHost or solrUrl are specified, the local 
Zookeeper cluster will be used. ( optional )|
|solrUrl| |Optional|The URL of the external Solr node to be queried ( optional 
)|
|from| |Required|The join key field name in the external collection ( required 
)|
|to| |Required|The join key field name in the local collection|
|v| |See Note|The query to be executed against the external Solr collection to 
retrieve the set of join key values.  Note:  The original query can be passed 
at the end of the string or as the "v" parameter.  It's recommended to use 
query parameter substitution with the "v" parameter to ensure no issues arise 
with the default query parsers.|
|routed|See Notes| |true / false.  If true, the XCJF query will use each 
shard's hash range to determine the set of join keys to retrieve for that 
shard.  This parameter improves the performance of the cross-collection join, 
but it depends on the local collection being routed by the toField.  If this 
parameter is not specified, the XCJF query will try to determine the correct 
value automatically.|
|ttl|3600| |The length of time that an XCJF query in the cache will be 
considered valid, in seconds.  Defaults to 3600 (one hour).  The XCJF query 
will not be aware of changes to the remote collection, so if the remote 
collection is updated, cached XCJF queries may give inaccurate results.  After 
the ttl period has expired, the XCJF query will re-execute the join against the 
remote collection.|
|_All others_| | |Any normal Solr parameter can also be specified as a local 
param.|

 

Example Solr Config.xml changes:

 
{{<}}{{cache}} {{name}}{{=}}{{"hash_vin"}}
{{   }}{{class}}{{=}}{{"solr.LRUCache"}}
{{   }}{{size}}{{=}}{{"128"}}
{{   }}{{initialSize}}{{=}}{{"0"}}
{{   }}{{regenerator}}{{=}}{{"solr.NoOpRegenerator"}}{{/>}}
 
{{<}}{{queryParser}} {{name}}{{=}}{{"xcjf"}} 
{{class}}{{=}}{{"org.apache.solr.search.join.XCJFQueryParserPlugin"}}{{>}}
{{  }}{{<}}{{str}} {{name}}{{=}}{{"routerField"}}{{>vin}}
{{}}
 
{{<}}{{queryParser}} {{name}}{{=}}{{"hash_range"}} 
{{class}}{{=}}{{"org.apache.solr.search.join.HashRangeQueryParserPlugin"}} 
{{/>}}
 

Example Usage:

{!xcjf collection=}}{{"otherCollection"}} {{from=}}{{"fromField"}} 
{{to=}}{{"toField"}} {{v=}}{{"*:*"}}{{}
 
 

 

 

 

  was:
This ticket includes 2 query parsers.


 The first one is the "Cross collection join filter"  (XCJF) parser. This is 
the "Cross-collection join filter" query parser. It can do a call out to a 
remote collection to get a set of join keys to be used as a filter against the 
local collection.

The second one is the Hash Range query parser that you can specify a field name 
and a hash range, the result is that only the documents that would have hashed 
to that range will

[jira] [Created] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )

2019-09-09 Thread ASF subversion and git services (Jira)

Kevin Watters created SOLR-13749:


 Summary: Implement support for joining across collections with 
multiple shards ( XCJF )
 Key: SOLR-13749
 URL: https://issues.apache.org/jira/browse/SOLR-13749
 Project: Solr
  Issue Type: New Feature
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Kevin Watters


This ticket includes 2 query parsers.


 The first one is the "Cross collection join filter"  (XCJF) parser. This is 
the "Cross-collection join filter" query parser. It can do a call out to a 
remote collection to get a set of join keys to be used as a filter against the 
local collection.

The second one is the Hash Range query parser that you can specify a field name 
and a hash range, the result is that only the documents that would have hashed 
to that range will be returned.

This query parser will do an intersection based on join keys between 2 
collections.

The local collection is the collection that you are searching against.

The remote collection is the collection that contains the join keys that you 
want to use as a filter.

Each shard participating in the distributed request will execute a query 
against the remote collection.  If the local collection is setup with the 
compositeId router to be routed on the join key field, a hash range query is 
applied to the remote collection query to only match the documents that contain 
a potential match for the documents that are in the local shard/core.  

 

Here's some vocab to help with the descriptions of the various parameters.
||Term||Description||
|Local Collection|This is the main collection that is being queried.|
|Remote Collection|This is the collection that the XCJFQuery will query to 
resolve the join keys.|
|XCJFQuery|The lucene query that executes a search to get back a set of join 
keys from a remote collection|
|HashRangeQuery|The lucene query that matches only the documents whose hash 
code on a field falls within a specified range.|

 

 
||Param||Default||Required||Description||
|collection| |Required|The name of the external Solr collection to be queried 
to retrieve the set of join key values ( required )|
|zkHost| |Optional|The connection string to be used to connect to Zookeeper.  
zkHost and solrUrl are both optional parameters, and at most one of them should 
be specified.  If neither of zkHost or solrUrl are specified, the local 
Zookeeper cluster will be used. ( optional )|
|solrUrl| |Optional|The URL of the external Solr node to be queried ( optional 
)|
|from| |Required|The join key field name in the external collection ( required 
)|
|to| |Required|The join key field name in the local collection|
|v| |See Note|The query to be executed against the external Solr collection to 
retrieve the set of join key values.  Note:  The original query can be passed 
at the end of the string or as the "v" parameter.  It's recommended to use 
query parameter substitution with the "v" parameter to ensure no issues arise 
with the default query parsers.|
|routed|See Notes| |true / false.  If true, the XCJF query will use each 
shard's hash range to determine the set of join keys to retrieve for that 
shard.  This parameter improves the performance of the cross-collection join, 
but it depends on the local collection being routed by the toField.  If this 
parameter is not specified, the XCJF query will try to determine the correct 
value automatically.|
|ttl|3600| |The length of time that an XCJF query in the cache will be 
considered valid, in seconds.  Defaults to 3600 (one hour).  The XCJF query 
will not be aware of changes to the remote collection, so if the remote 
collection is updated, cached XCJF queries may give inaccurate results.  After 
the ttl period has expired, the XCJF query will re-execute the join against the 
remote collection.|
|_All others_| | |Any normal Solr parameter can also be specified as a local 
param.|

 

Example Solr Config.xml changes:

 
{{<}}{{cache}} {{name}}{{=}}{{"hash_vin"}}
{{   }}{{class}}{{=}}{{"solr.LRUCache"}}
{{   }}{{size}}{{=}}{{"128"}}
{{   }}{{initialSize}}{{=}}{{"0"}}
{{   }}{{regenerator}}{{=}}{{"solr.NoOpRegenerator"}}{{/>}}
 
{{<}}{{queryParser}} {{name}}{{=}}{{"xcjf"}} 
{{class}}{{=}}{{"org.apache.solr.search.join.XCJFQueryParserPlugin"}}{{>}}
{{  }}{{<}}{{str}} {{name}}{{=}}{{"routerField"}}{{>vin}}
{{}}
 
{{<}}{{queryParser}} {{name}}{{=}}{{"hash_range"}} 
{{class}}{{=}}{{"org.apache.solr.search.join.HashRangeQueryParserPlugin"}} 
{{/>}}
 

Example Usage:

{!xcjf collection=}}{{"otherCollection"}} {{from=}}{{"fromField"}} 
{{to=}}{{"toField"}} {{v=}}{{"*:*"}}{{}
 
 

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail:

Re: Alias Id condundrum

2019-09-09 Thread Gus Heck

That's certainly an option, but I was leaning the other way (making it
work). I know of a user that is dividing up their data into frequently and
less frequently (re)indexed stuff which is normally accessed by an alias
and they presently have to query for the list of collections in the alias
and then /get on each collection independently because of the current
behavior. This works, and if we start producing an error, they can of
course continue to do that, but it feels clumsy and inelegant for them to
have to do that to me at least). Also, it might not be all bad if it worked
with routed aliases.

On Fri, Sep 6, 2019 at 5:06 PM David Smiley 
wrote:

> On Wed, Sep 4, 2019 at 11:26 PM Gus Heck  wrote:
>
>> It seems that the real time get handler doesn't play nice with aliases.
>> The current (and past) behavior seems to be that it only works for the
>> first collection listed in the alias. This seems to be pretty clearly a
>> bug, as one certainly would expect the /get executed against an alias to
>> either refuse to work with aliases or work across all collections in the
>> alias rather than silently working only on the first collection.
>>
>
> I think it should just refuse to work (throw an exception) if there are
> multiple collections in the alias -- simple.  It's okay for components to
> have a limitation.
>
> Solr's internal use of RTG isn't affected by this scenario.  I believe few
> users even use RTG but yes of course some do and I know of at least one.
> In the one case I saw RTG used, it was an nice optimization that replaced
> its former mode of operation that worked fine.
>
> ~ David
>
>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

[jira] [Commented] (SOLR-13138) Remove deprecated code in master



[ 
https://issues.apache.org/jira/browse/SOLR-13138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925662#comment-16925662
 ] 

ASF subversion and git services commented on SOLR-13138:


Commit 46825ba94d5805b96c376c00d05d16c921cde4ad in lucene-solr's branch 
refs/heads/master-deprecations from David Smiley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=46825ba ]

SOLR-13138: Spatial removals and deprecations:
* Removed GeoHashField. (was deprecated)
* Removed LatLonType (was deprecated)
* Removed SpatialPointVectorFieldType (was deprecated)
* Removed SpatialTermQueryPrefixTreeFieldType (was deprecated)
* Deprecated legacy/BBoxStrategy as we will switch to Lucene's.
  Related to Trie/Points conversion.
* Removed spatial fields from some of our examples that don't exercise
  spatial.


> Remove deprecated code in master
> 
>
> Key: SOLR-13138
> URL: https://issues.apache.org/jira/browse/SOLR-13138
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Alan Woodward
>Priority: Major
>
> There are a number of deprecations in master that should be removed.  This 
> issue is to keep track of deprecations as a whole, some individual 
> deprecations may require their own issues.
>  
> Work on this issue should be pushed to the `master-deprecations` branch on 
> gitbox.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-13677) All Metrics Gauges should be unregistered by the objects that registered them

2019-09-09 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925658#comment-16925658
 ] 

ASF subversion and git services commented on SOLR-13677:


Commit a288710a64acdde6abc8ce96a0d3b3e18739ac32 in lucene-solr's branch 
refs/heads/master from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a288710 ]

SOLR-13677: reverting the last commit (#863)



> All Metrics Gauges should be unregistered by the objects that registered them
> -
>
> Key: SOLR-13677
> URL: https://issues.apache.org/jira/browse/SOLR-13677
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Noble Paul
>Priority: Blocker
> Fix For: 8.3
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The life cycle of Metrics producers are managed by the core (mostly). So, if 
> the lifecycle of the object is different from that of the core itself, these 
> objects will never be unregistered from the metrics registry. This will lead 
> to memory leaks



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-13677) All Metrics Gauges should be unregistered by the objects that registered them

2019-09-09 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925657#comment-16925657
 ] 

ASF subversion and git services commented on SOLR-13677:


Commit 042478cfa795dd537dcd4863a0524a73bad9a740 in lucene-solr's branch 
refs/heads/master from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=042478c ]

SOLR-13677: reverting the last commit


> All Metrics Gauges should be unregistered by the objects that registered them
> -
>
> Key: SOLR-13677
> URL: https://issues.apache.org/jira/browse/SOLR-13677
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Reporter: Noble Paul
>Priority: Blocker
> Fix For: 8.3
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The life cycle of Metrics producers are managed by the core (mostly). So, if 
> the lifecycle of the object is different from that of the core itself, these 
> objects will never be unregistered from the metrics registry. This will lead 
> to memory leaks



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[GitHub] [lucene-solr] noblepaul merged pull request #863: SOLR-13677: reverting the last commit

noblepaul merged pull request #863: SOLR-13677: reverting the last commit
URL: https://github.com/apache/lucene-solr/pull/863
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-8638) Remove deprecated code in master

2019-09-09 Thread David Smiley (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925655#comment-16925655
 ] 

David Smiley commented on LUCENE-8638:
--

The branch doesn't pass precommit now because javadocs in LuceneTestCase refer 
to the getBaseTempDirForTestClass you removed.

> Remove deprecated code in master
> 
>
> Key: LUCENE-8638
> URL: https://issues.apache.org/jira/browse/LUCENE-8638
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: master (9.0)
>
>
> There are a number of deprecations in master that should be removed. This 
> issue is to keep track of deprecations as a whole, some individual 
> deprecations may require their own issues.
>  
> Work on this issue should be pushed to the `master-deprecations` branch on 
> gitbox



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-13138) Remove deprecated code in master

2019-09-09 Thread David Smiley (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-13138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925654#comment-16925654
 ] 

David Smiley commented on SOLR-13138:
-

Two Jira issues but one branch, and thus presumably one commit that spans 
projects?  Ehh; this wouldn't be my preference but I guess it's okay.

> Remove deprecated code in master
> 
>
> Key: SOLR-13138
> URL: https://issues.apache.org/jira/browse/SOLR-13138
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Alan Woodward
>Priority: Major
>
> There are a number of deprecations in master that should be removed.  This 
> issue is to keep track of deprecations as a whole, some individual 
> deprecations may require their own issues.
>  
> Work on this issue should be pushed to the `master-deprecations` branch on 
> gitbox.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[GitHub] [lucene-solr] thomaswoeckinger edited a comment on issue #855: SOLR-13739: Improve performance on huge schema updates

thomaswoeckinger edited a comment on issue #855: SOLR-13739: Improve 
performance on huge schema updates
URL: https://github.com/apache/lucene-solr/pull/855#issuecomment-529448374
 
 
   > Overall looks good. Do "ant precommit" and "ant test" pass?
   
   Yes they pass.
   
   > I presume you did manual inspection to see this has the intended effect. 
It seems it'll only work for instance-equality of ManagedREsourceObservers.
   
   Instance for sure, and all which are using equals() and hashCode(), but i 
did not found one


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[GitHub] [lucene-solr] thomaswoeckinger commented on a change in pull request #855: SOLR-13739: Improve performance on huge schema updates

thomaswoeckinger commented on a change in pull request #855: SOLR-13739: 
Improve performance on huge schema updates
URL: https://github.com/apache/lucene-solr/pull/855#discussion_r322213237
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/rest/RestManager.java
 ##
 @@ -76,7 +76,7 @@
   private static class ManagedResourceRegistration {
 String resourceId;
 Class implClass;
-List observers = new ArrayList<>();
+Set observers = new HashSet<>();
 
 Review comment:
   I can change this, i found no test case which will test ordering explicitly.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[GitHub] [lucene-solr] thomaswoeckinger commented on issue #855: SOLR-13739: Improve performance on huge schema updates

thomaswoeckinger commented on issue #855: SOLR-13739: Improve performance on 
huge schema updates
URL: https://github.com/apache/lucene-solr/pull/855#issuecomment-529448374
 
 
   > Overall looks good. Do "ant precommit" and "ant test" pass?
   > I presume you did manual inspection to see this has the intended effect. 
It seems it'll only work for instance-equality of ManagedREsourceObservers.
   
   Instance for sure, and all which are using equals() and hashCode(), but i 
did not found one


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[GitHub] [lucene-solr] dsmiley commented on a change in pull request #855: SOLR-13739: Improve performance on huge schema updates

2019-09-09 Thread ASF subversion and git services (Jira)

dsmiley commented on a change in pull request #855: SOLR-13739: Improve 
performance on huge schema updates
URL: https://github.com/apache/lucene-solr/pull/855#discussion_r322209342
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/rest/RestManager.java
 ##
 @@ -76,7 +76,7 @@
   private static class ManagedResourceRegistration {
 String resourceId;
 Class implClass;
-List observers = new ArrayList<>();
+Set observers = new HashSet<>();
 
 Review comment:
   I propose using LinkedHashSet so that these get initialized in a consistent 
predictable order.  Ideally this doesn't matter but it can help in debugging 
and in case some custom components are ordering dependent.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-13360) StringIndexOutOfBoundsException: String index out of range: -3

2019-09-09 Thread Chongchen Chen (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-13360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925632#comment-16925632
 ] 

Chongchen Chen commented on SOLR-13360:
---

[~medanisdk] I tried to reproduce the problem, but no luck. could you please 
attach your solr home.

> StringIndexOutOfBoundsException: String index out of range: -3
> --
>
> Key: SOLR-13360
> URL: https://issues.apache.org/jira/browse/SOLR-13360
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 7.2.1
> Environment: Solr 7.2.1 - SAP Hybris 6.7.0.8
>Reporter: Ahmed Ghoneim
>Priority: Critical
>
> *{color:#ff}I cannot execute the following query:{color}*
> {noformat}
> http://localhost:8983/solr/master_Project_Product_flip/suggest?q=duotop=duotop=/suggest=de=true{noformat}
> 4/1/2019, 1:16:07 PM ERROR true RequestHandlerBase 
> java.lang.StringIndexOutOfBoundsException: String index out of range: -3
> {code:java}
> java.lang.StringIndexOutOfBoundsException: String index out of range: -3
>   at 
> java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:851)
>   at java.lang.StringBuilder.replace(StringBuilder.java:262)
>   at 
> org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:252)
>   at 
> org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:94)
>   at 
> org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:297)
>   at 
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:209)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.eclipse.jetty.server.Server.handle(Server.java:534)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at 
> org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:251)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>   at 
>

[jira] [Commented] (LUCENE-8966) KoreanTokenizer should split unknown words on digits

2019-09-09 Thread Jim Ferenczi (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925588#comment-16925588
 ] 

Jim Ferenczi commented on LUCENE-8966:
--

I don't think it's a bug [~danmuzi] or at least that it's related to this 
issue. In your example the first dot ('.' is a word dictionary) is considered a 
better path than grouping all dots eagerly. We process the unknown words 
greedily so we compare the path "[4], [.], [.]" with  "[4], [.], [.], 
[]", "[4], [.], [.], [.], [...]", ... "[4], [..]". Keeping the first 
dot separated from the rest indicates that a number followed by a dot is a 
better splitting path than multiple dots in our model. We can discuss this 
behavior in a new issue if you think this should be configurable (for instance 
the JapaneseTokenizer process unknown words greedily only in search mode) ?

> KoreanTokenizer should split unknown words on digits
> 
>
> Key: LUCENE-8966
> URL: https://issues.apache.org/jira/browse/LUCENE-8966
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
> Attachments: LUCENE-8966.patch, LUCENE-8966.patch
>
>
> Since https://issues.apache.org/jira/browse/LUCENE-8548 the Korean tokenizer 
> groups characters of unknown words if they belong to the same script or an 
> inherited one. This is ok for inputs like Мoscow (with a Cyrillic М and the 
> rest in Latin) but this rule doesn't work well on digits since they are 
> considered common with other scripts. For instance the input "44사이즈" is kept 
> as is even though "사이즈" is part of the dictionary. We should restore the 
> original behavior and splits any unknown words if a digit is followed by 
> another type.
> This issue was first discovered in 
> [https://github.com/elastic/elasticsearch/issues/46365]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-8620) Add CONTAINS support for LatLonShape



[ 
https://issues.apache.org/jira/browse/LUCENE-8620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925511#comment-16925511
 ] 

ASF subversion and git services commented on LUCENE-8620:
-

Commit 252421bb77c06bc074f416313ca794225de68a29 in lucene-solr's branch 
refs/heads/branch_8x from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=252421b ]

LUCENE-8620: Update Tessellator logic to label if triangle edges belongs to the 
original polygon (#771)




> Add CONTAINS support for LatLonShape
> 
>
> Key: LUCENE-8620
> URL: https://issues.apache.org/jira/browse/LUCENE-8620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/sandbox
>Reporter: Ignacio Vera
>Priority: Major
> Attachments: LUCENE-8620.patch, LUCENE-8620.patch
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Currently the only spatial operation that cannot be performed using 
> {{LatLonShape}} is CONTAINS. This issue will add such capability by tracking 
> if an edge of a generated triangle from the {{Tessellator}} is an edge of the 
> polygon.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-8620) Add CONTAINS support for LatLonShape

2019-09-09 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-8620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925508#comment-16925508
 ] 

ASF subversion and git services commented on LUCENE-8620:
-

Commit 62001b9b9651e54b54f73352801061d40da75168 in lucene-solr's branch 
refs/heads/master from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=62001b9 ]

LUCENE-8620: Update Tessellator logic to label if triangle edges belongs to the 
original polygon (#771)




> Add CONTAINS support for LatLonShape
> 
>
> Key: LUCENE-8620
> URL: https://issues.apache.org/jira/browse/LUCENE-8620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/sandbox
>Reporter: Ignacio Vera
>Priority: Major
> Attachments: LUCENE-8620.patch, LUCENE-8620.patch
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Currently the only spatial operation that cannot be performed using 
> {{LatLonShape}} is CONTAINS. This issue will add such capability by tracking 
> if an edge of a generated triangle from the {{Tessellator}} is an edge of the 
> polygon.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[GitHub] [lucene-solr] iverase merged pull request #771: LUCENE-8620: Update Tessellator logic to label if triangle edges belongs to the original polygon

iverase merged pull request #771: LUCENE-8620: Update Tessellator logic to 
label if triangle edges belongs to the original polygon
URL: https://github.com/apache/lucene-solr/pull/771
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[GitHub] [lucene-solr] thomaswoeckinger commented on issue #855: SOLR-13739: Improve performance on huge schema updates

thomaswoeckinger commented on issue #855: SOLR-13739: Improve performance on 
huge schema updates
URL: https://github.com/apache/lucene-solr/pull/855#issuecomment-529367883
 
 
   @gerlowskija or @dsmiley: if anyone of you has time to review this very easy 
fix, that would be great


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-8951) Create issues@ and builds@ lists and update notifications

2019-09-09 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925476#comment-16925476
 ] 

Uwe Schindler commented on LUCENE-8951:
---

The mails by Jenkins are still getting lost. I had no time to keep track. It 
looks like we need to open an issue and tell them to reconfigure the lists to 
disable any content filters. Jenkins sends huge mails with various MIME types 
in multipart/mixed.

> Create issues@ and builds@ lists and update notifications
> -
>
> Key: LUCENE-8951
> URL: https://issues.apache.org/jira/browse/LUCENE-8951
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>
> Issue to plan and execute decision from dev mailing list 
> [https://lists.apache.org/thread.html/762d72a9045642dc488dc7a2fd0a525707e5fa5671ac0648a3604c9b@%3Cdev.lucene.apache.org%3E]
>  # Create mailing lists as an announce only list (/)
>  # Subscribe all emails that will be allowed to post (/)
>  # Update websites with info about the new lists (/)
>  # Announce to dev@ list that the change will happen (/)
>  # Modify Jira and Github bots to post to issues@ list instead of dev@
>  # Modify Jenkins (including Policeman and other) to post to builds@
>  # Announce to dev@ list that the change is effective



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (SOLR-13748) mm (min should match) param for {!bool} query parser

2019-09-09 Thread Mikhail Khludnev (Jira)

Mikhail Khludnev created SOLR-13748:
---

 Summary: mm (min should match) param for {!bool} query parser
 Key: SOLR-13748
 URL: https://issues.apache.org/jira/browse/SOLR-13748
 Project: Solr
  Issue Type: Sub-task
  Components: query parsers
Reporter: Mikhail Khludnev






--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-13240) UTILIZENODE action results in an exception

2019-09-09 Thread Tim Owen (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-13240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925451#comment-16925451
 ] 

Tim Owen commented on SOLR-13240:
-

Great! Thanks for all your work on this Christine

> UTILIZENODE action results in an exception
> --
>
> Key: SOLR-13240
> URL: https://issues.apache.org/jira/browse/SOLR-13240
> Project: Solr
>  Issue Type: Bug
>  Components: AutoScaling
>Affects Versions: 7.6
>Reporter: Hendrik Haddorp
>Assignee: Christine Poerschke
>Priority: Major
> Fix For: master (9.0), 8.3
>
> Attachments: SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, 
> SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, SOLR-13240.patch, 
> SOLR-13240.patch, solr-solrj-7.5.0.jar
>
>
> When I invoke the UTILIZENODE action the REST call fails like this after it 
> moved a few replicas:
> {
>   "responseHeader":{
> "status":500,
> "QTime":40220},
>   "Operation utilizenode caused 
> exception:":"java.lang.IllegalArgumentException:java.lang.IllegalArgumentException:
>  Comparison method violates its general contract!",
>   "exception":{
> "msg":"Comparison method violates its general contract!",
> "rspCode":-1},
>   "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"Comparison method violates its general contract!",
> "trace":"org.apache.solr.common.SolrException: Comparison method violates 
> its general contract!\n\tat 
> org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:274)\n\tat
>  
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)\n\tat
>  
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
>  
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)\n\tat 
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)\n\tat
>  org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
>  
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  org.eclipse.jetty.server.Server.handle(Server.java:531)\n\tat 
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)\n\tat 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
>  
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)\n\tat
>  org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)\n\tat 
> org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat 
>

[jira] [Commented] (LUCENE-8951) Create issues@ and builds@ lists and update notifications



[ 
https://issues.apache.org/jira/browse/LUCENE-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925441#comment-16925441
 ] 

Jan Høydahl commented on LUCENE-8951:
-

Sent the email to notify people on what will happen.

Who knows how to reconfigure Jenkins, Jira & GitBox?

> Create issues@ and builds@ lists and update notifications
> -
>
> Key: LUCENE-8951
> URL: https://issues.apache.org/jira/browse/LUCENE-8951
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>
> Issue to plan and execute decision from dev mailing list 
> [https://lists.apache.org/thread.html/762d72a9045642dc488dc7a2fd0a525707e5fa5671ac0648a3604c9b@%3Cdev.lucene.apache.org%3E]
>  # Create mailing lists as an announce only list (/)
>  # Subscribe all emails that will be allowed to post (/)
>  # Update websites with info about the new lists (/)
>  # Announce to dev@ list that the change will happen (/)
>  # Modify Jira and Github bots to post to issues@ list instead of dev@
>  # Modify Jenkins (including Policeman and other) to post to builds@
>  # Announce to dev@ list that the change is effective



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[ANNOUNCE] New builds@ and issues@ mailing lists

2019-09-09 Thread Jan Høydahl

The Lucene project has added two new announce mailing lists,
`iss...@lucene.aparche.org` and `bui...@lucene.apache.org`.  High-volume
automated emails from our bug tracker, JIRA and GitHub will be moved
from the `dev@` list to `issues@` and automated emails from our Jenkins
CI build servers will be moved from the `dev@` list to `builds@`. This
will happen during the next few days.

This is an effort to reduce the sometimes overwhelming email volume on
our main development mailing list and thus make it easier for the
community to follow important discussions by humans on the
`dev@lucene.apache.org` list.

Everyone who wants to continue receiving these automated emails should
sign up for one or both of the two new lists. Sign-up instructions can
be found on the Lucene-java[1] and Solr[2] web sites.

[1] https://lucene.apache.org/core/discussion.html
[2] https://lucene.apache.org/solr/community.html

--
Jan Høydahl, on behalf of the Lucene PMC

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-8951) Create issues@ and builds@ lists and update notifications



 [ 
https://issues.apache.org/jira/browse/LUCENE-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated LUCENE-8951:

Description: 
Issue to plan and execute decision from dev mailing list 
[https://lists.apache.org/thread.html/762d72a9045642dc488dc7a2fd0a525707e5fa5671ac0648a3604c9b@%3Cdev.lucene.apache.org%3E]
 # Create mailing lists as an announce only list (/)
 # Subscribe all emails that will be allowed to post (/)
 # Update websites with info about the new lists (/)
 # Announce to dev@ list that the change will happen (/)
 # Modify Jira and Github bots to post to issues@ list instead of dev@
 # Modify Jenkins (including Policeman and other) to post to builds@
 # Announce to dev@ list that the change is effective

  was:
Issue to plan and execute decision from dev mailing list 
[https://lists.apache.org/thread.html/762d72a9045642dc488dc7a2fd0a525707e5fa5671ac0648a3604c9b@%3Cdev.lucene.apache.org%3E]
 # Create mailing lists as an announce only list (/)
 # Subscribe all emails that will be allowed to post (/)
 # Update websites with info about the new lists (/)
 # Announce to dev@ list that the change will happen
 # Modify Jira and Github bots to post to issues@ list instead of dev@
 # Modify Jenkins (including Policeman and other) to post to builds@
 # Announce to dev@ list that the change is effective


> Create issues@ and builds@ lists and update notifications
> -
>
> Key: LUCENE-8951
> URL: https://issues.apache.org/jira/browse/LUCENE-8951
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>
> Issue to plan and execute decision from dev mailing list 
> [https://lists.apache.org/thread.html/762d72a9045642dc488dc7a2fd0a525707e5fa5671ac0648a3604c9b@%3Cdev.lucene.apache.org%3E]
>  # Create mailing lists as an announce only list (/)
>  # Subscribe all emails that will be allowed to post (/)
>  # Update websites with info about the new lists (/)
>  # Announce to dev@ list that the change will happen (/)
>  # Modify Jira and Github bots to post to issues@ list instead of dev@
>  # Modify Jenkins (including Policeman and other) to post to builds@
>  # Announce to dev@ list that the change is effective



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-13293) org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient - Error consuming and closing http response stream.