[jira] [Updated] (SOLR-14160) Abstract out the replica identification interface for internode requests

2020-01-02 Thread Noble Paul (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-14160:
--
Description: 
The list of replicas to which requests are sent is identified in a static 
fashion. There is no way to cleanly plug in alternate logic for identifying the 
replica list.

The two major interfaces abstracted out are:
{code:java}
import java.util.List;
import java.util.Map;

public interface HeaderInterceptor {

  /** Names of headers to be read from the response. */
  List<String> responseHeaders();

  /** The callback invoked after a response. This must be called with a
   * 'null' value if there was no such header.
   */
  void responseHeaderVal(String key, String val);

  /** The headers that must be added to the request.
   * @return null or an empty map if nothing needs to be added
   */
  Map<String, String> requestHeaders();

}

{code}
{code:java}
/** This interface gives URLs one by one instead of giving a full list.
 * It also lets the implementation provide extra details as HTTP headers.
 */
public interface UrlProvider extends HeaderInterceptor {

  /** Gives the base URL for the next request to be fired. This is invoked
   * the first time a request needs to be fired, and subsequently after
   * {@link UrlProvider#toRetry()} returns true.
   * @return null if no more nodes/cores are available
   */
  String nextBaseurl();

  /** Verifies the header value. This callback WILL be invoked if
   * responseHeaderName is supplied, even if the value is null.
   * @return true if the request should be retried
   * @throws SolrException if there is something wrong with the response
   */
  boolean toRetry() throws SolrException;

}
{code}
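A minimal sketch of how these interfaces could be implemented, assuming only the two interfaces above and {{org.apache.solr.common.SolrException}}; the class name, header name, and retry policy here are illustrative, not part of the proposal:

{code:java}
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.apache.solr.common.SolrException;

/** Hypothetical provider that walks a fixed list of base URLs. */
public class RoundRobinUrlProvider implements UrlProvider {

  private final Iterator<String> urls;
  private String lastHeaderVal;

  public RoundRobinUrlProvider(List<String> baseUrls) {
    this.urls = baseUrls.iterator();
  }

  @Override
  public String nextBaseurl() {
    // null signals that no more nodes/cores are available
    return urls.hasNext() ? urls.next() : null;
  }

  @Override
  public List<String> responseHeaders() {
    // ask the client to report this header back via responseHeaderVal()
    return List.of("X-Replica-State"); // hypothetical header name
  }

  @Override
  public void responseHeaderVal(String key, String val) {
    lastHeaderVal = val; // null when the header was absent
  }

  @Override
  public Map<String, String> requestHeaders() {
    return Map.of(); // nothing extra to add to the request
  }

  @Override
  public boolean toRetry() throws SolrException {
    // retry on the next base URL if the replica reported itself unhealthy
    return "recovering".equals(lastHeaderVal);
  }
}
{code}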

  was:
The list of replicas to which requests are sent is identified in a static 
fashion. There is no way to cleanly plug in alternate logic for identifying the 
replica list.

The two major interfaces abstracted out are:
{code:java}
  /** Names of headers to be read from the response. */
  List<String> responseHeaders();

  /** The callback invoked after a response. This must be called with a
   * 'null' value if there was no such header.
   */
  void responseHeaderVal(String key, String val);

  /** The headers that must be added to the request.
   * @return null or an empty map if nothing needs to be added
   */
  Map<String, String> requestHeaders();
{code}
{code:java}
/** This interface gives URLs one by one instead of giving a full list.
 * It also lets the implementation provide extra details as HTTP headers.
 */
public interface UrlProvider extends HeaderInterceptor {

  /** Gives the base URL for the next request to be fired. This is invoked
   * the first time a request needs to be fired, and subsequently after
   * {@link UrlProvider#toRetry()} returns true.
   * @return null if no more nodes/cores are available
   */
  String nextBaseurl();

  /** Verifies the header value. This callback WILL be invoked if
   * responseHeaderName is supplied, even if the value is null.
   * @return true if the request should be retried
   * @throws SolrException if there is something wrong with the response
   */
  boolean toRetry() throws SolrException;

}
{code}


> Abstract out the replica identification interface for internode requests
> 
>
> Key: SOLR-14160
> URL: https://issues.apache.org/jira/browse/SOLR-14160
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>
> The list of replicas to which requests are sent is identified in a static 
> fashion. There is no way to cleanly plug in alternate logic for identifying 
> the replica list.
> The two major interfaces abstracted out are:
> {code:java}
>  public interface HeaderInterceptor {
>   /** Names of headers to be read from the response. */
>   List<String> responseHeaders();
>   /** The callback invoked after a response. This must be called with a
> 'null' value if there was no such header.
>*/
>   void responseHeaderVal(String key, String val);
>   /** The headers that must be added to the request.
>* @return null or an empty map if nothing needs to be added
>*/
>   Map<String, String> requestHeaders();
> }
> {code}
> {code:java}
> /** This interface gives URLs one by one instead of giving a full list.
>  * It also lets the implementation provide extra details as HTTP headers.
>  */
> public interface UrlProvider extends HeaderInterceptor {
>   /** Gives the base URL for the next request to be fired. This is invoked
>* the first time a request needs to be fired, and subsequently after
>* {@link UrlProvider#toRetry()} returns true.
>* @return null if no more nodes/cores are available
>*/
>   String nextBaseurl();
>   /** Verifies the header value. This callback WILL be invoked if
>* responseHeaderName is supplied, even if the value is null.
>* @return true if the request should be retried
>* @throws SolrException if there is something wrong with the response
>*/
>   boolean toRetry() throws SolrException;
> }
> {code}

[jira] [Updated] (SOLR-14160) Abstract out the replica identification interface for internode requests

2020-01-02 Thread Noble Paul (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-14160:
--
Description: 
The list of replicas to which requests are sent is identified in a static 
fashion. There is no way to cleanly plug in alternate logic for identifying the 
replica list.

The two major interfaces abstracted out are:
{code:java}
  /** Names of headers to be read from the response. */
  List<String> responseHeaders();

  /** The callback invoked after a response. This must be called with a
   * 'null' value if there was no such header.
   */
  void responseHeaderVal(String key, String val);

  /** The headers that must be added to the request.
   * @return null or an empty map if nothing needs to be added
   */
  Map<String, String> requestHeaders();
{code}
{code:java}
/** This interface gives URLs one by one instead of giving a full list.
 * It also lets the implementation provide extra details as HTTP headers.
 */
public interface UrlProvider extends HeaderInterceptor {

  /** Gives the base URL for the next request to be fired. This is invoked
   * the first time a request needs to be fired, and subsequently after
   * {@link UrlProvider#toRetry()} returns true.
   * @return null if no more nodes/cores are available
   */
  String nextBaseurl();

  /** Verifies the header value. This callback WILL be invoked if
   * responseHeaderName is supplied, even if the value is null.
   * @return true if the request should be retried
   * @throws SolrException if there is something wrong with the response
   */
  boolean toRetry() throws SolrException;

}
{code}

  was:
The list of replicas to which requests are sent is identified in a static 
fashion. There is no way to cleanly plug in alternate logic for identifying the 
replica list.

The three major interfaces abstracted out are:

{code:java}

import java.util.List;

public interface ResponseHeaderInterceptor {

  /** Names of headers to be read from the response. */
  List<String> responseHeaders();

  /** The callback invoked after a response. This must be called with a
   * 'null' value if there was no such header.
   */
  void responseHeaderVal(String key, String val);

}
{code}

{code}

import java.util.Map;

public interface RequestHeaderValueProvider {

  /** The headers that must be added to the request.
   * @return null or an empty map if nothing needs to be added
   */
  Map<String, String> requestHeaders();

}
{code}


{code:java}
/** This interface gives URLs one by one instead of giving a full list.
 * It also lets the implementation provide extra details as HTTP headers.
 */
public interface UrlProvider extends ResponseHeaderInterceptor,
    RequestHeaderValueProvider {

  /** Gives the base URL for the next request to be fired. This is invoked
   * the first time a request needs to be fired, and subsequently after
   * {@link UrlProvider#toRetry()} returns true.
   * @return null if no more nodes/cores are available
   */
  String nextBaseurl();

  /** Verifies the header value. This callback WILL be invoked if
   * responseHeaderName is supplied, even if the value is null.
   * @return true if the request should be retried
   * @throws SolrException if there is something wrong with the response
   */
  boolean toRetry() throws SolrException;

}
{code}


> Abstract out the replica identification interface for internode requests
> 
>
> Key: SOLR-14160
> URL: https://issues.apache.org/jira/browse/SOLR-14160
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>
> The list of replicas to which requests are sent is identified in a static 
> fashion. There is no way to cleanly plug in alternate logic for identifying 
> the replica list.
> The two major interfaces abstracted out are:
> {code:java}
>   /** Names of headers to be read from the response. */
>   List<String> responseHeaders();
>   /** The callback invoked after a response. This must be called with a
> 'null' value if there was no such header.
>*/
>   void responseHeaderVal(String key, String val);
>   /** The headers that must be added to the request.
>* @return null or an empty map if nothing needs to be added
>*/
>   Map<String, String> requestHeaders();
> {code}
> {code:java}
> /** This interface gives URLs one by one instead of giving a full list.
>  * It also lets the implementation provide extra details as HTTP headers.
>  */
> public interface UrlProvider extends HeaderInterceptor {
>   /** Gives the base URL for the next request to be fired. This is invoked
>* the first time a request needs to be fired, and subsequently after
>* {@link UrlProvider#toRetry()} returns true.
>* @return null if no more nodes/cores are available
>*/
>   String nextBaseurl();
>   /** Verifies the header value. This callback WILL be invoked if
>* responseHeaderName is supplied, even if the value is null.
>* @return true if the request should be retried
>* @throws SolrException if there is something wrong with the response
>*/
>   boolean toRetry() throws SolrException;
> }
> {code}

[jira] [Updated] (SOLR-14160) Abstract out the replica identification interface for internode requests

2020-01-02 Thread Noble Paul (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-14160:
--
Description: 
The list of replicas to which requests are sent is identified in a static 
fashion. There is no way to cleanly plug in alternate logic for identifying the 
replica list.

The three major interfaces abstracted out are:

{code:java}

import java.util.List;

public interface ResponseHeaderInterceptor {

  /** Names of headers to be read from the response. */
  List<String> responseHeaders();

  /** The callback invoked after a response. This must be called with a
   * 'null' value if there was no such header.
   */
  void responseHeaderVal(String key, String val);

}
{code}

{code}

import java.util.Map;

public interface RequestHeaderValueProvider {

  /** The headers that must be added to the request.
   * @return null or an empty map if nothing needs to be added
   */
  Map<String, String> requestHeaders();

}
{code}


{code:java}
/** This interface gives URLs one by one instead of giving a full list.
 * It also lets the implementation provide extra details as HTTP headers.
 */
public interface UrlProvider extends ResponseHeaderInterceptor,
    RequestHeaderValueProvider {

  /** Gives the base URL for the next request to be fired. This is invoked
   * the first time a request needs to be fired, and subsequently after
   * {@link UrlProvider#toRetry()} returns true.
   * @return null if no more nodes/cores are available
   */
  String nextBaseurl();

  /** Verifies the header value. This callback WILL be invoked if
   * responseHeaderName is supplied, even if the value is null.
   * @return true if the request should be retried
   * @throws SolrException if there is something wrong with the response
   */
  boolean toRetry() throws SolrException;

}
{code}

  was:
The list of replicas to which requests are sent is identified in a static 
fashion. There is no way to cleanly plug in alternate logic for identifying the 
replica list.

The two major interfaces abstracted out are:

{code:java}

import java.util.List;

public interface HeaderInterceptor {

  /** Names of headers to be read from the response. */
  List<String> responseHeaders();

  /** The callback invoked after a response. This must be called with a
   * 'null' value if there was no such header.
   */
  void responseHeaderVal(String key, String val);

}
{code}

{code}

import java.util.Map;

public interface RequestHeaderValueProvider {

  /** The headers that must be added to the request.
   * @return null or an empty map if nothing needs to be added
   */
  Map<String, String> requestHeaders();

}
{code}


{code:java}
/** This interface gives URLs one by one instead of giving a full list.
 * It also lets the implementation provide extra details as HTTP headers.
 */
public interface UrlProvider extends HeaderInterceptor,
    RequestHeaderValueProvider {

  /** Gives the base URL for the next request to be fired. This is invoked
   * the first time a request needs to be fired, and subsequently after
   * {@link UrlProvider#toRetry()} returns true.
   * @return null if no more nodes/cores are available
   */
  String nextBaseurl();

  /** Verifies the header value. This callback WILL be invoked if
   * responseHeaderName is supplied, even if the value is null.
   * @return true if the request should be retried
   * @throws SolrException if there is something wrong with the response
   */
  boolean toRetry() throws SolrException;

}
{code}


> Abstract out the replica identification interface for internode requests
> 
>
> Key: SOLR-14160
> URL: https://issues.apache.org/jira/browse/SOLR-14160
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>
> The list of replicas to which requests are sent is identified in a static 
> fashion. There is no way to cleanly plug in alternate logic for identifying 
> the replica list.
> The three major interfaces abstracted out are:
> {code:java}
> import java.util.List;
> public interface ResponseHeaderInterceptor {
>  /** Names of headers to be read from the response. */
>  List<String> responseHeaders();
>  /** The callback invoked after a response. This must be called with a 'null' 
> value if there was no such header.
>  */
>  void responseHeaderVal(String key, String val);
> }
> {code}
> {code}
> public interface RequestHeaderValueProvider {
>   /** The headers that must be added to the request.
>* @return null or an empty map if nothing needs to be added
>*/
>   Map<String, String> requestHeaders();
> }
> {code}
> {code:java}
> /** This interface gives URLs one by one instead of giving a full list.
>  * It also lets the implementation provide extra details as HTTP headers.
>  */
> public interface UrlProvider extends ResponseHeaderInterceptor, 
> RequestHeaderValueProvider {
>   /** Gives the base URL for the next request to be fired. This is invoked
>* the first time a request needs to be fired, and subsequently after
>* {@link UrlProvider#toRetry()} returns true.
>* @return null if no more nodes/cores are available
>*/
>   String nextBaseurl();
>   /** Verifies the header value. This callback WILL be invoked if
>* responseHeaderName is supplied, even if the value is null.
>* @return true if the request should be retried
>* @throws SolrException if there is something wrong with the response
>*/
>   boolean toRetry() throws SolrException;
> }
> {code}

[jira] [Updated] (SOLR-14160) Abstract out the replica identification interface for internode requests

2020-01-02 Thread Noble Paul (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-14160:
--
Description: 
The list of replicas to which requests are sent is identified in a static 
fashion. There is no way to cleanly plug in alternate logic for identifying the 
replica list.

The two major interfaces abstracted out are:

{code:java}

import java.util.List;

public interface HeaderInterceptor {

  /** Names of headers to be read from the response. */
  List<String> responseHeaders();

  /** The callback invoked after a response. This must be called with a
   * 'null' value if there was no such header.
   */
  void responseHeaderVal(String key, String val);

}
{code}

{code}

import java.util.Map;

public interface RequestHeaderValueProvider {

  /** The headers that must be added to the request.
   * @return null or an empty map if nothing needs to be added
   */
  Map<String, String> requestHeaders();

}
{code}


{code:java}
/** This interface gives URLs one by one instead of giving a full list.
 * It also lets the implementation provide extra details as HTTP headers.
 */
public interface UrlProvider extends HeaderInterceptor,
    RequestHeaderValueProvider {

  /** Gives the base URL for the next request to be fired. This is invoked
   * the first time a request needs to be fired, and subsequently after
   * {@link UrlProvider#toRetry()} returns true.
   * @return null if no more nodes/cores are available
   */
  String nextBaseurl();

  /** Verifies the header value. This callback WILL be invoked if
   * responseHeaderName is supplied, even if the value is null.
   * @return true if the request should be retried
   * @throws SolrException if there is something wrong with the response
   */
  boolean toRetry() throws SolrException;

}
{code}

  was:
The list of replicas to which requests are sent is identified in a static 
fashion. There is no way to cleanly plug in alternate logic for identifying the 
replica list.

The two major interfaces abstracted out are:

{code:java}

import java.util.List;

public interface HeaderInterceptor {

  /** Names of headers to be read from the response. */
  List<String> responseHeaders();

  /** The callback invoked after a response. This must be called with a
   * 'null' value if there was no such header.
   */
  void responseHeaderVal(String key, String val);

}
{code}

{code:java}

/** This interface gives URLs one by one instead of giving a full list.
 * It also lets the implementation provide extra details as HTTP headers.
 */
public interface UrlProvider extends HeaderInterceptor {

  /** Gives the base URL for the next request to be fired. This is invoked
   * the first time a request needs to be fired, and subsequently after
   * {@link UrlProvider#toRetry()} returns true.
   * @return null if no more nodes/cores are available
   */
  String nextBaseurl();

  /** The headers that must be added to the request.
   * @return null or an empty map if nothing needs to be added
   */
  Map<String, String> requestHeaders();

  /** Verifies the header value. This callback WILL be invoked if
   * responseHeaderName is supplied, even if the value is null.
   * @return true if the request should be retried
   * @throws SolrException if there is something wrong with the response
   */
  boolean toRetry() throws SolrException;

}
{code}


> Abstract out the replica identification interface for internode requests
> 
>
> Key: SOLR-14160
> URL: https://issues.apache.org/jira/browse/SOLR-14160
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>
> The list of replicas to which requests are sent is identified in a static 
> fashion. There is no way to cleanly plug in alternate logic for identifying 
> the replica list.
> The two major interfaces abstracted out are:
> {code:java}
> import java.util.List;
> public interface HeaderInterceptor {
>  /** Names of headers to be read from the response. */
>  List<String> responseHeaders();
>  /** The callback invoked after a response. This must be called with a 'null' 
> value if there was no such header.
>  */
>  void responseHeaderVal(String key, String val);
> }
> {code}
> {code}
> public interface RequestHeaderValueProvider {
>   /** The headers that must be added to the request.
>* @return null or an empty map if nothing needs to be added
>*/
>   Map<String, String> requestHeaders();
> }
> {code}
> {code:java}
> /** This interface gives URLs one by one instead of giving a full list.
>  * It also lets the implementation provide extra details as HTTP headers.
>  */
> public interface UrlProvider extends HeaderInterceptor, 
> RequestHeaderValueProvider {
>   /** Gives the base URL for the next request to be fired. This is invoked
>* the first time a request needs to be fired, and subsequently after
>* {@link UrlProvider#toRetry()} returns true.
>* @return null if no more nodes/cores are available
>*/
>   String nextBaseurl();
>   /** Verifies the header value. This callback WILL be invoked if
>* responseHeaderName is supplied, even if the value is null.
>* @return true if the request should be retried
>* @throws SolrException if there is something wrong with the response
>*/
>   boolean toRetry() throws SolrException;
> }
> {code}

[jira] [Updated] (SOLR-14160) Abstract out the replica identification interface for internode requests

2020-01-02 Thread Noble Paul (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-14160:
--
Description: The list of replicas to which requests are sent is identified 
in a static fashion. There is no way to cleanly plug in alternate logic for 
identifying the replica list.

> Abstract out the replica identification interface for internode requests
> 
>
> Key: SOLR-14160
> URL: https://issues.apache.org/jira/browse/SOLR-14160
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Priority: Major
>
> The list of replicas to which requests are sent is identified in a static 
> fashion. There is no way to cleanly plug in alternate logic for identifying 
> the replica list.






[jira] [Created] (SOLR-14160) Abstract out the replica identification interface for internode requests

2020-01-02 Thread Noble Paul (Jira)
Noble Paul created SOLR-14160:
-

 Summary: Abstract out the replica identification interface for 
internode requests
 Key: SOLR-14160
 URL: https://issues.apache.org/jira/browse/SOLR-14160
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Noble Paul









[jira] [Commented] (SOLR-14158) package manager to read keys from packagestore and not ZK

2020-01-02 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007148#comment-17007148
 ] 

Noble Paul commented on SOLR-14158:
---

Sorry, this was supposed to be an opt-in feature. We are not eliminating the ZK 
option. In fact, this will be an alternative.

> package manager to read keys from packagestore and not ZK 
> --
>
> Key: SOLR-14158
> URL: https://issues.apache.org/jira/browse/SOLR-14158
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: packages
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>  Labels: packagemanager
>
> The security of the package system relies on securing ZK. It's much easier 
> for users to secure the file system than to secure ZK.
> We provide an option to read public keys from the file store. The default 
> behavior will be to read from ZK.
> The nodes must be started with {{-Dpkg.keys=filestore}}
> This will
>  * disable the remote {{PUT /api/cluster/files}} 
>  * The CLI will directly write the keys to the 
> {{/filestore/_trusted_keys/}} dir
>  * The CLI directly writes the package artifacts to the local Solr node and 
> asks other nodes to fetch from this node. Nobody can upload executable jars 
> over a remote call
>  * Keys stored in ZK will not be used or trusted, so nobody can attack the 
> cluster by publishing a malicious key into Solr






[jira] [Updated] (SOLR-14158) package manager to read keys from packagestore and not ZK

2020-01-02 Thread Noble Paul (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-14158:
--
Description: 
The security of the package system relies on securing ZK. It's much easier for 
users to secure the file system than to secure ZK.

We provide an option to read public keys from the file store. The default 
behavior will be to read from ZK.
The nodes must be started with {{-Dpkg.keys=filestore}}

This will
 * disable the remote {{PUT /api/cluster/files}}
 * The CLI will directly write the keys to the 
{{/filestore/_trusted_keys/}} dir
 * The CLI directly writes the package artifacts to the local Solr node and 
asks other nodes to fetch from this node. Nobody can upload executable jars 
over a remote call
 * Keys stored in ZK will not be used or trusted, so nobody can attack the 
cluster by publishing a malicious key into Solr
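A minimal sketch of the opt-in check described above; the property name {{pkg.keys}} is from this issue, while the class and method names are illustrative only:

{code:java}
/** Hypothetical helper; not the actual patch. */
public final class PackageKeySource {

  /** True when the node was started with -Dpkg.keys=filestore. */
  public static boolean useFileStoreKeys() {
    return "filestore".equals(System.getProperty("pkg.keys"));
  }

  private PackageKeySource() {}
}
{code}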

  was:
The security of the package system relies on securing ZK. It's much easier for 
users to secure the file system than to secure ZK.

This will
 * disable the remote {{PUT /api/cluster/files}} by default
 * The CLI will directly write the keys to the 
{{/filestore/_trusted_keys/}} dir
 * The CLI directly writes the package artifacts to the local Solr node and 
asks other nodes to fetch from this node. Nobody can upload executable jars 
over a remote call
 * Keys stored in ZK will not be used or trusted, so nobody can attack the 
cluster by publishing a malicious key into Solr


> package manager to read keys from packagestore and not ZK 
> --
>
> Key: SOLR-14158
> URL: https://issues.apache.org/jira/browse/SOLR-14158
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: packages
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>  Labels: packagemanager
>
> The security of the package system relies on securing ZK. It's much easier 
> for users to secure the file system than to secure ZK.
> We provide an option to read public keys from the file store. The default 
> behavior will be to read from ZK.
> The nodes must be started with {{-Dpkg.keys=filestore}}
> This will
>  * disable the remote {{PUT /api/cluster/files}} 
>  * The CLI will directly write the keys to the 
> {{/filestore/_trusted_keys/}} dir
>  * The CLI directly writes the package artifacts to the local Solr node and 
> asks other nodes to fetch from this node. Nobody can upload executable jars 
> over a remote call
>  * Keys stored in ZK will not be used or trusted, so nobody can attack the 
> cluster by publishing a malicious key into Solr






[jira] [Assigned] (SOLR-13486) race condition between leader's "replay on startup" and non-leader's "recover from leader" can leave replicas out of sync (TestCloudConsistency)

2020-01-02 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned SOLR-13486:
-

Assignee: (was: Erick Erickson)

> race condition between leader's "replay on startup" and non-leader's "recover 
> from leader" can leave replicas out of sync (TestCloudConsistency)
> 
>
> Key: SOLR-13486
> URL: https://issues.apache.org/jira/browse/SOLR-13486
> Project: Solr
>  Issue Type: Bug
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: 
> apache_Lucene-Solr-BadApples-NightlyTests-master_61.log.txt.gz, 
> apache_Lucene-Solr-BadApples-Tests-8.x_102.log.txt.gz, 
> org.apache.solr.cloud.TestCloudConsistency.zip
>
>
> I've been investigating some jenkins failures from TestCloudConsistency, 
> which at first glance suggest a problem w/replica(s) recovering after a 
> network partition from the leader - but in digging into the logs the root 
> cause actually seems to be a thread race condition when a replica (the 
> leader) is first registered...
>  * The {{ZkContainer.registerInZk(...)}} method (which is called by 
> {{CoreContainer.registerCore(...)}} & {{CoreContainer.load()}}) is typically 
> run in a background thread (via the {{ZkContainer.coreZkRegister}} 
> ExecutorService)
>  * {{ZkContainer.registerInZk(...)}} delegates to 
> {{ZKController.register(...)}}, which is ultimately responsible for checking 
> if there are any "old" tlogs on disk, and if so handling the "Replaying tlog 
> for  during startup" logic
>  * Because this happens in a background thread, other logic/requests can be 
> handled by this core/replica in the meantime - before it starts (or while in 
> the middle of) replaying the tlogs
>  ** Notably: *leaders that have not yet replayed tlogs on startup will 
> erroneously respond to RTG / Fingerprint / PeerSync requests from other 
> replicas w/incomplete data*
> ...In general, it seems scary / fishy to me that a replica can (apparently) 
> become *ACTIVE* before it has finished its {{registerInZk}} + "Replaying tlog 
> ... during startup" logic ... particularly since this can happen even for 
> replicas that are/become leaders. It seems like this could potentially cause 
> a whole host of problems, only one of which manifests in this particular test 
> failure (see the toy model after this list):
>  * *BEFORE* replicaX's "coreZkRegister" thread reaches the "Replaying tlog 
> ... during startup" check:
>  ** replicaX can recognize (via zk terms) that it should be the leader(X)
>  ** this leaderX can then instruct some other replicaY to recover from it
>  ** replicaY can send RTG / PeerSync / FetchIndex requests to the leaderX 
> (either of its own volition, or because it was instructed to by leaderX) in 
> an attempt to recover
>  *** the responses to these recovery requests will not include updates in the 
> tlog files that existed on leaderX prior to startup that have not yet been 
> replayed
>  * *AFTER* replicaY has finished its recovery, leaderX's "Replaying tlog ... 
> during startup" can finish
>  ** replicaY now thinks it is in sync with leaderX, but leaderX has 
> (replayed) updates the other replicas know nothing about
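The ordering problem above reduces to a toy model; this is not Solr code, just a sketch of why serving sync traffic before a background registration thread finishes tlog replay is unsafe:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Toy model: the replica answers requests before replay completes. */
public class StartupRaceSketch {
  static volatile boolean tlogReplayed = false;

  public static void main(String[] args) throws Exception {
    // stand-in for the ZkContainer.coreZkRegister ExecutorService
    ExecutorService coreZkRegister = Executors.newSingleThreadExecutor();

    coreZkRegister.submit(() -> {
      sleep(100);          // stand-in for election, zk writes, etc.
      tlogReplayed = true; // "Replaying tlog ... during startup" finishes
    });

    // Meanwhile the node already serves recovery traffic: a PeerSync/RTG
    // request arriving now observes pre-replay state.
    System.out.println("peer sync sees replayed tlog? " + tlogReplayed); // false

    coreZkRegister.shutdown();
  }

  static void sleep(long ms) {
    try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
  }
}
{code}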






[jira] [Assigned] (SOLR-14159) Fix errors in TestCloudConsistency

2020-01-02 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned SOLR-14159:
-

Assignee: Erick Erickson

> Fix errors in TestCloudConsistency
> --
>
> Key: SOLR-14159
> URL: https://issues.apache.org/jira/browse/SOLR-14159
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
>
> Moving over here from SOLR-13486 as per Hoss.






[GitHub] [lucene-solr] madrob commented on a change in pull request #1042: LUCENE-9068: Build FuzzyQuery automata up-front

2020-01-02 Thread GitBox
madrob commented on a change in pull request #1042: LUCENE-9068: Build 
FuzzyQuery automata up-front
URL: https://github.com/apache/lucene-solr/pull/1042#discussion_r362638865
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/FuzzyTermsEnum.java
 ##
 @@ -58,30 +60,42 @@
   // which we use to know when we can reduce the automaton from ed=2 to ed=1, 
or ed=0 if only single top term is collected:
   private final MaxNonCompetitiveBoostAttribute maxBoostAtt;
 
-  // We use this to share the pre-built (once for the query) Levenshtein 
automata across segments:
-  private final LevenshteinAutomataAttribute dfaAtt;
+  private final CompiledAutomaton[] automata;
   
   private float bottom;
   private BytesRef bottomTerm;
-  private final CompiledAutomaton automata[];
 
   private BytesRef queuedBottom;
 
-  final int termLength;
+  private final int termLength;
 
   // Maximum number of edits we will accept.  This is either 2 or 1 (or, 
degenerately, 0) passed by the user originally,
   // but as we collect terms, we can lower this (e.g. from 2 to 1) if we 
detect that the term queue is full, and all
   // collected terms are ed=1:
   private int maxEdits;
 
-  final Terms terms;
-  final Term term;
-  final int termText[];
-  final int realPrefixLength;
+  private final Terms terms;
+  private final Term term;
+
+  /**
+   * Constructor for enumeration of all terms from specified 
reader which share a prefix of
+   * length prefixLength with term and which have at 
most {@code maxEdits} edits.
+   * 
+   * After calling the constructor the enumeration is already pointing to the 
first
+   * valid term if such a term exists.
+   *
+   * @param terms Delivers terms.
+   * @param term Pattern term.
+   * @param maxEdits Maximum edit distance.
+   * @param prefixLength the length of the required common prefix
+   * @param transitions whether transitions should count as a single edit
 
 Review comment:
   nit: why was this renamed from transpositions?





[jira] [Commented] (SOLR-14158) package manager to read keys from packagestore and not ZK

2020-01-02 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007103#comment-17007103
 ] 

David Smiley commented on SOLR-14158:
-

Ideally abstractions are in place that allow both.  I'm not sure we should be 
forcing people to use the File Store _yet_.  It's very new.

> package manager to read keys from packagestore and not ZK 
> --
>
> Key: SOLR-14158
> URL: https://issues.apache.org/jira/browse/SOLR-14158
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: packages
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>  Labels: packagemanager
>
> The security of the package system relies on securing ZK. It's much easier 
> for users to secure the file system than securing ZK.
> This will 
> * disable the remote {{PUT /api/cluster/files}} by default
> * The CLI will directly write to the keys to 
> {{/filestore/_trusted_keys/}} dir 
> * The CLI  directly writes the package artifacts to the local solr and ask 
> other nodes to fetch from this node. Nobody can upload executable jars over a 
> remote call
> * Keys stored in ZK will not be used or trusted. So nobody can attack the 
> cluster by publishing a malicious key into Solr






[GitHub] [lucene-solr] madrob commented on a change in pull request #1042: LUCENE-9068: Build FuzzyQuery automata up-front

2020-01-02 Thread GitBox
madrob commented on a change in pull request #1042: LUCENE-9068: Build 
FuzzyQuery automata up-front
URL: https://github.com/apache/lucene-solr/pull/1042#discussion_r362639329
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/FuzzyTermsEnum.java
 ##
 @@ -344,67 +363,4 @@ public BytesRef term() throws IOException {
 return actualEnum.term();
   }
 
-  /**
-   * reuses compiled automata across different segments,
-   * because they are independent of the index
-   * @lucene.internal */
-  public static interface LevenshteinAutomataAttribute extends Attribute {
-public CompiledAutomaton[] automata();
-public void setAutomata(CompiledAutomaton[] automata);
-  }
-
-  /** 
-   * Stores compiled automata as a list (indexed by edit distance)
-   * @lucene.internal */
-  public static final class LevenshteinAutomataAttributeImpl extends 
AttributeImpl implements LevenshteinAutomataAttribute {
-private CompiledAutomaton[] automata;
-  
-@Override
-public CompiledAutomaton[] automata() {
-  return automata;
-}
-
-@Override
-public void setAutomata(CompiledAutomaton[] automata) {
-  this.automata = automata;
-}
-
-@Override
-public void clear() {
-  automata = null;
-}
-
-@Override
-public int hashCode() {
-  if (automata == null) {
-return 0;
-  } else {
-return automata.hashCode();
-  }
-}
-
-@Override
-public boolean equals(Object other) {
-  if (this == other)
-return true;
-  if (!(other instanceof LevenshteinAutomataAttributeImpl))
-return false;
-  return Arrays.equals(automata, ((LevenshteinAutomataAttributeImpl) 
other).automata);
 
 Review comment:
   nit: we have some unused imports after this deletion.





[jira] [Commented] (SOLR-14154) Return correct isolation level when retrieving it from the SQL Connection

2020-01-02 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007096#comment-17007096
 ] 

Kevin Risden commented on SOLR-14154:
-

{quote}Do I make PR's for each version I want this to be backported to?{quote}

No don't worry about it. I'll backport the change to the applicable branches. 

{code:java}
I'm trying to write a Solr driver for [Metabase|https://www.metabase.com/] and 
the JDBC route seemed the way to go for me. Unfortunately metabase creates a 
pooled connection which uses the getTransactionIsolation() method and crashes
{code}

Ah neat. Looks like there isn't a check before getting the isolation level. 

https://github.com/swaldman/c3p0/blob/master/src/java/com/mchange/v2/c3p0/impl/NewPooledConnection.java#L120

{quote}created the pull requests.{quote}

Thanks I'll try to get this in today or tomorrow.

> Return correct isolation level when retrieving it from the SQL Connection
> -
>
> Key: SOLR-14154
> URL: https://issues.apache.org/jira/browse/SOLR-14154
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Parallel SQL
>Affects Versions: 8.4
>Reporter: Nick Vercammen
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 8.5, 8.4.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When calling getTransactionIsolation() on the Sql.ConnectionImpl, an 
> UnsupportedException is thrown. It would be better to return TRANSACTION_NONE 
> so clients can determine for themselves that it is not supported, without 
> receiving an exception.
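A minimal sketch of the proposed behavior, assuming only {{java.sql.Connection}}; this is illustrative, not the attached patch:

{code:java}
import java.sql.Connection;

// Hypothetical excerpt: report TRANSACTION_NONE instead of throwing,
// so connection pools (e.g. c3p0) can probe the isolation level safely.
abstract class ConnectionImplSketch implements Connection {
  @Override
  public int getTransactionIsolation() {
    return Connection.TRANSACTION_NONE; // transactions are not supported
  }
}
{code}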






[jira] [Comment Edited] (SOLR-14154) Return correct isolation level when retrieving it from the SQL Connection

2020-01-02 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007096#comment-17007096
 ] 

Kevin Risden edited comment on SOLR-14154 at 1/2/20 9:42 PM:
-

{quote}Do I make PR's for each version I want this to be backported to?{quote}

No don't worry about it. I'll backport the change to the applicable branches. 

{quote}
I'm trying to write a Solr driver for [Metabase|https://www.metabase.com/] and 
the JDBC route seemed the way to go for me. Unfortunately metabase creates a 
pooled connection which uses the getTransactionIsolation() method and crashes
{quote}

Ah neat. Looks like there isn't a check before getting the isolation level. 

https://github.com/swaldman/c3p0/blob/master/src/java/com/mchange/v2/c3p0/impl/NewPooledConnection.java#L120

{quote}created the pull requests.{quote}

Thanks I'll try to get this in today or tomorrow.


was (Author: risdenk):
{quote}Do I make PR's for each version I want this to be backported to?{quote}

No don't worry about it. I'll backport the change to the applicable branches. 

{code:java}
I'm trying to write a Solr driver for [Metabase|https://www.metabase.com/] and 
the JDBC route seemed the way to go for me. Unfortunately metabase creates a 
pooled connection which uses the getTransactionIsolation() method and crashes
{code}

Ah neat. Looks like there isn't a check before getting the isolation level. 

https://github.com/swaldman/c3p0/blob/master/src/java/com/mchange/v2/c3p0/impl/NewPooledConnection.java#L120

{quote}created the pull requests.{quote}

Thanks I'll try to get this in today or tomorrow.

> Return correct isolation level when retrieving it from the SQL Connection
> -
>
> Key: SOLR-14154
> URL: https://issues.apache.org/jira/browse/SOLR-14154
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Parallel SQL
>Affects Versions: 8.4
>Reporter: Nick Vercammen
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 8.5, 8.4.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When calling getTransactionIsolation() on the Sql.ConnectionImpl, an 
> UnsupportedException is thrown. It would be better to return TRANSACTION_NONE 
> so clients can determine for themselves that it is not supported, without 
> receiving an exception.






[jira] [Updated] (SOLR-14154) Return correct isolation level when retrieving it from the SQL Connection

2020-01-02 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14154:

Status: Patch Available  (was: Open)

> Return correct isolation level when retrieving it from the SQL Connection
> -
>
> Key: SOLR-14154
> URL: https://issues.apache.org/jira/browse/SOLR-14154
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Parallel SQL
>Affects Versions: 8.4
>Reporter: Nick Vercammen
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 8.5, 8.4.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When calling getTransactionIsolation() on the Sql.ConnectionImpl, an 
> UnsupportedException is thrown. It would be better to return TRANSACTION_NONE 
> so clients can determine for themselves that it is not supported, without 
> receiving an exception.






[GitHub] [lucene-solr] epugh commented on a change in pull request #1033: SOLR-13965: Use Plugin to add new expressions to GraphHandler

2020-01-02 Thread GitBox
epugh commented on a change in pull request #1033: SOLR-13965: Use Plugin to 
add new expressions to GraphHandler
URL: https://github.com/apache/lucene-solr/pull/1033#discussion_r362638584
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/handler/GraphHandler.java
 ##
 @@ -92,24 +104,29 @@ public void inform(SolrCore core) {
 }
 
 // This pulls all the overrides and additions from the config
+List<PluginInfo> pluginInfos = 
core.getSolrConfig().getPluginInfos(Expressible.class.getName());
+
+// Check deprecated approach.
 Object functionMappingsObj = initArgs.get("streamFunctions");
 if(null != functionMappingsObj){
+  log.warn("solrconfig.xml: <streamFunctions> is deprecated for adding 
additional streaming functions to GraphHandler.");
   NamedList functionMappings = (NamedList)functionMappingsObj;
   for(Entry functionMapping : functionMappings) {
 String key = functionMapping.getKey();
 PluginInfo pluginInfo = new PluginInfo(key, 
Collections.singletonMap("class", functionMapping.getValue()));
-
-if (pluginInfo.pkgName == null) {
-  Class<? extends Expressible> clazz = 
core.getResourceLoader().findClass((String) functionMapping.getValue(),
-  Expressible.class);
-  streamFactory.withFunctionName(key, clazz);
-} else {
-  StreamHandler.ExpressibleHolder holder = new 
StreamHandler.ExpressibleHolder(pluginInfo, core, 
SolrConfig.classVsSolrPluginInfo.get(Expressible.class));
-  streamFactory.withFunctionName(key, () -> holder.getClazz());
-}
-
+pluginInfos.add(pluginInfo);
   }
+}
 
+for (PluginInfo pluginInfo : pluginInfos) {
+  if (pluginInfo.pkgName != null) {
+ExpressibleHolder holder = new ExpressibleHolder(pluginInfo, core, 
SolrConfig.classVsSolrPluginInfo.get(Expressible.class));
 
 Review comment:
   @madrob can you push up a fix to this PR?   





[jira] [Assigned] (SOLR-14154) Return correct isolation level when retrieving it from the SQL Connection

2020-01-02 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden reassigned SOLR-14154:
---

Assignee: Kevin Risden

> Return correct isolation level when retrieving it from the SQL Connection
> -
>
> Key: SOLR-14154
> URL: https://issues.apache.org/jira/browse/SOLR-14154
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Parallel SQL
>Affects Versions: 8.4
>Reporter: Nick Vercammen
>Assignee: Kevin Risden
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When calling getTransactionIsolation() on the Sql.ConnectionImpl, an 
> UnsupportedException is thrown. It would be better to return TRANSACTION_NONE 
> so clients can determine for themselves that it is not supported, without 
> receiving an exception.






[GitHub] [lucene-solr] risdenk closed pull request #1134: SOLR-14154 Return correct isolation level when retrieving it from the SQL Connection

2020-01-02 Thread GitBox
risdenk closed pull request #1134: SOLR-14154 Return correct isolation level 
when retrieving it from the SQL Connection
URL: https://github.com/apache/lucene-solr/pull/1134
 
 
   





[jira] [Updated] (SOLR-14154) Return correct isolation level when retrieving it from the SQL Connection

2020-01-02 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14154:

Fix Version/s: 8.4.1
   8.5

> Return correct isolation level when retrieving it from the SQL Connection
> -
>
> Key: SOLR-14154
> URL: https://issues.apache.org/jira/browse/SOLR-14154
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Parallel SQL
>Affects Versions: 8.4
>Reporter: Nick Vercammen
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 8.5, 8.4.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When calling getTransactionIsolation() on the Sql.ConnectionImpl, an 
> UnsupportedException is thrown. It would be better to return TRANSACTION_NONE 
> so clients can determine for themselves that it is not supported, without 
> receiving an exception.






[jira] [Commented] (SOLR-14154) Return correct isolation level when retrieving it from the SQL Connection

2020-01-02 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007092#comment-17007092
 ] 

Kevin Risden commented on SOLR-14154:
-

There probably won't be another 8.3.x release. Solr 8.4 was just released. 
There is an 8.4.1 release going out shortly. So this will target Solr 9.0, 8.5 
and 8.4.1.

> Return correct isolation level when retrieving it from the SQL Connection
> -
>
> Key: SOLR-14154
> URL: https://issues.apache.org/jira/browse/SOLR-14154
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Parallel SQL
>Affects Versions: 8.4
>Reporter: Nick Vercammen
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 8.5, 8.4.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When calling getTransactionIsolation() on the Sql.ConnectionImpl, an 
> UnsupportedException is thrown. It would be better to return TRANSACTION_NONE 
> so clients can determine for themselves that it is not supported, without 
> receiving an exception.






[GitHub] [lucene-solr] risdenk commented on issue #1134: SOLR-14154 Return correct isolation level when retrieving it from the SQL Connection

2020-01-02 Thread GitBox
risdenk commented on issue #1134: SOLR-14154 Return correct isolation level 
when retrieving it from the SQL Connection
URL: https://github.com/apache/lucene-solr/pull/1134#issuecomment-570358843
 
 
   I'll backport to the appropriate branches from PR #1135





[jira] [Commented] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-02 Thread Houston Putman (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007084#comment-17007084
 ] 

Houston Putman commented on SOLR-11746:
---

I agree [~hossman]. If we are going to default to {{getRangeQuery()}} for 
every possible entry point of a field wildcard, then the docValues check should 
be done in that method.

Also, every {{getRangeQuery()}} implementation should support this new 
optimization. To help facilitate that, I refactored all overriding 
implementations of {{getRangeQuery()}} to {{getSpecializedRangeQuery()}}, which 
is called from {{FieldType.getRangeQuery()}}, which does the check for 
existence queries (see the sketch below). Any pieces of custom code that add 
new fields (and override {{getRangeQuery()}}) will not be broken by this; they 
will just not receive the new optimization.

I would do the same thing for {{getPrefixQuery}}, but there are no other 
overriding methods than the one included in this patch.
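A sketch of the template-method shape described above; the {{Query}} type, the null-bound convention, and {{existenceQuery()}} are stand-ins, not the attached patch:

{code:java}
/** Illustrative only; null bounds stand in for "* TO *". */
abstract class FieldTypeSketch {
  interface Query {}

  /** Entry point: intercepts foo:[* TO *] before delegating. */
  public final Query getRangeQuery(String field, String min, String max) {
    if (min == null && max == null) {
      return existenceQuery(field); // uniform foo:* == foo:[* TO *] handling
    }
    return getSpecializedRangeQuery(field, min, max);
  }

  /** What overriding field types now implement instead of getRangeQuery(). */
  protected abstract Query getSpecializedRangeQuery(String field, String min, String max);

  protected abstract Query existenceQuery(String field);
}
{code}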

> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch
>
>
> On the solr-user mailing list, Torsten Krah pointed out that with Trie 
> numeric fields, query syntax such as {{foo_d:\*}} has been functionally 
> equivalent to {{foo_d:\[\* TO \*]}} and asked why this was not also supported 
> for Point based numeric fields.
> The fact that this type of syntax works (for {{indexed="true"}} Trie fields) 
> appears to have been an (untested, undocumented) fluke of Trie fields, given 
> that they use indexed terms for the (encoded) numeric terms and inherit the 
> default implementation of {{FieldType.getPrefixQuery}}, which produces a 
> prefix query against the {{""}} (empty string) term.
> (Note that this syntax has apparently _*never*_ worked for Trie fields with 
> {{indexed="false" docValues="true"}})
> In general, we should assess the behavior when users attempt a prefix/wildcard 
> syntax query against numeric fields, as currently the behavior is largely 
> non-sensical: prefix/wildcard syntax frequently matches no docs w/o any sort 
> of error, and the aforementioned {{numeric_field:*}} behaves inconsistently 
> between points/trie fields and between indexed/docValued trie fields.






[jira] [Updated] (SOLR-11746) numeric fields need better error handling for prefix/wildcard syntax -- consider uniform support for "foo:* == foo:[* TO *]"

2020-01-02 Thread Houston Putman (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Houston Putman updated SOLR-11746:
--
Attachment: SOLR-11746.patch

> numeric fields need better error handling for prefix/wildcard syntax -- 
> consider uniform support for "foo:* == foo:[* TO *]"
> 
>
> Key: SOLR-11746
> URL: https://issues.apache.org/jira/browse/SOLR-11746
> Project: Solr
>  Issue Type: Improvement
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-11746.patch, SOLR-11746.patch, SOLR-11746.patch
>
>
> On the solr-user mailing list, Torsten Krah pointed out that with Trie 
> numeric fields, query syntax such as {{foo_d:\*}} has been functionally 
> equivalent to {{foo_d:\[\* TO \*]}} and asked why this was not also supported 
> for Point based numeric fields.
> The fact that this type of syntax works (for {{indexed="true"}} Trie fields) 
> appears to have been an (untested, undocumented) fluke of Trie fields, given 
> that they use indexed terms for the (encoded) numeric terms and inherit the 
> default implementation of {{FieldType.getPrefixQuery}}, which produces a 
> prefix query against the {{""}} (empty string) term.
> (Note that this syntax has apparently _*never*_ worked for Trie fields with 
> {{indexed="false" docValues="true"}})
> In general, we should assess the behavior when users attempt a prefix/wildcard 
> syntax query against numeric fields, as currently the behavior is largely 
> non-sensical: prefix/wildcard syntax frequently matches no docs w/o any sort 
> of error, and the aforementioned {{numeric_field:*}} behaves inconsistently 
> between points/trie fields and between indexed/docValued trie fields.






[jira] [Updated] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-02 Thread Jason Gerlowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13890:
---
Attachment: (was: Screen Shot 2020-01-02 at 3.55.43 PM.png)

> Add postfilter support to {!terms} queries
> --
>
> Key: SOLR-13890
> URL: https://issues.apache.org/jira/browse/SOLR-13890
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch, 
> SOLR-13890.patch, Screen Shot 2020-01-02 at 2.25.12 PM.png
>
>
> There are some use-cases where it'd be nice if the "terms" qparser created a 
> query that could be run as a postfilter.  Particularly, when users are 
> checking for hundreds or thousands of terms, a postfilter implementation can 
> be more performant than the standard processing.
> With this issue, I'd like to propose a post-filter implementation for the 
> {{docValuesTermsFilter}} "method".  Postfilter creation can use a 
> SortedSetDocValues object to populate a DV bitset with the "terms" being 
> checked for.  Each document run through the post-filter can look at its 
> doc-values for the field in question and check them efficiently against the 
> constructed bitset.
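A minimal sketch of what such a post-filter could look like (class and variable names are illustrative, not taken from the attached patches):
{code:java}
import java.io.IOException;

import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.LongBitSet;
import org.apache.solr.search.DelegatingCollector;

/** Illustrative post-filter: keep a doc if any of its ords matches a requested term. */
class TermsDVPostFilterSketch extends DelegatingCollector {
  private final String field;
  private final BytesRef[] requestedTerms;
  private SortedSetDocValues docValues;
  private LongBitSet requestedOrds; // rebuilt per segment, since ordinals are segment-local

  TermsDVPostFilterSketch(String field, BytesRef[] requestedTerms) {
    this.field = field;
    this.requestedTerms = requestedTerms;
  }

  @Override
  protected void doSetNextReader(LeafReaderContext context) throws IOException {
    super.doSetNextReader(context);
    docValues = DocValues.getSortedSet(context.reader(), field);
    // Resolve the requested terms to this segment's ordinals once per leaf.
    requestedOrds = new LongBitSet(docValues.getValueCount());
    for (BytesRef term : requestedTerms) {
      long ord = docValues.lookupTerm(term);
      if (ord >= 0) {
        requestedOrds.set(ord);
      }
    }
  }

  @Override
  public void collect(int doc) throws IOException {
    if (docValues.advanceExact(doc)) {
      for (long ord = docValues.nextOrd(); ord != SortedSetDocValues.NO_MORE_ORDS;
          ord = docValues.nextOrd()) {
        if (requestedOrds.get(ord)) {
          super.collect(doc); // doc has at least one of the requested terms
          return;
        }
      }
    }
  }
}
{code}
The real patch may differ in the details; the point is that term lookup happens once per segment, and the per-document work reduces to cheap bitset checks on ordinals.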






[jira] [Updated] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-02 Thread Jason Gerlowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13890:
---
Attachment: Screen Shot 2020-01-02 at 3.55.43 PM.png

> Add postfilter support to {!terms} queries
> --
>
> Key: SOLR-13890
> URL: https://issues.apache.org/jira/browse/SOLR-13890
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch, 
> SOLR-13890.patch, Screen Shot 2020-01-02 at 2.25.12 PM.png, Screen Shot 
> 2020-01-02 at 3.55.43 PM.png
>
>
> There are some use-cases where it'd be nice if the "terms" qparser created a 
> query that could be run as a postfilter.  Particularly, when users are 
> checking for hundreds or thousands of terms, a postfilter implementation can 
> be more performant than the standard processing.
> With this issue, I'd like to propose a post-filter implementation for the 
> {{docValuesTermsFilter}} "method".  Postfilter creation can use a 
> SortedSetDocValues object to populate a DV bitset with the "terms" being 
> checked for.  Each document run through the post-filter can look at its 
> doc-values for the field in question and check them efficiently against the 
> constructed bitset.






[GitHub] [lucene-solr] madrob commented on a change in pull request #1033: SOLR-13965: Use Plugin to add new expressions to GraphHandler

2020-01-02 Thread GitBox
madrob commented on a change in pull request #1033: SOLR-13965: Use Plugin to 
add new expressions to GraphHandler
URL: https://github.com/apache/lucene-solr/pull/1033#discussion_r362626064
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/handler/GraphHandler.java
 ##
 @@ -92,24 +104,29 @@ public void inform(SolrCore core) {
 }
 
 // This pulls all the overrides and additions from the config
+List<PluginInfo> pluginInfos = 
core.getSolrConfig().getPluginInfos(Expressible.class.getName());
+
+// Check deprecated approach.
 Object functionMappingsObj = initArgs.get("streamFunctions");
 if(null != functionMappingsObj){
+  log.warn("solrconfig.xml:  is deprecated for adding 
additional streaming functions to GraphHandler.");
   NamedList functionMappings = (NamedList)functionMappingsObj;
   for(Entry functionMapping : functionMappings) {
 String key = functionMapping.getKey();
 PluginInfo pluginInfo = new PluginInfo(key, 
Collections.singletonMap("class", functionMapping.getValue()));
-
-if (pluginInfo.pkgName == null) {
-  Class clazz = 
core.getResourceLoader().findClass((String) functionMapping.getValue(),
-  Expressible.class);
-  streamFactory.withFunctionName(key, clazz);
-} else {
-  StreamHandler.ExpressibleHolder holder = new 
StreamHandler.ExpressibleHolder(pluginInfo, core, 
SolrConfig.classVsSolrPluginInfo.get(Expressible.class));
-  streamFactory.withFunctionName(key, () -> holder.getClazz());
-}
-
+pluginInfos.add(pluginInfo);
   }
+}
 
+for (PluginInfo pluginInfo : pluginInfos) {
+  if (pluginInfo.pkgName != null) {
+ExpressibleHolder holder = new ExpressibleHolder(pluginInfo, core, 
SolrConfig.classVsSolrPluginInfo.get(Expressible.class));
 
 Review comment:
   This line is likely hiding a bug (and also a bug in StreamHandler), because 
`SolrConfig.classVsSolrPluginInfo.get(Expressible.class)` should always return 
null due to a type mismatch on the map: it needs a String key. We also don't 
have any tests to prove this out. If you don't want to address that in this 
issue, feel free to file a new JIRA and tag me on it. :)
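   A self-contained stand-in to illustrate the mismatch (assuming the map is keyed by class name, as the comment above implies; `Runnable` plays the role of `Expressible`):

```java
import java.util.HashMap;
import java.util.Map;

public class MapKeyMismatchDemo {
  public static void main(String[] args) {
    // A map keyed by class *names* (Strings), like SolrConfig.classVsSolrPluginInfo.
    Map<String, String> byClassName = new HashMap<>();
    byClassName.put(Runnable.class.getName(), "plugin-info");

    // Map.get(Object) accepts any argument, so this compiles but is always null:
    System.out.println(byClassName.get(Runnable.class));           // null
    // The lookup that was presumably intended:
    System.out.println(byClassName.get(Runnable.class.getName())); // plugin-info
  }
}
```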





[jira] [Commented] (SOLR-12490) Introducing json.queries WAS:Query DSL supports for further referring and exclusion in JSON facets

2020-01-02 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007078#comment-17007078
 ] 

Mikhail Khludnev commented on SOLR-12490:
-

Attached is what I'm going to commit this week. Ref Guide and JSON to be 
continued in SOLR-14156.

> Introducing json.queries WAS:Query DSL supports for further referring and 
> exclusion in JSON facets 
> ---
>
> Key: SOLR-12490
> URL: https://issues.apache.org/jira/browse/SOLR-12490
> Project: Solr
>  Issue Type: Improvement
>  Components: Facet Module, faceting
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Major
>  Labels: newdev
> Attachments: SOLR-12490.patch, SOLR-12490.patch, SOLR-12490.patch, 
> SOLR-12490.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> It's a spin-off from the 
> [discussion|https://issues.apache.org/jira/browse/SOLR-9685?focusedCommentId=16508720=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16508720].
>  
> h2. Problem
> # after SOLR-9685 we can tag separate clauses in hairy queries like 
> {{parent}}, {{bool}}
> # we can {{domain.excludeTags}}
> # we are looking for child faceting with exclusions, see SOLR-9510, SOLR-8998 
>
> # but we can refer only to separate params in {{domain.filter}}; it's not 
> possible to refer to separate clauses
> see the first comment






[GitHub] [lucene-solr] madrob commented on a change in pull request #1033: SOLR-13965: Use Plugin to add new expressions to GraphHandler

2020-01-02 Thread GitBox
madrob commented on a change in pull request #1033: SOLR-13965: Use Plugin to 
add new expressions to GraphHandler
URL: https://github.com/apache/lucene-solr/pull/1033#discussion_r362624971
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/handler/GraphHandler.java
 ##
 @@ -92,24 +104,29 @@ public void inform(SolrCore core) {
 }
 
 // This pulls all the overrides and additions from the config
+List<PluginInfo> pluginInfos = 
core.getSolrConfig().getPluginInfos(Expressible.class.getName());
+
+// Check deprecated approach.
 Object functionMappingsObj = initArgs.get("streamFunctions");
 if(null != functionMappingsObj){
+  log.warn("solrconfig.xml:  is deprecated for adding 
additional streaming functions to GraphHandler.");
   NamedList functionMappings = (NamedList)functionMappingsObj;
   for(Entry functionMapping : functionMappings) {
 String key = functionMapping.getKey();
 PluginInfo pluginInfo = new PluginInfo(key, 
Collections.singletonMap("class", functionMapping.getValue()));
-
-if (pluginInfo.pkgName == null) {
-  Class clazz = 
core.getResourceLoader().findClass((String) functionMapping.getValue(),
-  Expressible.class);
-  streamFactory.withFunctionName(key, clazz);
-} else {
-  StreamHandler.ExpressibleHolder holder = new 
StreamHandler.ExpressibleHolder(pluginInfo, core, 
SolrConfig.classVsSolrPluginInfo.get(Expressible.class));
-  streamFactory.withFunctionName(key, () -> holder.getClazz());
-}
-
+pluginInfos.add(pluginInfo);
   }
+}
 
+for (PluginInfo pluginInfo : pluginInfos) {
+  if (pluginInfo.pkgName != null) {
+ExpressibleHolder holder = new ExpressibleHolder(pluginInfo, core, 
SolrConfig.classVsSolrPluginInfo.get(Expressible.class));
+streamFactory.withFunctionName(pluginInfo.name,
+() -> holder.getClazz());
+  } else {
+Class<? extends Expressible> clazz = 
core.getMemClassLoader().findClass(pluginInfo.className, Expressible.class);
 
 Review comment:
   Since this code is duplicated between Stream & Graph, can we factor it out 
into a common method?
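   One possible shape for the shared helper (hypothetical name and home; it simply mirrors the loop in the diff above, so treat it as a sketch rather than the final refactoring):

```java
import java.util.List;

import org.apache.solr.client.solrj.io.stream.expr.Expressible;
import org.apache.solr.client.solrj.io.stream.expr.StreamFactory;
import org.apache.solr.core.PluginInfo;
import org.apache.solr.core.SolrConfig;
import org.apache.solr.core.SolrCore;
import org.apache.solr.handler.StreamHandler.ExpressibleHolder;

class ExpressiblePlugins {
  // Register each PluginInfo with the factory, exactly as both handlers do today.
  static void register(StreamFactory streamFactory, SolrCore core,
                       List<PluginInfo> pluginInfos) {
    for (PluginInfo pluginInfo : pluginInfos) {
      if (pluginInfo.pkgName != null) {
        // NOTE: the get(Expressible.class) lookup carries over the
        // type-mismatch concern raised in the other review comment.
        ExpressibleHolder holder = new ExpressibleHolder(pluginInfo, core,
            SolrConfig.classVsSolrPluginInfo.get(Expressible.class));
        streamFactory.withFunctionName(pluginInfo.name, () -> holder.getClazz());
      } else {
        Class<? extends Expressible> clazz = core.getMemClassLoader()
            .findClass(pluginInfo.className, Expressible.class);
        streamFactory.withFunctionName(pluginInfo.name, clazz);
      }
    }
  }
}
```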





[jira] [Updated] (SOLR-12490) Introducing json.queries WAS:Query DSL supports for further referring and exclusion in JSON facets

2020-01-02 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-12490:

Attachment: SOLR-12490.patch
Status: Patch Available  (was: Patch Available)

> Introducing json.queries WAS:Query DSL supports for further referring and 
> exclusion in JSON facets 
> ---
>
> Key: SOLR-12490
> URL: https://issues.apache.org/jira/browse/SOLR-12490
> Project: Solr
>  Issue Type: Improvement
>  Components: Facet Module, faceting
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Major
>  Labels: newdev
> Attachments: SOLR-12490.patch, SOLR-12490.patch, SOLR-12490.patch, 
> SOLR-12490.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> It's a spin-off from the 
> [discussion|https://issues.apache.org/jira/browse/SOLR-9685?focusedCommentId=16508720=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16508720].
>  
> h2. Problem
> # after SOLR-9685 we can tag separate clauses in hairy queries like 
> {{parent}}, {{bool}}
> # we can {{domain.excludeTags}}
> # we are looking for child faceting with exclusions, see SOLR-9510, SOLR-8998 
>
> # but we can refer only to separate params in {{domain.filter}}; it's not 
> possible to refer to separate clauses
> see the first comment






[jira] [Updated] (SOLR-12490) Introducing json.queries WAS:Query DSL supports for further referring and exclusion in JSON facets

2020-01-02 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-12490:

Summary: Introducing json.queries WAS:Query DSL supports for further 
referring and exclusion in JSON facets   (was: Query DSL supports for further 
referring and exclusion in JSON facets )

> Introducing json.queries WAS:Query DSL supports for further referring and 
> exclusion in JSON facets 
> ---
>
> Key: SOLR-12490
> URL: https://issues.apache.org/jira/browse/SOLR-12490
> Project: Solr
>  Issue Type: Improvement
>  Components: Facet Module, faceting
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Major
>  Labels: newdev
> Attachments: SOLR-12490.patch, SOLR-12490.patch, SOLR-12490.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> It's a spin-off from the 
> [discussion|https://issues.apache.org/jira/browse/SOLR-9685?focusedCommentId=16508720=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16508720].
>  
> h2. Problem
> # after SOLR-9685 we can tag separate clauses in hairy queries like 
> {{parent}}, {{bool}}
> # we can {{domain.excludeTags}}
> # we are looking for child faceting with exclusions, see SOLR-9510, SOLR-8998 
>
> # but we can refer only to separate params in {{domain.filter}}; it's not 
> possible to refer to separate clauses
> see the first comment






[jira] [Resolved] (SOLR-13817) Deprecate and remove legacy SolrCache implementations

2020-01-02 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-13817.
-
Resolution: Fixed

> Deprecate and remove legacy SolrCache implementations
> -
>
> Key: SOLR-13817
> URL: https://issues.apache.org/jira/browse/SOLR-13817
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: SOLR-13817-8x.patch, SOLR-13817-master.patch
>
>
> Now that SOLR-8241 has been committed I propose to deprecate other cache 
> implementations in 8x and remove them altogether from 9.0, in order to reduce 
> confusion and maintenance costs.






[jira] [Commented] (SOLR-13817) Deprecate and remove legacy SolrCache implementations

2020-01-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007060#comment-17007060
 ] 

ASF subversion and git services commented on SOLR-13817:


Commit 7d0cf0df3286dba2354fc854a64eac5dcc09961a in lucene-solr's branch 
refs/heads/master from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7d0cf0d ]

SOLR-13817: Clean up config files to remove the default 'class=' attribute in
standard caches.


> Deprecate and remove legacy SolrCache implementations
> -
>
> Key: SOLR-13817
> URL: https://issues.apache.org/jira/browse/SOLR-13817
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: SOLR-13817-8x.patch, SOLR-13817-master.patch
>
>
> Now that SOLR-8241 has been committed I propose to deprecate other cache 
> implementations in 8x and remove them altogether from 9.0, in order to reduce 
> confusion and maintenance costs.






[jira] [Updated] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-02 Thread Jason Gerlowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13890:
---
Attachment: Screen Shot 2020-01-02 at 2.25.12 PM.png

> Add postfilter support to {!terms} queries
> --
>
> Key: SOLR-13890
> URL: https://issues.apache.org/jira/browse/SOLR-13890
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch, 
> SOLR-13890.patch, Screen Shot 2020-01-02 at 2.25.12 PM.png
>
>
> There are some use-cases where it'd be nice if the "terms" qparser created a 
> query that could be run as a postfilter.  Particularly, when users are 
> checking for hundreds or thousands of terms, a postfilter implementation can 
> be more performant than the standard processing.
> With this issue, I'd like to propose a post-filter implementation for the 
> {{docValuesTermsFilter}} "method".  Postfilter creation can use a 
> SortedSetDocValues object to populate a DV bitset with the "terms" being 
> checked for.  Each document run through the post-filter can look at its 
> doc-values for the field in question and check them efficiently against the 
> constructed bitset.






[jira] [Updated] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-02 Thread Jason Gerlowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski updated SOLR-13890:
---
Attachment: SOLR-13890.patch

> Add postfilter support to {!terms} queries
> --
>
> Key: SOLR-13890
> URL: https://issues.apache.org/jira/browse/SOLR-13890
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch, 
> SOLR-13890.patch
>
>
> There are some use-cases where it'd be nice if the "terms" qparser created a 
> query that could be run as a postfilter.  Particularly, when users are 
> checking for hundreds or thousands of terms, a postfilter implementation can 
> be more performant than the standard processing.
> With this issue, I'd like to propose a post-filter implementation for the 
> {{docValuesTermsFilter}} "method".  Postfilter creation can use a 
> SortedSetDocValues object to populate a DV bitset with the "terms" being 
> checked for.  Each document run through the post-filter can look at its 
> doc-values for the field in question and check them efficiently against the 
> constructed bitset.






[jira] [Created] (SOLR-14158) package manager to read keys from packagestore and not ZK

2020-01-02 Thread Noble Paul (Jira)
Noble Paul created SOLR-14158:
-

 Summary: package manager to read keys from packagestore and not ZK 
 Key: SOLR-14158
 URL: https://issues.apache.org/jira/browse/SOLR-14158
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: packages
Reporter: Noble Paul
Assignee: Noble Paul


The security of the package system relies on securing ZK. It's much easier for 
users to secure the file system than to secure ZK.

This will 
* disable the remote {{PUT /api/cluster/files}} by default
* The CLI will directly write the keys to the 
{{/filestore/_trusted_keys/}} dir 
* The CLI directly writes the package artifacts to the local Solr node and asks 
other nodes to fetch from this node. Nobody can upload executable jars over a 
remote call
* Keys stored in ZK will not be used or trusted, so nobody can attack the 
cluster by publishing a malicious key into Solr






[jira] [Commented] (SOLR-13486) race condition between leader's "replay on startup" and non-leader's "recover from leader" can leave replicas out of sync (TestCloudConsistency)

2020-01-02 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007007#comment-17007007
 ] 

Chris M. Hostetter commented on SOLR-13486:
---

Ok, I'm playing catchup on all the comments from the last 2 weeks – most of 
which are confusing the hell out of me – so forgive me if this is all over the 
place...

This Jira is *NOT* a generic tracker for "TestCloudConsistency can sometimes 
fail" ... it is tracking a very specific bug that can impact real world solr 
instances in the wild.

The fact that it was discovered via TestCloudConsistency, and that that test 
appears to be (AFAICT) the only test we currently have in the code base for 
reproducing this _code_ bug, does not mean we should use this jira to discuss 
every possible failure people encounter w/TestCloudConsistency. Please pay 
attention to the specifics of the failures you encounter, and when in doubt 
file a new jira (they can always be linked later, but confusing off-topic 
comments are hard to de-tangle from a jira after the fact).

[~erickerickson]: if you are seeing NPEs in your logs when you run 
TestCloudConsistency then that most certainly has nothing to do with the 
specific bug being tracked here.

(But it's impossible to be certain since you didn't attach any logs or 
specifics.)

Please open a new jira and move the discussion of this unrelated problem you 
have found to that jira. (Or open multiple new jiras if you think you've found 
multiple new problems.)

{quote}The cluster setup and teardown is done for each test 'cause they're 
annotated with `@Before` and `@After`. Changing these to `@BeforeClass` and 
`@AfterClass` at least lessens the confusion. I don't think this is a 
real fix given the comments from Chris and Dat, so I'll see if I can still 
generate errors with this change.
{quote}
Please *DO NOT* change this test to setup/teardown the cluster in 
{{@BeforeClass}} and {{@AfterClass}} – the fact that this test (and others like 
it) uses a "pristine" cluster for each test method is very deliberate – because 
the test is mucking with the collection nodes, and this way a failure from one 
test (that might leave the cluster in a bad state) won't "bleed over" into 
another test and cause spurious failures.
{quote}What Hoss saw may be an artifact of the fact that the cluster was being 
created/destroyed between tests. So far when I only run a single test at a time 
I'm not seeing failures, but that's not very conclusive at this point.
{quote}
The bug being tracked here has *NOTHING* to do with a new cluster being 
recreated for each test. Please note all of the logs & analysis I've already 
posted – everything about this bug has to do with when/how the leader is 
partitioned & shut down _during_ the test, and the race condition that exists 
as a result when nodes are trying to recover from that leader while it is 
trying to recover from its tlog.

Dawid's {{org.apache.solr.cloud.TestCloudConsistency.zip}} attachment includes 
a log that does in fact show the specific problem tracked in this issue – note 
the time stamps of the log messages in the last 2 grep commands...
{noformat}
# here's the replica that is failing to locate doc#4...

hossman@slate:~/tmp$ grep outOfSyncReplicasCannotBecomeLeader-false 
org.apache.solr.cloud.TestCloudConsistency.html | grep 'Doc with id=4 not found'
java.lang.AssertionError: Doc with id=4 not found in 
http://127.0.0.1:40399/solr/outOfSyncReplicasCannotBecomeLeader-false due to: 
Path not found: /id; rsp={doc=null}

# here's the recovery via replication logging from that replica showing who the 
leader is...

hossman@slate:~/tmp$ grep outOfSyncReplicasCannotBecomeLeader-false 
org.apache.solr.cloud.TestCloudConsistency.html | grep 'n:127.0.0.1:40399_solr' 
| grep recoveryExecutor | grep 'Attempting to replicate from'
1500569 INFO  
(recoveryExecutor-10537-thread-1-processing-n:127.0.0.1:40399_solr 
x:outOfSyncReplicasCannotBecomeLeader-false_shard1_replica_n3 
c:outOfSyncReplicasCannotBecomeLeader-false s:shard1 r:core_node4) 
[n:127.0.0.1:40399_solr c:outOfSyncReplicasCannotBecomeLeader-false s:shard1 
r:core_node4 x:outOfSyncReplicasCannotBecomeLeader-false_shard1_replica_n3 ] 
o.a.s.c.RecoveryStrategy Attempting to replicate from 
[http://127.0.0.1:33461/solr/outOfSyncReplicasCannotBecomeLeader-false_shard1_replica_n1/].

# here's when that leader is doing it's tlog replay...

hossman@slate:~/tmp$ grep outOfSyncReplicasCannotBecomeLeader-false 
org.apache.solr.cloud.TestCloudConsistency.html | grep 'n:127.0.0.1:33461_solr' 
| grep 'Replaying tlog'
1515376 INFO  (coreZkRegister-10093-thread-1-processing-n:127.0.0.1:33461_solr 
x:outOfSyncReplicasCannotBecomeLeader-false_shard1_replica_n1 
c:outOfSyncReplicasCannotBecomeLeader-false s:shard1 r:core_node2) 
[n:127.0.0.1:33461_solr c:outOfSyncReplicasCannotBecomeLeader-false s:shard1 
r:core_node2 

[GitHub] [lucene-solr] madrob closed pull request #1091: LUCENE-9098 Report bad term for fuzzy query, SOLR-13190 Surface Fuzzy term errors in Solr

2020-01-02 Thread GitBox
madrob closed pull request #1091: LUCENE-9098 Report bad term for fuzzy query, 
SOLR-13190 Surface Fuzzy term errors in Solr
URL: https://github.com/apache/lucene-solr/pull/1091
 
 
   





[jira] [Updated] (SOLR-14130) Add postlogs command line tool for indexing Solr logs

2020-01-02 Thread Joel Bernstein (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-14130:
--
Attachment: SOLR-14130.patch

> Add postlogs command line tool for indexing Solr logs
> -
>
> Key: SOLR-14130
> URL: https://issues.apache.org/jira/browse/SOLR-14130
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, 
> SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, 
> Screen Shot 2019-12-19 at 2.04.41 PM.png, Screen Shot 2019-12-19 at 2.16.01 
> PM.png, Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot 2019-12-21 at 
> 8.46.51 AM.png
>
>
> This ticket adds a simple command line tool for posting Solr logs to a solr 
> index. The tool works with the out of the box Solr log format. Still a work 
> in progress but currently indexes:
>  * queries
>  * updates
>  * commits
>  * new searchers
>  * errors - including stack traces
> Attached are some sample visualizations using Solr Streaming Expressions and 
> Math Expressions after the data has been loaded. The visualizations show: 
> time series, scatter plots, histograms and quantile plots, but really this is 
> just scratching the surface of the visualizations that can be done with the 
> Solr logs.
>  






[GitHub] [lucene-solr] tflobbe edited a comment on issue #1116: SOLR-14135: Utils.toJavabin returns a byte[] instead of InputStream

2020-01-02 Thread GitBox
tflobbe edited a comment on issue #1116: SOLR-14135: Utils.toJavabin returns a 
byte[] instead of InputStream
URL: https://github.com/apache/lucene-solr/pull/1116#issuecomment-570292700
 
 
   Yes, the thing is that this method is not being used anywhere else at this 
time, so the `InputStream` one would be unused.





[GitHub] [lucene-solr] tflobbe commented on issue #1116: SOLR-14135: Utils.toJavabin returns a byte[] instead of InputStream

2020-01-02 Thread GitBox
tflobbe commented on issue #1116: SOLR-14135: Utils.toJavabin returns a byte[] 
instead of InputStream
URL: https://github.com/apache/lucene-solr/pull/1116#issuecomment-570292700
 
 
   Yes, the thing is that this method is not being used anywhere else at this 
time





[GitHub] [lucene-solr] jpountz commented on issue #927: LUCENE-8997: : Add type of triangle info to ShapeField encoding

2020-01-02 Thread GitBox
jpountz commented on issue #927: LUCENE-8997: : Add type of triangle info to 
ShapeField encoding
URL: https://github.com/apache/lucene-solr/pull/927#issuecomment-570276927
 
 
   @iverase Can you expand on why this would be an issue for backward 
compatibility?





[GitHub] [lucene-solr] jpountz commented on a change in pull request #927: LUCENE-8997: : Add type of triangle info to ShapeField encoding

2020-01-02 Thread GitBox
jpountz commented on a change in pull request #927: LUCENE-8997: : Add type of 
triangle info to ShapeField encoding
URL: https://github.com/apache/lucene-solr/pull/927#discussion_r362555275
 
 

 ##
 File path: lucene/sandbox/src/java/org/apache/lucene/document/ShapeField.java
 ##
 @@ -101,21 +143,95 @@ protected void setTriangleValue(int aX, int aY, boolean 
abFromShape, int bX, int
   private static final int MAXY_MINX_MINY_X_Y_MAXX = 6;
   private static final int MINY_MINX_Y_MAXX_MAXY_X = 7;
 
 Review comment:
   Indeed this change needs to be backward compatible, but is it really an 
issue here? My understanding is that with old segments you would get points 
reported as triangles, but in practice that wouldn't be an issue? And new 
indices with this change should never get queried with old versions of Lucene 
that don't know about these constants (Lucene doesn't guarantee any forward 
compatibility) so that wouldn't be an issue either?





[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1126: LUCENE-4702: Terms dictionary compression.

2020-01-02 Thread GitBox
mikemccand commented on a change in pull request #1126: LUCENE-4702: Terms 
dictionary compression.
URL: https://github.com/apache/lucene-solr/pull/1126#discussion_r362543015
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java
 ##
 @@ -75,6 +75,17 @@
   /** Total number of bytes used to store term suffixes. */
   public long totalBlockSuffixBytes;
 
+  /**
+   * Number of times each compression method has been used.
+   * 0 = uncompressed
+   * 1 = lowercase_ascii
+   * 2 = LZ4
+   */
+  public final long[] compressionAlgorithms = new long[3];
 
 Review comment:
   Cool that you track this in BlockTree stats!  Did you post the stats 
somewhere?  Edit: ahh, I see the [cool stats 
here](https://issues.apache.org/jira/browse/LUCENE-4702?focusedCommentId=17003640=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17003640),
 thanks. 





[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1126: LUCENE-4702: Terms dictionary compression.

2020-01-02 Thread GitBox
mikemccand commented on a change in pull request #1126: LUCENE-4702: Terms 
dictionary compression.
URL: https://github.com/apache/lucene-solr/pull/1126#discussion_r362545794
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/compress/LZ4.java
 ##
 @@ -0,0 +1,397 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util.compress;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Objects;
+
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.store.DataOutput;
+import org.apache.lucene.util.packed.PackedInts;
+
+/**
+ * LZ4 compression and decompression routines.
+ *
+ * http://code.google.com/p/lz4/
+ * http://fastcompression.blogspot.fr/p/lz4.html
 
 Review comment:
   Are these also Apache 2.0 licensed?





[jira] [Commented] (LUCENE-4702) Terms dictionary compression

2020-01-02 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006942#comment-17006942
 ] 

Michael McCandless commented on LUCENE-4702:


These are great results!  Hmm, how come there is no {{PKLookup}} task in your 
results?  It's a separate boolean option to the {{Competition}} I think, maybe 
your {{perf.py}} passed {{False}}?  I would think {{PKLookup}} might be 
impacted non-trivially ... but maybe that tradeoff is OK.

> Terms dictionary compression
> 
>
> Key: LUCENE-4702
> URL: https://issues.apache.org/jira/browse/LUCENE-4702
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Trivial
> Attachments: LUCENE-4702.patch, LUCENE-4702.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> I've done a quick test with the block tree terms dictionary by replacing the 
> call to IndexOutput.writeBytes that writes suffix bytes with a call to 
> LZ4.compressHC, to test the performance hit. Interestingly, search performance 
> was very good (see comparison table below) and the tim files were 14% smaller 
> (from 150432 bytes overall to 129516).
> {noformat}
>                 Task   QPS baseline StdDev   QPS compressed StdDev             Pct diff
>               Fuzzy1         111.50 (2.0%)            78.78 (1.5%)  -29.4% ( -32% - -26%)
>               Fuzzy2          36.99 (2.7%)            28.59 (1.5%)  -22.7% ( -26% - -18%)
>              Respell         122.86 (2.1%)           103.89 (1.7%)  -15.4% ( -18% - -11%)
>             Wildcard         100.58 (4.3%)            94.42 (3.2%)   -6.1% ( -13% -   1%)
>              Prefix3         124.90 (5.7%)           122.67 (4.7%)   -1.8% ( -11% -   9%)
>            OrHighLow         169.87 (6.8%)           167.77 (8.0%)   -1.2% ( -15% -  14%)
>              LowTerm        1949.85 (4.5%)          1929.02 (3.4%)   -1.1% (  -8% -   7%)
>           AndHighLow        2011.95 (3.5%)          1991.85 (3.3%)   -1.0% (  -7% -   5%)
>           OrHighHigh         155.63 (6.7%)           154.12 (7.9%)   -1.0% ( -14% -  14%)
>          AndHighHigh         341.82 (1.2%)           339.49 (1.7%)   -0.7% (  -3% -   2%)
>            OrHighMed         217.55 (6.3%)           216.16 (7.1%)   -0.6% ( -13% -  13%)
>               IntNRQ          53.10 (10.9%)           52.90 (8.6%)  -0.4% ( -17% -  21%)
>              MedTerm         998.11 (3.8%)           994.82 (5.6%)   -0.3% (  -9% -   9%)
>          MedSpanNear          60.50 (3.7%)            60.36 (4.8%)   -0.2% (  -8% -   8%)
>         HighSpanNear          19.74 (4.5%)            19.72 (5.1%)   -0.1% (  -9% -   9%)
>          LowSpanNear         101.93 (3.2%)           101.82 (4.4%)   -0.1% (  -7% -   7%)
>           AndHighMed         366.18 (1.7%)           366.93 (1.7%)    0.2% (  -3% -   3%)
>             PKLookup         237.28 (4.0%)           237.96 (4.2%)    0.3% (  -7% -   8%)
>            MedPhrase         173.17 (4.7%)           174.69 (4.7%)    0.9% (  -8% -  10%)
>      LowSloppyPhrase         180.91 (2.6%)           182.79 (2.7%)    1.0% (  -4% -   6%)
>            LowPhrase         374.64 (5.5%)           379.11 (5.8%)    1.2% (  -9% -  13%)
>             HighTerm         253.14 (7.9%)           256.97 (11.4%)  1.5% ( -16% -  22%)
>           HighPhrase          19.52 (10.6%)           19.83 (11.0%)  1.6% ( -18% -  25%)
>      MedSloppyPhrase         141.90 (2.6%)           144.11 (2.5%)    1.6% (  -3% -   6%)
>     HighSloppyPhrase          25.26 (4.8%)            25.97 (5.0%)    2.8% (  -6% -  13%)
> {noformat}
> Only queries which are very terms-dictionary-intensive took a performance hit 
> (Fuzzy1, Fuzzy2, Respell, Wildcard); other queries, including Prefix3, behaved 
> (surprisingly) well.
> Do you think of it as something worth exploring?






[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1126: LUCENE-4702: Terms dictionary compression.

2020-01-02 Thread GitBox
mikemccand commented on a change in pull request #1126: LUCENE-4702: Terms 
dictionary compression.
URL: https://github.com/apache/lucene-solr/pull/1126#discussion_r362542300
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/CompressionAlgorithm.java
 ##
 @@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.blocktree;
+
+import java.io.IOException;
+
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.compress.LowercaseAsciiCompression;
+
+/**
+ * Compression algorithm used for suffixes of a block of terms.
+ */
+enum CompressionAlgorithm {
+
+  NO_COMPRESSION(0x00) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  in.readBytes(out, 0, len);
+}
+
+@Override
+public String toString() {
+  return "no_compression";
+}
+  },
+
+  LOWERCASE_ASCII(0x01) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  LowercaseAsciiCompression.decompress(in, out, len);
+}
+
+@Override
+public String toString() {
+  return "lowercase_ascii";
+}
+  },
+
+  LZ4(0x02) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  org.apache.lucene.util.compress.LZ4.decompress(in, len, out, 0);
+}
+
+@Override
+public String toString() {
+  return "lz4";
+}
+  };
+
+  private static final CompressionAlgorithm[] BY_CODE = new 
CompressionAlgorithm[3];
 
 Review comment:
   +1 for the explicit codes too.  Relying on enum ordinals is dangerously 
fragile ...





[GitHub] [lucene-solr] jpountz commented on a change in pull request #1126: LUCENE-4702: Terms dictionary compression.

2020-01-02 Thread GitBox
jpountz commented on a change in pull request #1126: LUCENE-4702: Terms 
dictionary compression.
URL: https://github.com/apache/lucene-solr/pull/1126#discussion_r362534343
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/CompressionAlgorithm.java
 ##
 @@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.blocktree;
+
+import java.io.IOException;
+
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.compress.LowercaseAsciiCompression;
+
+/**
+ * Compression algorithm used for suffixes of a block of terms.
+ */
+enum CompressionAlgorithm {
+
+  NO_COMPRESSION(0x00) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  in.readBytes(out, 0, len);
+}
+
+@Override
+public String toString() {
+  return "no_compression";
+}
+  },
+
+  LOWERCASE_ASCII(0x01) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  LowercaseAsciiCompression.decompress(in, out, len);
+}
+
+@Override
+public String toString() {
+  return "lowercase_ascii";
+}
+  },
+
+  LZ4(0x02) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  org.apache.lucene.util.compress.LZ4.decompress(in, len, out, 0);
+}
+
+@Override
+public String toString() {
+  return "lz4";
+}
+  };
+
+  private static final CompressionAlgorithm[] BY_CODE = new 
CompressionAlgorithm[3];
 
 Review comment:
   Oops I had not seen @dweiss 's comment when writing mine.





[jira] [Commented] (LUCENE-9053) java.lang.AssertionError: inputs are added out of order lastInput=[f0 9d 9c 8b] vs input=[ef ac 81 67 75 72 65]

2020-01-02 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006894#comment-17006894
 ] 

Michael McCandless commented on LUCENE-9053:


+1 to improve {{package-info.java}} in 
{{lucene/core/src/java/org/apache/lucene/util}}!

Maybe we should just say Unicode code point order, not the UTF-16 order that 
Java's {{String.compareTo}} sorts by?
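A minimal standalone illustration of the distinction (values chosen to mirror this report: U+1D70B is the supplementary character from the error message, U+FB01 is the "fi" ligature in "figure"):
{code:java}
public class SortOrderDemo {
  public static void main(String[] args) {
    String supplementary = "\uD835\uDF0B"; // U+1D70B, a surrogate pair in UTF-16
    String ligature = "\uFB01";            // U+FB01 'fi'

    // UTF-16 code unit order: the high surrogate 0xD835 < 0xFB01,
    // so String.compareTo puts the supplementary character first.
    System.out.println(supplementary.compareTo(ligature) < 0); // true

    // Unicode code point (and UTF-8 byte) order: 0x1D70B > 0xFB01,
    // which is the order the FST Builder actually requires.
    System.out.println(supplementary.codePointAt(0) > ligature.codePointAt(0)); // true
  }
}
{code}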

> java.lang.AssertionError: inputs are added out of order lastInput=[f0 9d 9c 
> 8b] vs input=[ef ac 81 67 75 72 65]
> ---
>
> Key: LUCENE-9053
> URL: https://issues.apache.org/jira/browse/LUCENE-9053
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: gitesh
>Priority: Minor
>
> Even if the inputs are sorted in Unicode order, I get the following exception 
> while creating the FST:
>  
> {code:java}
> // Input values (keys). These must be provided to Builder in Unicode sorted 
> order!
> String inputValues[] = {"퐴", "figure", "flagship"};
> long outputValues[] = {5, 7, 12};
> PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
> Builder<Long> builder = new Builder<>(FST.INPUT_TYPE.BYTE1, outputs);
> BytesRefBuilder scratchBytes = new BytesRefBuilder();
> IntsRefBuilder scratchInts = new IntsRefBuilder();
> for (int i = 0; i < inputValues.length; i++) {
>  scratchBytes.copyChars(inputValues[i]);
>  builder.add(Util.toIntsRef(scratchBytes.get(), scratchInts), 
> outputValues[i]);
> }
> FST<Long> fst = builder.finish();
> Long value = Util.get(fst, new BytesRef("figure"));
> System.out.println(value);
> {code}
> Please note that figure and flagship are using the ligature character fl 
> above.






[GitHub] [lucene-solr] murblanc commented on a change in pull request #1055: SOLR-13932 Review directory locking and Blob interactions

2020-01-02 Thread GitBox
murblanc commented on a change in pull request #1055: SOLR-13932 Review 
directory locking and Blob interactions
URL: https://github.com/apache/lucene-solr/pull/1055#discussion_r362520142
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/store/blob/metadata/SharedStoreResolutionUtil.java
 ##
 @@ -147,42 +139,53 @@ public static SharedMetadataResolutionResult 
resolveMetadata(ServerSideMetadata
 
 if (local == null) {
   // The shard index data does not exist locally. All we can do is pull.  
-  // We've computed blobFilesMissingLocally and localFilesMissingOnBlob is 
empty as it should be.
+  // We've computed blobFilesMissingLocally. localFilesMissingOnBlob is 
empty as it should be.
   return new 
SharedMetadataResolutionResult(localFilesMissingOnBlob.values(), 
blobFilesMissingLocally.values(), blobFilesMissingLocally.values(), false);
 }
 
-boolean localConflictingWithBlob = false;
+// If trying to pull files from Blob, make sure similarly named files do 
not already exist outside the current commit point
+ImmutableSet allLocalFiles = local.getAllFiles();
+
+boolean downloadToNewDir = false;
 
 Review comment:
   pulling from Blob downloads files :)
   Not sure I understand what the link pasted above points to.





[GitHub] [lucene-solr] jpountz commented on a change in pull request #1126: LUCENE-4702: Terms dictionary compression.

2020-01-02 Thread GitBox
jpountz commented on a change in pull request #1126: LUCENE-4702: Terms 
dictionary compression.
URL: https://github.com/apache/lucene-solr/pull/1126#discussion_r362519930
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/CompressionAlgorithm.java
 ##
 @@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.blocktree;
+
+import java.io.IOException;
+
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.compress.LowercaseAsciiCompression;
+
+/**
+ * Compression algorithm used for suffixes of a block of terms.
+ */
+enum CompressionAlgorithm {
+
+  NO_COMPRESSION(0x00) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  in.readBytes(out, 0, len);
+}
+
+@Override
+public String toString() {
+  return "no_compression";
+}
+  },
+
+  LOWERCASE_ASCII(0x01) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  LowercaseAsciiCompression.decompress(in, out, len);
+}
+
+@Override
+public String toString() {
+  return "lowercase_ascii";
+}
+  },
+
+  LZ4(0x02) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  org.apache.lucene.util.compress.LZ4.decompress(in, len, out, 0);
+}
+
+@Override
+public String toString() {
+  return "lz4";
+}
+  };
+
+  private static final CompressionAlgorithm[] BY_CODE = new 
CompressionAlgorithm[3];
 
 Review comment:
   It doesn't address the concern that something that looks as innocuous as 
reordering constants or adding one at any position can, in the end, break 
serialization. I have a slight preference for being explicit, like Dawid, but 
I've been in the middle of this controversy many times already and it's a 
matter of robustness vs. conciseness in the end. If you feel strongly about it 
@dsmiley I'll switch back to using `#ordinal()`.





[GitHub] [lucene-solr] murblanc commented on a change in pull request #1055: SOLR-13932 Review directory locking and Blob interactions

2020-01-02 Thread GitBox
murblanc commented on a change in pull request #1055: SOLR-13932 Review 
directory locking and Blob interactions
URL: https://github.com/apache/lucene-solr/pull/1055#discussion_r362517551
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/store/blob/metadata/ServerSideMetadata.java
 ##
 @@ -132,35 +127,15 @@ public ServerSideMetadata(String coreName, CoreContainer 
container, boolean take
 generation = latestCommit.getGeneration();
 latestCommitFiles = latestCommitBuilder.build();
 
-// Capture now the hash and verify again if we need to pull content 
from the Blob store into this directory,
-// to make sure there are no local changes at the same time that might 
lead to a corruption in case of interaction
-// with the download.
-// TODO: revise with "design assumptions around pull pipeline" 
mentioned in allCommits TODO below
+// Capture now the hash and verify again after files have been pulled 
and before the directory is updated (or before
+// the index is switched to use a new directory) to make sure there 
are no local changes at the same time that might
+// lead to a corruption in case of interaction with the download or 
might be a sign of other problems (it is not
+// expected that indexing can happen on a local directory of a SHARED 
replica if that replica is not up to date with
+// the Blob store version).
 directoryHash = getSolrDirectoryHash(coreDir);
 
-allCommitsFiles = latestCommitFiles;
-// TODO: allCommits was added to detect special cases where inactive 
file segments can potentially conflict
-//   with whats in shared store. But given the recent 
understanding of semantics around index directory locks
-//   we need to revise our design assumptions around pull 
pipeline, including this one.
-//   Disabling this for now so that unreliability around 
introspection of older commits 
-//   might not get in the way of steady state indexing.
-//// A note on listCommits says that it does not guarantee consistent 
results if a commit is in progress.
-//// But in blob context we serialize commits and pulls by proper 
locking therefore we should be good here.
-//List allCommits = DirectoryReader.listCommits(coreDir);
-//
-//// we should always have a commit point as verified in the beginning 
of this method.
-//assert (allCommits.size() > 1) || (allCommits.size() == 1 && 
allCommits.get(0).equals(latestCommit));
-//
-//// optimization:  normally we would only be dealing with one commit 
point. In that case just reuse latest commit files builder.
-//ImmutableCollection.Builder allCommitsBuilder = 
latestCommitBuilder;
-//if (allCommits.size() > 1) {
-//  allCommitsBuilder = new ImmutableSet.Builder<>();
-//  for (IndexCommit commit : allCommits) {
-//// no snapshot for inactive segments files
-//buildCommitFiles(coreDir, commit, allCommitsBuilder, /* 
snapshotDir */ null);
-//  }
-//}
-//allCommitsFiles = allCommitsBuilder.build();
+// Need to inventory all local files in case files that need to be 
pulled from Blob conflict with them.
+allFiles = ImmutableSet.copyOf(coreDir.listAll());
 
 Review comment:
   You're right on both counts. In order to limit complexity at this stage, I'm 
tempted to leave these optimizations for later (likely a very minor improvement 
for indexing given the higher cost of pushing to the Blob store).





[GitHub] [lucene-solr] dweiss commented on a change in pull request #1126: LUCENE-4702: Terms dictionary compression.

2020-01-02 Thread GitBox
dweiss commented on a change in pull request #1126: LUCENE-4702: Terms 
dictionary compression.
URL: https://github.com/apache/lucene-solr/pull/1126#discussion_r362516139
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/CompressionAlgorithm.java
 ##
 @@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.blocktree;
+
+import java.io.IOException;
+
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.compress.LowercaseAsciiCompression;
+
+/**
+ * Compression algorithm used for suffixes of a block of terms.
+ */
+enum CompressionAlgorithm {
+
+  NO_COMPRESSION(0x00) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  in.readBytes(out, 0, len);
+}
+
+@Override
+public String toString() {
+  return "no_compression";
+}
+  },
+
+  LOWERCASE_ASCII(0x01) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  LowercaseAsciiCompression.decompress(in, out, len);
+}
+
+@Override
+public String toString() {
+  return "lowercase_ascii";
+}
+  },
+
+  LZ4(0x02) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  org.apache.lucene.util.compress.LZ4.decompress(in, len, out, 0);
+}
+
+@Override
+public String toString() {
+  return "lz4";
+}
+  };
+
+  private static final CompressionAlgorithm[] BY_CODE = new 
CompressionAlgorithm[3];
 
 Review comment:
   You are entitled to your opinion, David. But the comment Adrien made still 
holds: when you reorder or add/remove enum constants, their ordinals change, 
and that side effect is worth guarding against. The "code" may be verbose, but 
it is also explicit and makes mistakes harder to make. I think it's valuable 
here.
   
   Anyway, my original comment about hardcoded values was only to point out 
that the code used the same constants in different places. I suggested an enum 
(like Adrien implemented), but it could just as well be a set of constant 
integers declared in one place. I don't care much about this (but I do care 
about using or overriding ordinal()...).
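
   To make the point concrete, a minimal sketch of the pattern under discussion 
(constant and method names are illustrative, not the actual patch): the wire 
code is declared explicitly, and a static check catches mistakes if constants 
are reordered, added, or removed.

```java
enum Alg {
  NONE(0x00), LOWERCASE_ASCII(0x01), LZ4(0x02);

  final int code; // explicit on-disk id, independent of declaration order
  Alg(int code) { this.code = code; }

  private static final Alg[] BY_CODE = new Alg[values().length];
  static {
    for (Alg a : values()) {
      if (BY_CODE[a.code] != null) throw new AssertionError("duplicate code " + a.code);
      BY_CODE[a.code] = a;
    }
  }

  static Alg byCode(int code) { return BY_CODE[code]; }
}
```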


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] murblanc commented on a change in pull request #1055: SOLR-13932 Review directory locking and Blob interactions

2020-01-02 Thread GitBox
murblanc commented on a change in pull request #1055: SOLR-13932 Review 
directory locking and Blob interactions
URL: https://github.com/apache/lucene-solr/pull/1055#discussion_r362514824
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/store/blob/metadata/CorePushPull.java
 ##
 @@ -268,7 +264,7 @@ public void pullUpdateFromBlob(long requestQueuedTimeMs, 
boolean waitForSearcher
   }
 
 
 Review comment:
   Pull is indeed exclusive, but let's not rely on this (i.e. be defensive). We 
do check that the directory hasn't changed during the pull before adding back 
the files and reopening the IW, so I think we're ok.
   
   I'm not sure about your reference to SolrIndexSplitter, @mbwaheed. The lock 
there is acquired on a directory that I'm not sure is the index directory (it 
is searcher.getRawReader().directory() on the passed SolrIndexSearcher 
instance). We manipulate solrCore.getIndexDir().
   
   The directory and IndexWriter manipulation in 
IndexFetcher.fetchLatestIndex() is similar to the one we do here (no surprise, 
it was used as "inspiration").


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #1126: LUCENE-4702: Terms dictionary compression.

2020-01-02 Thread GitBox
jpountz commented on a change in pull request #1126: LUCENE-4702: Terms 
dictionary compression.
URL: https://github.com/apache/lucene-solr/pull/1126#discussion_r362514660
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/CompressionAlgorithm.java
 ##
 @@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.blocktree;
+
+import java.io.IOException;
+
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.compress.LowercaseAsciiCompression;
+
+/**
+ * Compression algorithm used for suffixes of a block of terms.
+ */
+enum CompressionAlgorithm {
+
+  NO_COMPRESSION(0x00) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  in.readBytes(out, 0, len);
+}
+
+@Override
+public String toString() {
+  return "no_compression";
+}
+  },
+
+  LOWERCASE_ASCII(0x01) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  LowercaseAsciiCompression.decompress(in, out, len);
+}
+
+@Override
+public String toString() {
+  return "lowercase_ascii";
+}
+  },
+
+  LZ4(0x02) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  org.apache.lucene.util.compress.LZ4.decompress(in, len, out, 0);
+}
+
+@Override
+public String toString() {
 
 Review comment:
   I switched back to the default implementation.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8673) Use radix partitioning when merging dimensional points

2020-01-02 Thread Ignacio Vera (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006851#comment-17006851
 ] 

Ignacio Vera commented on LUCENE-8673:
--

I don't think it is an issue with the code: the test is running with a RAM 
directory, and therefore an OOM is possible when using the offline writer:

{code:java}
java.lang.OutOfMemoryError: Java heap space
at 
__randomizedtesting.SeedInfo.seed([34794AE69CB7F902:B32E37690DEE8582]:0)
at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:84)
at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:57)
at 
org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:168)
at 
org.apache.lucene.store.RAMOutputStream.writeBytes(RAMOutputStream.java:154)
at 
org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:141)
at 
org.apache.lucene.util.bkd.OfflinePointWriter.append(OfflinePointWriter.java:67)
at 
org.apache.lucene.util.bkd.BKDRadixSelector.offlinePartition(BKDRadixSelector.java:282)
 
{code}

I do not remember how to disable that type of directory in the test, but we 
should prevent it. One option is sketched below.
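
One way, assuming the test extends LuceneTestCase and currently uses the 
randomized newDirectory() helper, would be to pin it to an on-disk directory:

{code:java}
// Sketch: force an FSDirectory so the randomized test framework cannot
// pick a heap-backed RAMDirectory for this memory-hungry test.
Directory dir = newFSDirectory(createTempDir("offline-writer-test"));
{code}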

> Use radix partitioning when merging dimensional points
> --
>
> Key: LUCENE-8673
> URL: https://issues.apache.org/jira/browse/LUCENE-8673
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: 8.x, master (9.0)
>
> Attachments: Geo3D.png, Geo3D.png, Geo3D.png, LatLonPoint.png, 
> LatLonPoint.png, LatLonPoint.png, LatLonShape.png, LatLonShape.png, 
> LatLonShape.png
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Following the advice of [~jpountz] in LUCENE-8623, I have investigated using 
> radix selection when merging segments instead of sorting the data at the 
> beginning. The results are pretty promising when running Lucene geo 
> benchmarks:
>  
> ||Approach||Index time (sec): Dev||Index Time (sec): Base||Index Time: 
> Diff||Force merge time (sec): Dev||Force Merge time (sec): Base||Force Merge 
> Time: Diff||Index size (GB): Dev||Index size (GB): Base||Index Size: 
> Diff||Reader heap (MB): Dev||Reader heap (MB): Base||Reader heap: Diff
> |points|241.5s|235.0s| 3%|157.2s|157.9s|-0%|0.55|0.55| 0%|1.57|1.57| 0%|
> |shapes|416.1s|650.1s|-36%|306.1s|603.2s|-49%|1.29|1.29| 0%|1.61|1.61| 0%|
> |geo3d|261.0s|360.1s|-28%|170.2s|279.9s|-39%|0.75|0.75| 0%|1.58|1.58| 0%|
>  
> edited: table formatting to be a jira table
>  
> In 2D the index throughput is more or less equal, but for higher dimensions 
> the impact is quite big. In all cases the merging process requires much less 
> disk space. I am attaching plots showing the different behaviour, and I am 
> opening a pull request.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1126: LUCENE-4702: Terms dictionary compression.

2020-01-02 Thread GitBox
dsmiley commented on a change in pull request #1126: LUCENE-4702: Terms 
dictionary compression.
URL: https://github.com/apache/lucene-solr/pull/1126#discussion_r362489542
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/CompressionAlgorithm.java
 ##
 @@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.blocktree;
+
+import java.io.IOException;
+
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.compress.LowercaseAsciiCompression;
+
+/**
+ * Compression algorithm used for suffixes of a block of terms.
+ */
+enum CompressionAlgorithm {
+
+  NO_COMPRESSION(0x00) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  in.readBytes(out, 0, len);
+}
+
+@Override
+public String toString() {
+  return "no_compression";
+}
+  },
+
+  LOWERCASE_ASCII(0x01) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  LowercaseAsciiCompression.decompress(in, out, len);
+}
+
+@Override
+public String toString() {
+  return "lowercase_ascii";
+}
+  },
+
+  LZ4(0x02) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  org.apache.lucene.util.compress.LZ4.decompress(in, len, out, 0);
+}
+
+@Override
+public String toString() {
+  return "lz4";
+}
+  };
+
+  private static final CompressionAlgorithm[] BY_CODE = new 
CompressionAlgorithm[3];
 
 Review comment:
   BTW I can understand not wanting clients of an enum to call ordinal() on it; 
they should call a `getCode()` method, and the implementation of that could be 
ordinal().
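
   Something like this, purely as an illustration:

```java
// Inside the enum: callers use getCode() and never touch ordinal() directly,
// so whether it is backed by ordinal() stays an internal detail.
int getCode() {
  return ordinal(); // could later be swapped for an explicit field
}
```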


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1126: LUCENE-4702: Terms dictionary compression.

2020-01-02 Thread GitBox
dsmiley commented on a change in pull request #1126: LUCENE-4702: Terms 
dictionary compression.
URL: https://github.com/apache/lucene-solr/pull/1126#discussion_r362484551
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/CompressionAlgorithm.java
 ##
 @@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.blocktree;
+
+import java.io.IOException;
+
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.compress.LowercaseAsciiCompression;
+
+/**
+ * Compression algorithm used for suffixes of a block of terms.
+ */
+enum CompressionAlgorithm {
+
+  NO_COMPRESSION(0x00) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  in.readBytes(out, 0, len);
+}
+
+@Override
+public String toString() {
+  return "no_compression";
+}
+  },
+
+  LOWERCASE_ASCII(0x01) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  LowercaseAsciiCompression.decompress(in, out, len);
+}
+
+@Override
+public String toString() {
+  return "lowercase_ascii";
+}
+  },
+
+  LZ4(0x02) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  org.apache.lucene.util.compress.LZ4.decompress(in, len, out, 0);
+}
+
+@Override
+public String toString() {
+  return "lz4";
+}
+  };
+
+  private static final CompressionAlgorithm[] BY_CODE = new 
CompressionAlgorithm[3];
 
 Review comment:
   I don't view the public methods of an Enum as "internals".


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on a change in pull request #1126: LUCENE-4702: Terms dictionary compression.

2020-01-02 Thread GitBox
dweiss commented on a change in pull request #1126: LUCENE-4702: Terms 
dictionary compression.
URL: https://github.com/apache/lucene-solr/pull/1126#discussion_r362483152
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/CompressionAlgorithm.java
 ##
 @@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.blocktree;
+
+import java.io.IOException;
+
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.compress.LowercaseAsciiCompression;
+
+/**
+ * Compression algorithm used for suffixes of a block of terms.
+ */
+enum CompressionAlgorithm {
+
+  NO_COMPRESSION(0x00) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  in.readBytes(out, 0, len);
+}
+
+@Override
+public String toString() {
+  return "no_compression";
+}
+  },
+
+  LOWERCASE_ASCII(0x01) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  LowercaseAsciiCompression.decompress(in, out, len);
+}
+
+@Override
+public String toString() {
+  return "lowercase_ascii";
+}
+  },
+
+  LZ4(0x02) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  org.apache.lucene.util.compress.LZ4.decompress(in, len, out, 0);
+}
+
+@Override
+public String toString() {
+  return "lz4";
+}
+  };
+
+  private static final CompressionAlgorithm[] BY_CODE = new 
CompressionAlgorithm[3];
 
 Review comment:
   Exactly. I prefer the explicit code: it gives you control and context. 
Ordinals are really internal details of a particular enum implementation.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1126: LUCENE-4702: Terms dictionary compression.

2020-01-02 Thread GitBox
dsmiley commented on a change in pull request #1126: LUCENE-4702: Terms 
dictionary compression.
URL: https://github.com/apache/lucene-solr/pull/1126#discussion_r362481693
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/CompressionAlgorithm.java
 ##
 @@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.blocktree;
+
+import java.io.IOException;
+
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.compress.LowercaseAsciiCompression;
+
+/**
+ * Compression algorithm used for suffixes of a block of terms.
+ */
+enum CompressionAlgorithm {
+
+  NO_COMPRESSION(0x00) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  in.readBytes(out, 0, len);
+}
+
+@Override
+public String toString() {
+  return "no_compression";
+}
+  },
+
+  LOWERCASE_ASCII(0x01) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  LowercaseAsciiCompression.decompress(in, out, len);
+}
+
+@Override
+public String toString() {
+  return "lowercase_ascii";
+}
+  },
+
+  LZ4(0x02) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  org.apache.lucene.util.compress.LZ4.decompress(in, len, out, 0);
+}
+
+@Override
+public String toString() {
 
 Review comment:
   I suggest then adding a default implementation that toLowerCase's name().  
It would also be helpful to add a comment explaining where this is used; I 
didn't know.
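
   Roughly this (assuming java.util.Locale is imported), replacing the three 
per-constant overrides:

```java
@Override
public String toString() {
  // Lowercased name, matching the strings used in Stats#toString.
  return name().toLowerCase(Locale.ROOT);
}
```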


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1126: LUCENE-4702: Terms dictionary compression.

2020-01-02 Thread GitBox
dsmiley commented on a change in pull request #1126: LUCENE-4702: Terms 
dictionary compression.
URL: https://github.com/apache/lucene-solr/pull/1126#discussion_r362481104
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/CompressionAlgorithm.java
 ##
 @@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.blocktree;
+
+import java.io.IOException;
+
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.compress.LowercaseAsciiCompression;
+
+/**
+ * Compression algorithm used for suffixes of a block of terms.
+ */
+enum CompressionAlgorithm {
+
+  NO_COMPRESSION(0x00) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  in.readBytes(out, 0, len);
+}
+
+@Override
+public String toString() {
+  return "no_compression";
+}
+  },
+
+  LOWERCASE_ASCII(0x01) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  LowercaseAsciiCompression.decompress(in, out, len);
+}
+
+@Override
+public String toString() {
+  return "lowercase_ascii";
+}
+  },
+
+  LZ4(0x02) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  org.apache.lucene.util.compress.LZ4.decompress(in, len, out, 0);
+}
+
+@Override
+public String toString() {
+  return "lz4";
+}
+  };
+
+  private static final CompressionAlgorithm[] BY_CODE = new 
CompressionAlgorithm[3];
 
 Review comment:
   I understand that, but it sandbags the current implementation with verbosity 
it doesn't need. We can adjust in the future if we actually need to.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #1126: LUCENE-4702: Terms dictionary compression.

2020-01-02 Thread GitBox
jpountz commented on a change in pull request #1126: LUCENE-4702: Terms 
dictionary compression.
URL: https://github.com/apache/lucene-solr/pull/1126#discussion_r362480952
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/CompressionAlgorithm.java
 ##
 @@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.blocktree;
+
+import java.io.IOException;
+
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.compress.LowercaseAsciiCompression;
+
+/**
+ * Compression algorithm used for suffixes of a block of terms.
+ */
+enum CompressionAlgorithm {
+
+  NO_COMPRESSION(0x00) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  in.readBytes(out, 0, len);
+}
+
+@Override
+public String toString() {
+  return "no_compression";
+}
+  },
+
+  LOWERCASE_ASCII(0x01) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  LowercaseAsciiCompression.decompress(in, out, len);
+}
+
+@Override
+public String toString() {
+  return "lowercase_ascii";
+}
+  },
+
+  LZ4(0x02) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  org.apache.lucene.util.compress.LZ4.decompress(in, len, out, 0);
+}
+
+@Override
+public String toString() {
 
 Review comment:
   I wanted to keep the string representations lowercased in Stats#toString, 
like in the previous iteration of this pull request, but I have no strong 
feelings; I don't mind removing these `toString` implementations.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #1126: LUCENE-4702: Terms dictionary compression.

2020-01-02 Thread GitBox
jpountz commented on a change in pull request #1126: LUCENE-4702: Terms 
dictionary compression.
URL: https://github.com/apache/lucene-solr/pull/1126#discussion_r362480549
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/CompressionAlgorithm.java
 ##
 @@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.blocktree;
+
+import java.io.IOException;
+
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.compress.LowercaseAsciiCompression;
+
+/**
+ * Compression algorithm used for suffixes of a block of terms.
+ */
+enum CompressionAlgorithm {
+
+  NO_COMPRESSION(0x00) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  in.readBytes(out, 0, len);
+}
+
+@Override
+public String toString() {
+  return "no_compression";
+}
+  },
+
+  LOWERCASE_ASCII(0x01) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  LowercaseAsciiCompression.decompress(in, out, len);
+}
+
+@Override
+public String toString() {
+  return "lowercase_ascii";
+}
+  },
+
+  LZ4(0x02) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  org.apache.lucene.util.compress.LZ4.decompress(in, len, out, 0);
+}
+
+@Override
+public String toString() {
+  return "lz4";
+}
+  };
+
+  private static final CompressionAlgorithm[] BY_CODE = new 
CompressionAlgorithm[3];
 
 Review comment:
   I'm fine either way, but I know some people strongly prefer decoupling ids - 
which are used for serialization and shouldn't change - from enum ordinals, 
which might change if new constants are inserted or if values are reordered. A 
quick illustration follows.
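
   Hypothetical before/after versions of the same enum:

```java
// Why ordinals are fragile as serialized ids: inserting a constant shifts
// every ordinal after it, silently changing what previously written bytes mean.
enum V1 { NO_COMPRESSION, LZ4 }                   // LZ4.ordinal() == 1
enum V2 { NO_COMPRESSION, LOWERCASE_ASCII, LZ4 }  // LZ4.ordinal() == 2
```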


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8673) Use radix partitioning when merging dimensional points

2020-01-02 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006821#comment-17006821
 ] 

David Smiley commented on LUCENE-8673:
--


Out of Memory error in Radix sort code

FAILED:  org.apache.lucene.document.TestXYMultiPolygonShapeQueries.testRandomBig

https://builds.apache.org/job/Lucene-Solr-NightlyTests-8.4/17/

> Use radix partitioning when merging dimensional points
> --
>
> Key: LUCENE-8673
> URL: https://issues.apache.org/jira/browse/LUCENE-8673
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: 8.x, master (9.0)
>
> Attachments: Geo3D.png, Geo3D.png, Geo3D.png, LatLonPoint.png, 
> LatLonPoint.png, LatLonPoint.png, LatLonShape.png, LatLonShape.png, 
> LatLonShape.png
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Following the advice of [~jpountz] in LUCENE-8623, I have investigated using 
> radix selection when merging segments instead of sorting the data at the 
> beginning. The results are pretty promising when running Lucene geo 
> benchmarks:
>  
> ||Approach||Index time (sec): Dev||Index Time (sec): Base||Index Time: 
> Diff||Force merge time (sec): Dev||Force Merge time (sec): Base||Force Merge 
> Time: Diff||Index size (GB): Dev||Index size (GB): Base||Index Size: 
> Diff||Reader heap (MB): Dev||Reader heap (MB): Base||Reader heap: Diff
> |points|241.5s|235.0s| 3%|157.2s|157.9s|-0%|0.55|0.55| 0%|1.57|1.57| 0%|
> |shapes|416.1s|650.1s|-36%|306.1s|603.2s|-49%|1.29|1.29| 0%|1.61|1.61| 0%|
> |geo3d|261.0s|360.1s|-28%|170.2s|279.9s|-39%|0.75|0.75| 0%|1.58|1.58| 0%|
>  
> edited: table formatting to be a jira table
>  
> In 2D the index throughput is more or less equal, but for higher dimensions 
> the impact is quite big. In all cases the merging process requires much less 
> disk space. I am attaching plots showing the different behaviour, and I am 
> opening a pull request.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1126: LUCENE-4702: Terms dictionary compression.

2020-01-02 Thread GitBox
dsmiley commented on a change in pull request #1126: LUCENE-4702: Terms 
dictionary compression.
URL: https://github.com/apache/lucene-solr/pull/1126#discussion_r362476910
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/CompressionAlgorithm.java
 ##
 @@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.blocktree;
+
+import java.io.IOException;
+
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.compress.LowercaseAsciiCompression;
+
+/**
+ * Compression algorithm used for suffixes of a block of terms.
+ */
+enum CompressionAlgorithm {
+
+  NO_COMPRESSION(0x00) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  in.readBytes(out, 0, len);
+}
+
+@Override
+public String toString() {
+  return "no_compression";
+}
+  },
+
+  LOWERCASE_ASCII(0x01) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  LowercaseAsciiCompression.decompress(in, out, len);
+}
+
+@Override
+public String toString() {
+  return "lowercase_ascii";
+}
+  },
+
+  LZ4(0x02) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  org.apache.lucene.util.compress.LZ4.decompress(in, len, out, 0);
+}
+
+@Override
+public String toString() {
+  return "lz4";
+}
+  };
+
+  private static final CompressionAlgorithm[] BY_CODE = new 
CompressionAlgorithm[3];
 
 Review comment:
   Why bother with this -- explicit "code" in constructors (when the enum's 
intrinsic ordinal will do) and explicit BY_CODE construction, when you could do 
`private static final CompressionAlgorithm[] BY_CODE = values();`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1126: LUCENE-4702: Terms dictionary compression.

2020-01-02 Thread GitBox
dsmiley commented on a change in pull request #1126: LUCENE-4702: Terms 
dictionary compression.
URL: https://github.com/apache/lucene-solr/pull/1126#discussion_r362477434
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/CompressionAlgorithm.java
 ##
 @@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.blocktree;
+
+import java.io.IOException;
+
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.compress.LowercaseAsciiCompression;
+
+/**
+ * Compression algorithm used for suffixes of a block of terms.
+ */
+enum CompressionAlgorithm {
+
+  NO_COMPRESSION(0x00) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  in.readBytes(out, 0, len);
+}
+
+@Override
+public String toString() {
+  return "no_compression";
+}
+  },
+
+  LOWERCASE_ASCII(0x01) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  LowercaseAsciiCompression.decompress(in, out, len);
+}
+
+@Override
+public String toString() {
+  return "lowercase_ascii";
+}
+  },
+
+  LZ4(0x02) {
+
+@Override
+void read(DataInput in, byte[] out, int len) throws IOException {
+  org.apache.lucene.util.compress.LZ4.decompress(in, len, out, 0);
+}
+
+@Override
+public String toString() {
 
 Review comment:
   What's wrong with the default enum toString?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14130) Add postlogs command line tool for indexing Solr logs

2020-01-02 Thread Joel Bernstein (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-14130:
--
Attachment: SOLR-14130.patch

> Add postlogs command line tool for indexing Solr logs
> -
>
> Key: SOLR-14130
> URL: https://issues.apache.org/jira/browse/SOLR-14130
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, 
> SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, Screen Shot 2019-12-19 
> at 2.04.41 PM.png, Screen Shot 2019-12-19 at 2.16.01 PM.png, Screen Shot 
> 2019-12-19 at 2.35.41 PM.png, Screen Shot 2019-12-21 at 8.46.51 AM.png
>
>
> This ticket adds a simple command line tool for posting Solr logs to a Solr 
> index. The tool works with the out-of-the-box Solr log format. It is still a 
> work in progress but currently indexes:
>  * queries
>  * updates
>  * commits
>  * new searchers
>  * errors - including stack traces
> Attached are some sample visualizations using Solr Streaming Expressions and 
> Math Expressions after the data has been loaded. The visualizations show: 
> time series, scatter plots, histograms and quantile plots, but really this is 
> just scratching the surface of the visualizations that can be done with the 
> Solr logs.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9107) CommonsTermsQuery with huge no. of terms slower with top-k scoring

2020-01-02 Thread Tommaso Teofili (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006815#comment-17006815
 ] 

Tommaso Teofili commented on LUCENE-9107:
-

Thanks Adrien for looking into this. I've tried with a pure disjunction 
(BooleanQuery) and the numbers are about the same as with {{CommonTermsQuery}}. 
The slowness contribution of {{ClassicSimilarity}} is non-trivial: top-k 
scoring with {{ClassicSimilarity}} ranges from 2 to 2.5 seconds, whereas it 
ranges from 1.5 to 2 seconds with {{BM25Similarity}}.

> CommonsTermsQuery with huge no. of terms slower with top-k scoring
> --
>
> Key: LUCENE-9107
> URL: https://issues.apache.org/jira/browse/LUCENE-9107
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 8.3
>Reporter: Tommaso Teofili
>Priority: Major
>
> In [1] a {{CommonTermsQuery}} is used in order to perform a query with lots 
> of (duplicate) terms. Using a max term frequency cutoff of 0.999 for 
> low-frequency terms, the query, although big, finishes in around 200-300ms 
> with Lucene 7.6.0. 
> However, when upgrading the code to Lucene 8.x, the query runs in 2-3s 
> instead [2].
> After digging a bit into it, it seems that the speed regression comes from 
> the top-k scoring introduced by default in version 8, though it is not clear 
> "where" exactly in the code.
> When switching back to complete hit scoring [3], the speed goes back to the 
> initial 200-300ms also in Lucene 8.3.x.
> It'd be nice to understand why this is happening and whether it only 
> concerns {{CommonTermsQuery}} or affects {{BooleanQuery}} as well.
> If this is a case that depends on the data and application involved (Anserini 
> in this case), the application should handle it; otherwise, if it is a 
> regression/bug in Lucene, it'd be nice to fix it.
> [1] : 
> https://github.com/tteofili/Anserini-embeddings/blob/nnsearch/src/main/java/io/anserini/embeddings/nn/fw/FakeWordsRunner.java
> [2] : 
> https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/analysis/vectors/ApproximateNearestNeighborEval.java
> [3] : 
> https://github.com/tteofili/anserini/blob/ann-paper-reproduce/src/main/java/io/anserini/analysis/vectors/ApproximateNearestNeighborEval.java#L174



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-14122) SimUtils incorrectly converts v2 to v1 request params

2020-01-02 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-14122.
-
Resolution: Fixed

> SimUtils incorrectly converts v2 to v1 request params
> -
>
> Key: SOLR-14122
> URL: https://issues.apache.org/jira/browse/SOLR-14122
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.3
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.5
>
>
> As reported by Li Cao on the mailing list:
> {quote}I am using solr 8.3.0 in cloud mode. I have collection level 
> autoscaling policy and the collection name is “entity”. But when I run 
> autoscaling simulation all the steps failed with this message:
>   "error":{
> "exception":"java.io.IOException: 
> java.util.concurrent.ExecutionException: 
> org.apache.solr.common.SolrException: org.apache.solr.common.SolrException: 
> Could not find collection : entity/shards",
> "suggestion":{
>   "type":"repair",
>   "operation":{
> "method":"POST",
> "path":"/c/entity/shards",
> "command":{"add-replica":{
> "shard":"shard2",
> "node":"my_node:8983_solr",
> "type":"TLOG",
> "replicaInfo":null}}},{quote}
> The simulation package internally uses v1 APIs but the requests created by 
> the autoscaling framework may use v2 APIs. The utility class {{SimUtils}} 
> converts v2 request parameters to v1 parameters, without actually using the 
> apispec or v2 Api handlers (as that would mean adding more complexity to the 
> simulator, and only tangentially related to the autoscaling).
> There's a bug in this utility when converting the path of the request - V2 
> apispec uses {{/c/\{collection}/shards}} when manipulating shards and 
> replicas - unlike V1 which uniformly uses {{/collections}}. The utility class 
> doesn't account for this path difference and creates invalid collection names.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14130) Add postlogs command line tool for indexing Solr logs

2020-01-02 Thread Joel Bernstein (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-14130:
--
Attachment: SOLR-14130.patch

> Add postlogs command line tool for indexing Solr logs
> -
>
> Key: SOLR-14130
> URL: https://issues.apache.org/jira/browse/SOLR-14130
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: SOLR-14130.patch, SOLR-14130.patch, SOLR-14130.patch, 
> SOLR-14130.patch, SOLR-14130.patch, Screen Shot 2019-12-19 at 2.04.41 PM.png, 
> Screen Shot 2019-12-19 at 2.16.01 PM.png, Screen Shot 2019-12-19 at 2.35.41 
> PM.png, Screen Shot 2019-12-21 at 8.46.51 AM.png
>
>
> This ticket adds a simple command line tool for posting Solr logs to a Solr 
> index. The tool works with the out-of-the-box Solr log format. It is still a 
> work in progress but currently indexes:
>  * queries
>  * updates
>  * commits
>  * new searchers
>  * errors - including stack traces
> Attached are some sample visualizations using Solr Streaming Expressions and 
> Math Expressions after the data has been loaded. The visualizations show: 
> time series, scatter plots, histograms and quantile plots, but really this is 
> just scratching the surface of the visualizations that can be done with the 
> Solr logs.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14122) SimUtils incorrectly converts v2 to v1 request params

2020-01-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006805#comment-17006805
 ] 

ASF subversion and git services commented on SOLR-14122:


Commit 22386a1f127c03cff4506571b9a77062ea5dec08 in lucene-solr's branch 
refs/heads/branch_8x from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=22386a1 ]

SOLR-14122: SimUtils converts v2 to v1 request params incorrectly.


> SimUtils incorrectly converts v2 to v1 request params
> -
>
> Key: SOLR-14122
> URL: https://issues.apache.org/jira/browse/SOLR-14122
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.3
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.5
>
>
> As reported by Li Cao on the mailing list:
> {quote}I am using solr 8.3.0 in cloud mode. I have collection level 
> autoscaling policy and the collection name is “entity”. But when I run 
> autoscaling simulation all the steps failed with this message:
>   "error":{
> "exception":"java.io.IOException: 
> java.util.concurrent.ExecutionException: 
> org.apache.solr.common.SolrException: org.apache.solr.common.SolrException: 
> Could not find collection : entity/shards",
> "suggestion":{
>   "type":"repair",
>   "operation":{
> "method":"POST",
> "path":"/c/entity/shards",
> "command":{"add-replica":{
> "shard":"shard2",
> "node":"my_node:8983_solr",
> "type":"TLOG",
> "replicaInfo":null}}},{quote}
> The simulation package internally uses v1 APIs but the requests created by 
> the autoscaling framework may use v2 APIs. The utility class {{SimUtils}} 
> converts v2 request parameters to v1 parameters, without actually using the 
> apispec or v2 Api handlers (as that would mean adding more complexity to the 
> simulator, and only tangentially related to the autoscaling).
> There's a bug in this utility when converting the path of the request - V2 
> apispec uses {{/c/\{collection}/shards}} when manipulating shards and 
> replicas - unlike V1 which uniformly uses {{/collections}}. The utility class 
> doesn't account for this path difference and creates invalid collection names.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14122) SimUtils incorrectly converts v2 to v1 request params

2020-01-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006800#comment-17006800
 ] 

ASF subversion and git services commented on SOLR-14122:


Commit 15d5e6662c3fcee6d19979b7c2ff49a28610aca3 in lucene-solr's branch 
refs/heads/master from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=15d5e66 ]

SOLR-14122: add unit test.


> SimUtils incorrectly converts v2 to v1 request params
> -
>
> Key: SOLR-14122
> URL: https://issues.apache.org/jira/browse/SOLR-14122
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.3
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.5
>
>
> As reported by Li Cao on the mailing list:
> {quote}I am using solr 8.3.0 in cloud mode. I have collection level 
> autoscaling policy and the collection name is “entity”. But when I run 
> autoscaling simulation all the steps failed with this message:
>   "error":{
> "exception":"java.io.IOException: 
> java.util.concurrent.ExecutionException: 
> org.apache.solr.common.SolrException: org.apache.solr.common.SolrException: 
> Could not find collection : entity/shards",
> "suggestion":{
>   "type":"repair",
>   "operation":{
> "method":"POST",
> "path":"/c/entity/shards",
> "command":{"add-replica":{
> "shard":"shard2",
> "node":"my_node:8983_solr",
> "type":"TLOG",
> "replicaInfo":null}}},{quote}
> The simulation package internally uses v1 APIs but the requests created by 
> the autoscaling framework may use v2 APIs. The utility class {{SimUtils}} 
> converts v2 request parameters to v1 parameters, without actually using the 
> apispec or v2 Api handlers (as that would mean adding more complexity to the 
> simulator, and only tangentially related to the autoscaling).
> There's a bug in this utility when converting the path of the request - V2 
> apispec uses {{/c/\{collection}/shards}} when manipulating shards and 
> replicas - unlike V1 which uniformly uses {{/collections}}. The utility class 
> doesn't account for this path difference and creates invalid collection names.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14122) SimUtils incorrectly converts v2 to v1 request params

2020-01-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006793#comment-17006793
 ] 

ASF subversion and git services commented on SOLR-14122:


Commit 38b9af21f1d84a0583741d1e023e843acf16c823 in lucene-solr's branch 
refs/heads/master from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=38b9af2 ]

SOLR-14122: SimUtils converts v2 to v1 request params incorrectly.


> SimUtils incorrectly converts v2 to v1 request params
> -
>
> Key: SOLR-14122
> URL: https://issues.apache.org/jira/browse/SOLR-14122
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.3
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.5
>
>
> As reported by Li Cao on the mailing list:
> {quote}I am using solr 8.3.0 in cloud mode. I have collection level 
> autoscaling policy and the collection name is “entity”. But when I run 
> autoscaling simulation all the steps failed with this message:
>   "error":{
> "exception":"java.io.IOException: 
> java.util.concurrent.ExecutionException: 
> org.apache.solr.common.SolrException: org.apache.solr.common.SolrException: 
> Could not find collection : entity/shards",
> "suggestion":{
>   "type":"repair",
>   "operation":{
> "method":"POST",
> "path":"/c/entity/shards",
> "command":{"add-replica":{
> "shard":"shard2",
> "node":"my_node:8983_solr",
> "type":"TLOG",
> "replicaInfo":null}}},{quote}
> The simulation package internally uses v1 APIs but the requests created by 
> the autoscaling framework may use v2 APIs. The utility class {{SimUtils}} 
> converts v2 request parameters to v1 parameters, without actually using the 
> apispec or v2 Api handlers (as that would mean adding more complexity to the 
> simulator, and only tangentially related to the autoscaling).
> There's a bug in this utility when converting the path of the request - V2 
> apispec uses {{/c/\{collection}/shards}} when manipulating shards and 
> replicas - unlike V1 which uniformly uses {{/collections}}. The utility class 
> doesn't account for this path difference and creates invalid collection names.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on a change in pull request #1126: LUCENE-4702: Terms dictionary compression.

2020-01-02 Thread GitBox
dweiss commented on a change in pull request #1126: LUCENE-4702: Terms 
dictionary compression.
URL: https://github.com/apache/lucene-solr/pull/1126#discussion_r362462638
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/SegmentTermsEnumFrame.java
 ##
 @@ -163,15 +179,48 @@ void loadBlock() throws IOException {
 // instead of linear scan to find target term; eg
 // we could have simple array of offsets
 
+final long startSuffixFP = ste.in.getFilePointer();
 // term suffixes:
-code = ste.in.readVInt();
-isLeafBlock = (code & 1) != 0;
-int numBytes = code >>> 1;
-if (suffixBytes.length < numBytes) {
-  suffixBytes = new byte[ArrayUtil.oversize(numBytes, 1)];
+if (version >= BlockTreeTermsReader.VERSION_COMPRESSED_SUFFIXES) {
+  final long codeL = ste.in.readVLong();
+  isLeafBlock = (codeL & 0x04) != 0;
+  final int numSuffixBytes = (int) (codeL >>> 3);
+  if (suffixBytes.length < numSuffixBytes) {
+suffixBytes = new byte[ArrayUtil.oversize(numSuffixBytes, 1)];
+  }
+  compressionAlg = (int) codeL & 0x03;
 
 Review comment:
   Even fancier than I thought it'd have to be! Looks great, I think.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #1126: LUCENE-4702: Terms dictionary compression.

2020-01-02 Thread GitBox
jpountz commented on a change in pull request #1126: LUCENE-4702: Terms 
dictionary compression.
URL: https://github.com/apache/lucene-solr/pull/1126#discussion_r362461295
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/SegmentTermsEnumFrame.java
 ##
 @@ -163,15 +179,48 @@ void loadBlock() throws IOException {
 // instead of linear scan to find target term; eg
 // we could have simple array of offsets
 
+final long startSuffixFP = ste.in.getFilePointer();
 // term suffixes:
-code = ste.in.readVInt();
-isLeafBlock = (code & 1) != 0;
-int numBytes = code >>> 1;
-if (suffixBytes.length < numBytes) {
-  suffixBytes = new byte[ArrayUtil.oversize(numBytes, 1)];
+if (version >= BlockTreeTermsReader.VERSION_COMPRESSED_SUFFIXES) {
+  final long codeL = ste.in.readVLong();
+  isLeafBlock = (codeL & 0x04) != 0;
+  final int numSuffixBytes = (int) (codeL >>> 3);
+  if (suffixBytes.length < numSuffixBytes) {
+suffixBytes = new byte[ArrayUtil.oversize(numSuffixBytes, 1)];
+  }
+  compressionAlg = (int) codeL & 0x03;
 
 Review comment:
   I pushed a change that introduces an enum, does it look better?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9113) Speed up merging doc values terms dictionaries

2020-01-02 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006730#comment-17006730
 ] 

Adrien Grand commented on LUCENE-9113:
--

I indexed the names of all locations in GeoNames's allCountries.txt file in a 
SORTED field, turned on the info stream, and forced a merge at the end, 
repeated 3 times. Without the patch:
{noformat}
$ grep "to merge doc values" infostream.txt
SM 0 [2020-01-02T10:48:35.245544Z; Lucene Merge Thread #0]: 5271 msec to merge 
doc values [6940171 docs]
SM 0 [2020-01-02T10:48:40.080066Z; Lucene Merge Thread #1]: 4802 msec to merge 
doc values [8537845 docs]
SM 1 [2020-01-02T10:48:58.827231Z; Lucene Merge Thread #0]: 5186 msec to merge 
doc values [6940171 docs]
SM 1 [2020-01-02T10:49:03.463976Z; Lucene Merge Thread #1]: 4614 msec to merge 
doc values [8537845 docs]
SM 2 [2020-01-02T10:49:22.077466Z; Lucene Merge Thread #0]: 5191 msec to merge 
doc values [6940171 docs]
SM 2 [2020-01-02T10:49:26.684538Z; Lucene Merge Thread #1]: 4589 msec to merge 
doc values [8537845 docs]
{noformat}

With the patch:
{noformat}
$ grep "to merge doc values" infostream.txt
SM 0 [2020-01-02T10:46:54.743489Z; Lucene Merge Thread #0]: 4314 msec to merge 
doc values [6940171 docs]
SM 0 [2020-01-02T10:46:56.988413Z; Lucene Merge Thread #1]: 2208 msec to merge 
doc values [8537845 docs]
SM 1 [2020-01-02T10:47:14.433368Z; Lucene Merge Thread #0]: 4206 msec to merge 
doc values [6940171 docs]
SM 1 [2020-01-02T10:47:16.589024Z; Lucene Merge Thread #1]: 2136 msec to merge 
doc values [8537845 docs]
SM 2 [2020-01-02T10:47:33.942020Z; Lucene Merge Thread #0]: 4134 msec to merge 
doc values [6940171 docs]
SM 2 [2020-01-02T10:47:36.134355Z; Lucene Merge Thread #1]: 2174 msec to merge 
doc values [8537845 docs]
{noformat}

> Speed up merging doc values terms dictionaries
> --
>
> Key: LUCENE-9113
> URL: https://issues.apache.org/jira/browse/LUCENE-9113
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The default {{DocValuesConsumer#mergeSortedField}} and 
> {{DocValuesConsumer#mergeSortedSetField}} implementations create a merged 
> view of the doc values producers to merge. Unfortunately, this merged view 
> doesn't override {{termsEnum()}}, whose default implementation of 
> {{next()}} increments the ordinal and calls {{lookupOrd()}} to retrieve the 
> term. Currently, {{lookupOrd()}} doesn't take advantage of its current 
> position, and would seek to the block start and then call {{next()}} up to 
> 16 times to reach the desired term. While there are discussions about 
> optimizing lookups to take advantage of the current ord (LUCENE-8836), it 
> shouldn't be required for merging to be efficient; we should instead make 
> {{next()}} call {{next()}} on its sub enums.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz opened a new pull request #1136: LUCENE-9113: Speed up merging doc values' terms dictionaries.

2020-01-02 Thread GitBox
jpountz opened a new pull request #1136: LUCENE-9113: Speed up merging doc 
values' terms dictionaries.
URL: https://github.com/apache/lucene-solr/pull/1136
 
 
   This makes the merged view call `TermsEnum#next` on its subs rather than 
`#lookupOrd`.
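
As a self-contained toy illustration of why forwarding to the subs is cheaper 
(plain Java, not the actual Lucene merged-view code; a real doc values merge 
also dedups equal terms across subs, which is omitted here):
{code:java}
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Toy merged view over sorted term streams: each call to next() advances one
// sub with its own next() -- a cheap sequential read -- instead of doing a
// random-access lookup per merged ordinal.
public class MergedTermsSketch {
  static Iterator<String> merged(List<Iterator<String>> subs) {
    PriorityQueue<Sub> pq = new PriorityQueue<>(Comparator.comparing((Sub s) -> s.head));
    for (Iterator<String> it : subs) {
      if (it.hasNext()) pq.add(new Sub(it));
    }
    return new Iterator<String>() {
      @Override public boolean hasNext() { return !pq.isEmpty(); }
      @Override public String next() {
        Sub top = pq.poll();            // sub positioned on the smallest term
        String term = top.head;
        if (top.advance()) pq.add(top); // sequential next() on that sub only
        return term;
      }
    };
  }

  private static final class Sub {
    final Iterator<String> it;
    String head;
    Sub(Iterator<String> it) { this.it = it; this.head = it.next(); }
    boolean advance() {
      if (it.hasNext()) { head = it.next(); return true; }
      return false;
    }
  }
}
{code}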


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9113) Speed up merging doc values terms dictionaries

2020-01-02 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-9113:
-
Issue Type: Improvement  (was: Bug)

> Speed up merging doc values terms dictionaries
> --
>
> Key: LUCENE-9113
> URL: https://issues.apache.org/jira/browse/LUCENE-9113
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> The default {{DocValuesConsumer#mergeSortedField}} and 
> {{DocValuesConsumer#mergeSortedSetField}} implementations create a merged 
> view of the doc values producers to merge. Unfortunately, this merged view 
> doesn't override {{termsEnum()}}, whose default implementation of 
> {{next()}} increments the ordinal and calls {{lookupOrd()}} to retrieve the 
> term. Currently, {{lookupOrd()}} doesn't take advantage of its current 
> position, and would seek to the block start and then call {{next()}} up to 
> 16 times to reach the desired term. While there are discussions about 
> optimizing lookups to take advantage of the current ord (LUCENE-8836), it 
> shouldn't be required for merging to be efficient; we should instead make 
> {{next()}} call {{next()}} on its sub enums.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9113) Speed up merging doc values terms dictionaries

2020-01-02 Thread Adrien Grand (Jira)
Adrien Grand created LUCENE-9113:


 Summary: Speed up merging doc values terms dictionaries
 Key: LUCENE-9113
 URL: https://issues.apache.org/jira/browse/LUCENE-9113
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand


The default {{DocValuesConsumer#mergeSortedField}} and 
{{DocValuesConsumer#mergeSortedSetField}} implementations create a merged view 
of the doc values producers to merge. Unfortunately, this merged view doesn't 
override {{termsEnum()}}, whose default implementation of {{next()}} 
increments the ordinal and calls {{lookupOrd()}} to retrieve the term. 
Currently, {{lookupOrd()}} doesn't take advantage of its current position, and 
would seek to the block start and then call {{next()}} up to 16 times to reach 
the desired term. While there are discussions about optimizing lookups to take 
advantage of the current ord (LUCENE-8836), it shouldn't be required for 
merging to be efficient; we should instead make {{next()}} call {{next()}} on 
its sub enums.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9106) UniformSplit postings format should allow extension of block/line serializers.

2020-01-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006703#comment-17006703
 ] 

ASF subversion and git services commented on LUCENE-9106:
-

Commit 1851779ddbfd8ed3148b5d20114bcf2b3651459d in lucene-solr's branch 
refs/heads/gradle-master from Bruno Roustant
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1851779 ]

LUCENE-9106: UniformSplit postings format allows extension of block/line 
serializers.

Closes #1106


> UniformSplit postings format should allow extension of block/line serializers.
> --
>
> Key: LUCENE-9106
> URL: https://issues.apache.org/jira/browse/LUCENE-9106
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Assignee: Bruno Roustant
>Priority: Minor
> Fix For: 8.5
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently the UniformSplit postings format has static read methods for 
> block / line / header, so it is not possible to extend them to make slight 
> changes to the format. Introducing non-static serializers makes it possible 
> to extend the format easily.
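
The general shape of such a refactoring, sketched with hypothetical names 
(the actual UniformSplit classes and signatures may differ):
{code:java}
import java.io.DataInput;
import java.io.IOException;

// Hypothetical sketch of the pattern: a static read method cannot be
// overridden, whereas an instance serializer can be subclassed by a custom
// postings format to tweak the on-disk layout.
class BlockHeader {
  final int termCount;
  BlockHeader(int termCount) { this.termCount = termCount; }
}

class BlockHeaderSerializer {
  BlockHeader read(DataInput in) throws IOException {
    return new BlockHeader(in.readInt());   // default format: a single int
  }
}

class ExtendedHeaderSerializer extends BlockHeaderSerializer {
  @Override
  BlockHeader read(DataInput in) throws IOException {
    in.readByte();                          // e.g. consume a custom version byte
    return super.read(in);                  // then delegate to the base format
  }
}
{code}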



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9093) Unified highlighter with word separator never gives context to the left

2020-01-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006704#comment-17006704
 ] 

ASF subversion and git services commented on LUCENE-9093:
-

Commit 4c9cc2cefd7f3593c4b4e1e5a087e3d206298989 in lucene-solr's branch 
refs/heads/gradle-master from Nándor Mátravölgyi
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4c9cc2c ]

LUCENE-9093: UnifiedHighlighter LengthGoalBreakIterator frag align
 Matches in passages should be centered better on average.
 Closes #1123


> Unified highlighter with word separator never gives context to the left
> ---
>
> Key: LUCENE-9093
> URL: https://issues.apache.org/jira/browse/LUCENE-9093
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Tim Retout
>Assignee: David Smiley
>Priority: Major
> Fix For: 8.5
>
> Attachments: LUCENE-9093.patch
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> When using the unified highlighter with hl.bs.type=WORD, I am not able to get 
> context to the left of the matches returned; only words to the right of each 
> match are shown.  I see this behaviour on both Solr 6.4 and Solr 7.1.
> Without context to the left of a match, the highlighted snippets are much 
> less useful for understanding where the match appears in a document.
> As an example, using the techproducts data with Solr 7.1, given a search for 
> "apple", highlighting the "features" field:
> http://localhost:8983/solr/techproducts/select?hl.fl=features&hl=on&q=apple&hl.bs.type=WORD&hl.fragsize=30&hl.method=unified
> I see this snippet:
> "Apple Lossless, H.264 video"
> Note that "Apple" is anchored to the left.  Compare with the original 
> highlighter:
> http://localhost:8983/solr/techproducts/select?hl.fl=features&hl=on&q=apple&hl.fragsize=30
> And the match has context either side:
> ", Audible, Apple Lossless, H.264 video"
> (To complicate this, in general I am not sure that the unified highlighter is 
> respecting the hl.fragsize parameter, although [SOLR-9935] suggests support 
> was added.  I included the hl.fragsize param in the unified URL too, but it's 
> making no difference unless set to 0.)
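
For reference, a hedged SolrJ equivalent of the unified-highlighter URL above 
(assumes a local techproducts example core; parameter names mirror the URL):
{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

// Sketch of the same unified-highlighter request via SolrJ.
public class UnifiedHighlightDemo {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/techproducts").build()) {
      SolrQuery q = new SolrQuery("apple");
      q.set("hl", "true");
      q.set("hl.fl", "features");
      q.set("hl.method", "unified");
      q.set("hl.bs.type", "WORD");
      q.set("hl.fragsize", "30");
      System.out.println(client.query(q).getHighlighting());
    }
  }
}
{code}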



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9105) UniformSplit postings format should detect corrupted index

2020-01-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006699#comment-17006699
 ] 

ASF subversion and git services commented on LUCENE-9105:
-

Commit bbb6e418e42ae518a74fc0f97360cd0666a78e80 in lucene-solr's branch 
refs/heads/gradle-master from Bruno Roustant
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=bbb6e41 ]

LUCENE-9105: UniformSplit postings format detects corrupted index and better 
handles IO exceptions.

Closes #1105


> UniformSplit postings format should detect corrupted index
> --
>
> Key: LUCENE-9105
> URL: https://issues.apache.org/jira/browse/LUCENE-9105
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Assignee: Bruno Roustant
>Priority: Major
> Fix For: 8.5
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> BlockTree postings format has some checks when reading index metadata to 
> detect index corruption. UniformSplit should have the same. Additionally, 
> UniformSplit has assertions in BlockReader that should be runtime checks, 
> so that index corruption is also detected (this case has been encountered 
> in a production environment).
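
A minimal sketch of the assertion-to-runtime-check promotion being described 
({{CorruptIndexException}} is the real Lucene exception; the surrounding 
method is hypothetical, not the actual BlockReader code):
{code:java}
import java.io.IOException;

import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.store.DataInput;

// CorruptIndexException extends IOException, so no extra throws clause needed.
class CorruptionCheckSketch {
  void checkBlockLength(DataInput in, int blockLength) throws IOException {
    // Before: only enforced when assertions are enabled (-ea), so a corrupted
    // index could silently produce garbage in production:
    //   assert blockLength > 0;
    // After: always enforced, and reported as index corruption:
    if (blockLength <= 0) {
      throw new CorruptIndexException("invalid block length: " + blockLength, in);
    }
  }
}
{code}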



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14109) zkcli.sh and zkcli.bat barfs when LOG4J_PROPS is set

2020-01-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006700#comment-17006700
 ] 

ASF subversion and git services commented on SOLR-14109:


Commit 33bd811fb8b2a9bee595548e96c2a74721aa11b3 in lucene-solr's branch 
refs/heads/gradle-master from Jan Høydahl
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=33bd811 ]

SOLR-14109: Always log to stdout from 
server/scripts/cloud-scripts/zkcli.{bat|sh} (#1130)



> zkcli.sh and zkcli.bat barfs when LOG4J_PROPS is set
> 
>
> Key: SOLR-14109
> URL: https://issues.apache.org/jira/browse/SOLR-14109
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: scripts and tools
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Fix For: 8.5, 8.4.1
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I noticed this when running {{zkcli.sh}} from Solr's docker image. [The 
> docker image sets the variable 
> {{LOG4J_PROPS}}|https://github.com/docker-solr/docker-solr/blob/master/8.3/Dockerfile],
>  causing the zkcli script to pick up and use that logger instead of the 
> console logger. The problem is that Solr's log4j2 config relies on the 
> {{solr.log.dir}} sysprop being set, which it is not when running this 
> script.
> So either fix the wrapper script to set {{solr.log.dir}} or, better, always 
> log to stdout.
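
Until the fix lands, one possible workaround is to supply the sysprop 
yourself, assuming the log4j2 config resolves it from a system property (the 
pass-through variable below is hypothetical; check how your zkcli.sh builds 
its java command line):
{noformat}
# Hedged workaround sketch: pre-set the sysprop the log4j2 config expects.
ZKCLI_JVM_FLAGS="-Dsolr.log.dir=/var/solr/logs" \
  server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd list
{noformat}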



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14129) Reuse Jackson ObjectMapper in AuditLoggerPlugin

2020-01-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006701#comment-17006701
 ] 

ASF subversion and git services commented on SOLR-14129:


Commit c4993bc99ca4e9b1780c900e8bfa242d540ff8b5 in lucene-solr's branch 
refs/heads/gradle-master from Jan Høydahl
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c4993bc ]

SOLR-14129: Reuse Jackson ObjectMapper in AuditLoggerPlugin (#1104)



> Reuse Jackson ObjectMapper in AuditLoggerPlugin
> ---
>
> Key: SOLR-14129
> URL: https://issues.apache.org/jira/browse/SOLR-14129
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Auditlogging
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Labels: performance
> Fix For: 8.5
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As reported in 
> [https://lists.apache.org/thread.html/7565410ab2d9429b5cada98c70dfde18d9543b63ef8a5cf8723d99d8%40%3Cdev.lucene.apache.org%3E]
>  
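
The gist of the change, sketched (hypothetical class shape, not the actual 
{{AuditLoggerPlugin}} code; Jackson's {{ObjectMapper}} is thread-safe once 
configured):
{code:java}
import com.fasterxml.jackson.databind.ObjectMapper;

// Sketch of the fix's idea: ObjectMapper is expensive to construct and
// thread-safe after configuration, so build it once and reuse it instead of
// creating a new instance per audit event.
public class AuditEventSerializerSketch {
  private static final ObjectMapper MAPPER = new ObjectMapper();  // built once

  String toJson(Object event) throws Exception {
    // Before (per call): new ObjectMapper().writeValueAsString(event);
    return MAPPER.writeValueAsString(event);
  }
}
{code}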



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org