Solr field auditing

2019-09-09 Thread Jay Potharaju
Hi,
I am trying to implement some auditing fields in Solr to track when a document
was last updated. Basically, when a document is updated, I would like to store
the time of the previous update along with the current timestamp.
example :
*First time indexing*
Doc1 : {id:1, category:shoes, update_date: NOW(), last_update_date:[NOW()]}
*After Update*
Doc1: {id:1, category:shirt, update_date: NOW(), last_update_date:[NOW(),
'2019-09-01']}

I know this can be done easily by logging something in the DB during
indexing, or by making a call to Solr during indexing to get the last
indexed time and then updating the field.
But I was trying to see if an update request processor can be used to do this
(see the rough sketch below).
Any suggestions?
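
For context, this is roughly what I had in mind in solrconfig.xml. It is only a
sketch I have not tested: it assumes solr.TimestampUpdateProcessorFactory and
covers setting update_date when the incoming document does not already carry
one; appending the previous value to last_update_date would probably need a
custom or scripted processor.

  <updateRequestProcessorChain name="audit-fields" default="true">
    <!-- set update_date to NOW when the document does not already contain it -->
    <processor class="solr.TimestampUpdateProcessorFactory">
      <str name="fieldName">update_date</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>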

Thanks
Jay


Re: Sample JWT Solr configuration

2019-09-09 Thread Tyrone
Jan

Can my jwk object be something like

{"alg": "HS256", "typ": "JWT",

"sub": "1234567890", "name": "John Doe", "iat": 1516239022,

"k": "secret-key"}

Where k is the JWT secret key?
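
To be concrete, what I am trying to figure out is whether my security.json
should end up looking roughly like this (just my guess based on Jan's earlier
example below, not verified):

  {
    "authentication": {
      "class": "solr.JWTAuthPlugin",
      "jwk": { "kty": "oct", "kid": "my-key-id", "alg": "HS256",
               "k": "<my base64url-encoded secret>" }
    }
  }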


Sent from my iPhone

> On Sep 9, 2019, at 1:48 AM, Jan Høydahl  wrote:
> 
> In your security.json, add a JWK matching your signing algorithm, using the 
> “jwk” JSON key.
> 
> Example:
> “jwk” : { "kty" : "oct", "kid" : "0afee142-a0af-4410-abcc-9f2d44ff45b5", 
> "alg" : "HS256", "k" : "FdFYFzERwC2uCBB46pZQi4GG85LujR8obt-KWRBICVQ" }
> 
> Of course you need to find a way to encode your particular secret in JWK 
> format; there should be plenty of tools available for that. If you intend to 
> use a symmetric key in prod you have to configure Solr so that security.json is 
> not readable by anyone but the admin!
> 
> Jan Høydahl
> 
>> On 9 Sep 2019, at 05:46, Tyrone wrote:
>> 
>> HS256


[SECURITY] CVE-2019-12401: XML Bomb in Apache Solr versions prior to 5.0

2019-09-09 Thread Tomas Fernandez Lobbe
Severity: Medium

Vendor: The Apache Software Foundation

Versions Affected:
1.3.0 to 1.4.1
3.1.0 to 3.6.2
4.0.0 to 4.10.4

Description: Solr versions prior to 5.0.0 are vulnerable to an XML resource
consumption attack (a.k.a. Lol Bomb) via its update handler. By leveraging
XML DOCTYPE and ENTITY type elements, the attacker can create a pattern
that will expand when the server parses the XML, causing OOMs.

Mitigation:
* Upgrade to Apache Solr 5.0 or later.
* Ensure your network settings are configured so that only trusted traffic
is allowed to post documents to the running Solr instances.

Credit: Matei "Mal" Badanoiu

References:
[1] https://issues.apache.org/jira/browse/SOLR-13750
[2] https://wiki.apache.org/solr/SolrSecurity


CDCR tlog corruption leads to infinite loop

2019-09-09 Thread Webster Homer
We are running Solr 7.2.0

Our configuration has several collections that are loaded into a solr cloud 
which is set to replicate using CDCR to 3 different solrclouds. All of our 
target collections have 2 shards with two replicas per shard. Our source 
collection has 2 shards, and 1 replica per shard.

Frequently we start to see errors where the target collections are out of date, 
and the cdcr action=errors endpoint shows large numbers of errors.
For example:
{"responseHeader": {
"status": 0,
"QTime": 0},
"errors": [
"uc1f-ecom-mzk01:2181,uc1f-ecom-mzk02:2181,uc1f-ecom-mzk03:2181/solr",
["sial-catalog-product-20190824",
[
"consecutiveErrors",
700357,
"bad_request",
0,
"internal",
700357,
"last",
[
"2019-09-09T19:17:57.453Z",
"internal",
"2019-09-09T19:17:56.949Z",
"internal",
"2019-09-09T19:17:56.448Z"
,"internal",...

We have found that one or more tlogs have become corrupt. It appears that the 
CDCR keeps trying to send data, but cannot read the data from the tlog, and then 
it retries, forever.
How does this happen? It happens very frequently, on a weekly basis, and it is 
difficult to troubleshoot.
Today we had it happen with one of our collections. Here is the listing for the 
tlog files:

$ ls -alht
total 604M
drwxr-xr-x 2 apache apache  44K Sep  9 14:27 .
-rw-r--r-- 1 apache apache 6.7M Sep  6 19:44 
tlog.766.1643975309914013696
-rw-r--r-- 1 apache apache  35M Sep  6 19:43 
tlog.765.1643975245907886080
-rw-r--r-- 1 apache apache  30M Sep  6 19:42 
tlog.764.1643975182924120064
-rw-r--r-- 1 apache apache  37M Sep  6 19:41 
tlog.763.1643975118316109824
-rw-r--r-- 1 apache apache  19M Sep  6 19:40 
tlog.762.1643975053918863360
-rw-r--r-- 1 apache apache  21M Sep  6 19:39 
tlog.761.1643974989726089216
-rw-r--r-- 1 apache apache  21M Sep  6 19:38 
tlog.760.1643974926010417152
-rw-r--r-- 1 apache apache  29M Sep  6 19:37 
tlog.759.1643974862567374848
-rw-r--r-- 1 apache apache 6.2M Sep  6 19:10 
tlog.758.1643973174027616256
-rw-r--r-- 1 apache apache 228K Sep  5 19:48 
tlog.757.1643885009483857920
-rw-r--r-- 1 apache apache  27M Sep  5 19:48 
tlog.756.1643884946565103616
-rw-r--r-- 1 apache apache  35M Sep  5 19:47 
tlog.755.1643884877912735744
-rw-r--r-- 1 apache apache  30M Sep  5 19:46 
tlog.754.1643884812724862976
-rw-r--r-- 1 apache apache  25M Sep  5 19:45 
tlog.753.1643884748976685056
-rw-r--r-- 1 apache apache  18M Sep  5 19:44 
tlog.752.1643884685794738176
-rw-r--r-- 1 apache apache  21M Sep  5 19:43 
tlog.751.1643884621330382848
-rw-r--r-- 1 apache apache  16M Sep  5 19:42 
tlog.750.1643884558054064128
-rw-r--r-- 1 apache apache  26M Sep  5 19:41 
tlog.749.1643884494725316608
-rw-r--r-- 1 apache apache 5.8M Sep  5 19:12 
tlog.748.1643882681969147904
-rw-r--r-- 1 apache apache  31M Sep  4 19:56 
tlog.747.1643794877229563904
-rw-r--r-- 1 apache apache  31M Sep  4 19:55 
tlog.746.1643794813706829824
-rw-r--r-- 1 apache apache  30M Sep  4 19:54 
tlog.745.1643794749615767552
-rw-r--r-- 1 apache apache  22M Sep  4 19:53 
tlog.744.1643794686253465600
-rw-r--r-- 1 apache apache  18M Sep  4 19:52 
tlog.743.1643794622319689728
-rw-r--r-- 1 apache apache  21M Sep  4 19:51 
tlog.742.1643794558055612416
-rw-r--r-- 1 apache apache  15M Sep  4 19:50 
tlog.741.1643794493330161664
-rw-r--r-- 1 apache apache  26M Sep  4 19:49 
tlog.740.1643794428790308864
-rw-r--r-- 1 apache apache  11M Sep  4 14:58 
tlog.737.1643701398824550400
drwxr-xr-x 5 apache apache   53 Aug 21 06:30 ..
[apache@dfw-pauth-msc01 tlog]$ ls -alht 
tlog.757.1643885009483857920
-rw-r--r-- 1 apache apache 228K Sep  5 19:48 
tlog.757.1643885009483857920
$ date
Mon Sep  9 14:27:31 CDT 2019
$ pwd
/var/solr/data/sial-catalog-product-20190824_shard1_replica_n1/data/tlog

CDCR started replicating again after we deleted the oldest tlog file 
(tlog.737.1643701398824550400) and restarted CDCR.

About the same time I found a number of errors in the solr logs like this:
2019-09-04 19:58:01.393 ERROR 
(recoveryExecutor-162-thread-1-processing-n:dfw-pauth-msc01:8983_solr 
x:sial-catalog-product-20190824_shard1_replica_n1 s:shard1 
c:sial-catalog-product-20190824 r:core_node3) [c:sial-catalog-product-20190824 
s:shard1 r:core_node3 x:sial-catalog-product-20190824_shard1_replica_n1] 
o.a.s.u.UpdateLog java.lang.ClassCastException

This was the most common error at the time; I saw it for all of our collections:
2019-09-04 19:57:46.572 ERROR (qtp1355531311-20) 
[c:sial-catalog-product-20190824 s:shard1 r:core_node3 
x:sial-catalog-product-20190824_shard1_replica_n1] o.a.s.h.RequestHandlerBase 

Re: Incremental export of a huge collection

2019-09-09 Thread Mikhail Khludnev
Isn't _version_ a timestamp of insertion by default?

On Mon, Sep 9, 2019 at 9:47 PM Vidit Asthana wrote:

> Hi,
>
> I am building a service where I have to continuously read data from a Solr
> collection and insert it into another database. Collection will receive
> daily updates. Initial size of collection is very large. After I have
> indexed whole data(through cursor mark), on daily basis I want to only do
> incremental inserts.
>
> My documents don't have anything like timestamp which I can use to fetch
> "only newly added" documents after a certain point. Is there any internal
> field which I can use to create this checkpoint and then later use that to
> fetch "only incremental updates" from that point onwards?
>
> I initially tried to sort the document by ID and use last fetched cursor
> mark, but my unique-ID field is a string and there is NO guarantee that
> newly added document's ID will be in sorted order.
>
> Solr version is 8.2.0.
>
> Regards,
> Vidit
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Incremental export of a huge collection

2019-09-09 Thread Toke Eskildsen
Vidit Asthana  wrote:
> My documents don't have anything like timestamp which I can use to fetch
> "only newly added" documents after a certain point. Is there any internal
> field which I can use to create this checkpoint and then later use that to
> fetch "only incremental updates" from that point onwards?

You could have a timestamped field that is auto-set to the time of indexing the 
document:

  <field name="timestamp" type="date" default="NOW" indexed="true" stored="true" multiValued="false"/>

where date is a solr.DatePointField.


There is a warning in the API about doing that in SolrCloud, so use with care 
or use the TimestampUpdateProcessorFactory that is mentioned:
http://lucene.apache.org/solr/7_2_1/solr-core/org/apache/solr/schema/DatePointField.html
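
With such a field in place, the daily incremental fetch could be a normal
cursorMark walk filtered on it. A sketch, assuming the field is called
timestamp and id is your uniqueKey (replace the lower bound with the
checkpoint recorded from the previous run):

  q=*:*&fq=timestamp:{2019-09-08T00:00:00Z TO *]&sort=id asc&cursorMark=*&rows=1000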


- Toke Eskildsen


Incremental export of a huge collection

2019-09-09 Thread Vidit Asthana
Hi,

I am building a service where I have to continuously read data from a Solr
collection and insert it into another database. Collection will receive
daily updates. Initial size of collection is very large. After I have
indexed the whole data (through cursor mark), on a daily basis I want to only do
incremental inserts.

My documents don't have anything like timestamp which I can use to fetch
"only newly added" documents after a certain point. Is there any internal
field which I can use to create this checkpoint and then later use that to
fetch "only incremental updates" from that point onwards?

I initially tried to sort the document by ID and use last fetched cursor
mark, but my unique-ID field is a string and there is NO guarantee that
newly added document's ID will be in sorted order.

Solr version is 8.2.0.

Regards,
Vidit


Re: Migrating Bounding box from Lucene to Solr

2019-09-09 Thread David Smiley
Hi Amjad,

As you've seen from the ref guide, an arbitrary rectangle query *is*
supported.  Your query looks fine, though I can't be sure if the particular
shape/coordinates are what you intend.  You have a horizontal line in the
vicinity of the US state of Oklahoma.  Your data, on the other hand, is in
the UK.  It's also unclear what field type you are using.  If you have a
polygon then use RptWithGeometrySpatialField and provide it as-such using
either WKT or GeoJSON.  Supplying a list of points runs the risk that the
query won't actually intersect those points.
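
To illustrate with a sketch only (the field name, type name and coordinates
below are placeholders, not taken from your setup, and WKT polygons need the
JTS jar on Solr's classpath):

  <fieldType name="location_rpt" class="solr.RptWithGeometrySpatialField"
             spatialContextFactory="JTS" distErrPct="0.025"
             maxDistErr="0.001" distanceUnits="kilometers"/>
  <field name="geo" type="location_rpt" indexed="true" stored="true"/>

Then filter either with the rectangle syntax (lower-left TO upper-right,
"lat,lon"):

  fq=geo:[51.2,-0.6 TO 51.8,0.4]

or with a WKT polygon (lon lat order):

  fq={!field f=geo}Intersects(POLYGON((-0.6 51.2, 0.4 51.2, 0.4 51.8, -0.6 51.8, -0.6 51.2)))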

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Sep 9, 2019 at 10:10 AM Amjad Khan  wrote:

> Hi,
> I am migrating my code from Lucene to Solr and am stuck on a bounding box query.
>
> As per lucene we had this query below
>
> IndexService regionIndex = 
> IndexServiceFactory.getIndexService("HotelRegionIndexService");
>
> AbstractQuerySearcher querySearcher = regionIndex.getSearcher();
>
> SpatialArgs spatialArgs = new SpatialArgs(SpatialOperation.Intersects, shape);
> Filter filter = querySearcher.getSpatialStrategy().makeFilter(spatialArgs);
>
> List regionResults = querySearcher.executeQuery(new 
> MatchAllDocsQuery(), filter,
>   HotelSearchConfig.getMaxRegionsInBoundingBox(), null, null);
>
>
> However, in Solr the bounding box filter takes only a center lat/lon and a radius;
> that does not work since our client passes us 4 coordinates and I want to make it
> backward compatible.
>
> So found in solr document to use range query
> https://lucene.apache.org/solr/guide/8_1/spatial-search.html#filtering-by-an-arbitrary-rectangle
>
> But even after using this we are not getting data return by solr query
>
> As an example, select?q=*:*&fq=POLYGON_VERTICES:[35,-96+TO+36,-95] did not
> return any records.
>
> We have data in solrcloud in this format below
>
>
>
>
> Any help will be appreciated.
>
> Thanks
>


Re: SQL data import handler

2019-09-09 Thread Friscia, Michael
Thank you for your responses Vadim and Jörn. You both prompted me to try again 
and this time I succeeded. The trick seemed to be the way that I had installed 
Java, using OpenJDK rather than the Oracle build. In addition, I imagine I 
accidentally had a lot of old versions of JAR files lying around, so it was 
easier to start with a fresh VM. I was able to install using JDK 12 and the 
latest Microsoft 7.4.x driver, and now it works out of the box as I wanted. 

Thanks again for being a sounding board for this, I primarily support 
Microsoft/dot net stuff so the Linux stuff sometimes gets away from me.

___
Michael Friscia
Office of Communications
Yale School of Medicine
(203) 737-7932 - office
(203) 931-5381 - mobile
http://web.yale.edu 
 

On 9/9/19, 6:53 AM, "Vadim Ivanov"  wrote:

Hi,
Latest jdbc driver 7.4.1 seems to support JRE 8, 11, 12

https://www.microsoft.com/en-us/download/details.aspx?id=58505
You have to delete all previous versions of the SQL Server JDBC driver from the 
Solr installation (/solr/server/lib/ in my case).

-- 
Vadim

> -Original Message-
> From: Friscia, Michael [mailto:michael.fris...@yale.edu]
> Sent: Monday, September 09, 2019 1:22 PM
> To: solr-user@lucene.apache.org
> Subject: SQL data import handler
> 
> I setup SOLR on Ubuntu 18.04 and installed Java from apt-get with default-jre
> which installed version 11. So after a day of trying to make my Microsoft SQL
> Server data import handler work and failing, I built a new VM and installed
> JRE 8 and then everything works perfectly.
> 
> The root of the problem was the elimination of java.xml.bind in JRE 9. I’m not
> a Java programmer so I’m only going by what I uncovered digging through the
> error logs. I am not positive this is the only error to deal with, for all I know
> fixing that will just uncover something else that needs repair. There were
> solutions where you compile SOLR using Maven but this is moving out of my
> comfort zone as well as long term strategy to keep SOLR management (as well
> as other Linux systems management) out-of-the-box. There were also
> solutions to include some sort of dependency on this older library but I’m at a
> loss on how to relate that to a SOLR install.
> 
> My questions, since I am not that familiar with Java dependencies:
> 
>   1.  Is it ok to run JRE 8 on a production server? It’s heavily firewalled and
> SOLR, Zookeeper nor anything else on these servers is available off the virtual
> network so it seems ok, but I try not to run very old versions of any software.
>   2.  Is there a way to fix this and keep the installation out-of-the-box or at
> least almost out of the box?
> 
> ___
> Michael Friscia
> Office of Communications
> Yale School of Medicine
> (203) 737-7932 - office
> (203) 931-5381 - mobile
> http://web.yale.edu






Limitations of StempelStemmer

2019-09-09 Thread Maciej Gawinecki
Hi,

I have just checked out the latest version of Lucene from Git master branch.

I have tried to stem a few words using StempelStemmer for Polish.
However, it looks like it cannot handle some words properly, e.g.

joyce -> ąć
wielce -> ąć
piwko -> ąć
royce -> ąć
pip -> ąć
xyz -> xyz

1. I am surprised it cannot handle Polish words like wielce, piwko and
royce. Is this a limitation of the stemming algorithm, of the training of
the algorithm, or something else? If it is the latter, that would at least be
something I could improve. How can I improve that behaviour?
2. I am surprised that for non-Polish words it returns "ąć". I would
expect that for words it has not been trained for it would return their
original forms, as happens, for instance, when stemming words like
"xyz".

With kind regards,
Maciej Gawinecki

Here's minimal example to reproduce the issue:

package org.apache.lucene.analysis;

import java.io.InputStream;
import org.apache.lucene.analysis.stempel.StempelStemmer;

public class Try {

  public static void main(String[] args) throws Exception {
    InputStream stemmerTable = ClassLoader.getSystemClassLoader()
        .getResourceAsStream("org/apache/lucene/analysis/pl/stemmer_2.tbl");
    StempelStemmer stemmer = new StempelStemmer(stemmerTable);
    String[] words = {"joyce", "wielce", "piwko", "royce", "pip", "xyz"};
    for (String word : words) {
      // stem each word in the list and print the mapping
      System.out.println(String.format("%s -> %s", word, stemmer.stem(word)));
}

  }

}


RE: SQL data import handler

2019-09-09 Thread Vadim Ivanov
Hi,
Latest jdbc driver 7.4.1 seems to support JRE 8, 11, 12
https://www.microsoft.com/en-us/download/details.aspx?id=58505
You have to delete all previous versions of the SQL Server JDBC driver from the 
Solr installation (/solr/server/lib/ in my case).
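
For reference, a rough sketch of the dataSource we point at that driver in
data-config.xml (driver class name per the Microsoft docs; host, database,
credentials and the entity query are placeholders):

  <dataConfig>
    <dataSource type="JdbcDataSource"
                driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                url="jdbc:sqlserver://dbhost:1433;databaseName=mydb"
                user="solr_reader" password="***"/>
    <document>
      <entity name="item" query="SELECT id, title FROM dbo.items">
        <field column="id" name="id"/>
        <field column="title" name="title"/>
      </entity>
    </document>
  </dataConfig>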

-- 
Vadim

> -Original Message-
> From: Friscia, Michael [mailto:michael.fris...@yale.edu]
> Sent: Monday, September 09, 2019 1:22 PM
> To: solr-user@lucene.apache.org
> Subject: SQL data import handler
> 
> I setup SOLR on Ubuntu 18.04 and installed Java from apt-get with default-jre
> which installed version 11. So after a day of trying to make my Microsoft SQL
> Server data import handler work and failing, I built a new VM and installed
> JRE 8 and then everything works perfectly.
> 
> The root of the problem was the elimination of java.xml.bind in JRE 9. I’m not
> a Java programmer so I’m only going by what I uncovered digging through the
> error logs. I am not positive this is the only error to deal with, for all I 
> know
> fixing that will just uncover something else that needs repair. There were
> solutions where you compile SOLR using Maven but this is moving out of my
> comfort zone as well as long term strategy to keep SOLR management (as well
> as other Linux systems management) out-of-the-box. There were also
> solutions to include some sort of dependency on this older library but I’m at 
> a
> loss on how to relate that to a SOLR install.
> 
> My questions, since I am not that familiar with Java dependencies:
> 
>   1.  Is it ok to run JRE 8 on a production server? It’s heavily firewalled 
> and
> SOLR, Zookeeper nor anything else on these servers is available off the 
> virtual
> network so it seems ok, but I try not to run very old versions of any 
> software.
>   2.  Is there a way to fix this and keep the installation out-of-the-box or 
> at
> least almost out of the box?
> 
> ___
> Michael Friscia
> Office of Communications
> Yale School of Medicine
> (203) 737-7932 - office
> (203) 931-5381 - mobile
> http://web.yale.edu




Re: SQL data import handler

2019-09-09 Thread Jörn Franke
Hi Michael,

Thank you for sharing. You are right about your approach to not customize the 
distribution.

Solr supports JDK 8, and its latest versions (8.x) also support JDK 11. I would not 
recommend using it with JDK 9 or JDK 10 as they are out of support in many Java 
distributions. It might also be that your database driver does not support JDK 9 
(check with Microsoft).
I don’t see it as that critical at the moment to have JDK 8 on this production 
server, but since it is out of support you should look for alternatives.

So if you are with Solr 8.x please go with JDK11 to have the latest fixes etc.

Best regards 

> On 09.09.2019 at 12:21, Friscia, Michael wrote:
> 
> I setup SOLR on Ubuntu 18.04 and installed Java from apt-get with default-jre 
> which installed version 11. So after a day of trying to make my Microsoft SQL 
> Server data import handler work and failing, I built a new VM and installed 
> JRE 8 and then everything works perfectly.
> 
> The root of the problem was the elimination of java.xml.bind in JRE 9. I’m 
> not a Java programmer so I’m only going by what I uncovered digging through 
> the error logs. I am not positive this is the only error to deal with, for 
> all I know fixing that will just uncover something else that needs repair. 
> There were solutions where you compile SOLR using Maven but this is moving 
> out of my comfort zone as well as long term strategy to keep SOLR management 
> (as well as other Linux systems management) out-of-the-box. There were also 
> solutions to include some sort of dependency on this older library but I’m at 
> a loss on how to relate that to a SOLR install.
> 
> My questions, since I am not that familiar with Java dependencies:
> 
>  1.  Is it ok to run JRE 8 on a production server? It’s heavily firewalled 
> and SOLR, Zookeeper nor anything else on these servers is available off the 
> virtual network so it seems ok, but I try not to run very old versions of any 
> software.
>  2.  Is there a way to fix this and keep the installation out-of-the-box or 
> at least almost out of the box?
> 
> ___
> Michael Friscia
> Office of Communications
> Yale School of Medicine
> (203) 737-7932 - office
> (203) 931-5381 - mobile
> http://web.yale.edu
> 


SQL data import handler

2019-09-09 Thread Friscia, Michael
I setup SOLR on Ubuntu 18.04 and installed Java from apt-get with default-jre 
which installed version 11. So after a day of trying to make my Microsoft SQL 
Server data import handler work and failing, I built a new VM and installed JRE 
8 and then everything works perfectly.

The root of the problem was the elimination of java.xml.bind in JRE 9. I’m not 
a Java programmer so I’m only going by what I uncovered digging through the 
error logs. I am not positive this is the only error to deal with, for all I 
know fixing that will just uncover something else that needs repair. There were 
solutions where you compile SOLR using Maven but this is moving out of my 
comfort zone as well as long term strategy to keep SOLR management (as well as 
other Linux systems management) out-of-the-box. There were also solutions to 
include some sort of dependency on this older library but I’m at a loss on how 
to relate that to a SOLR install.

My questions, since I am not that familiar with Java dependencies:

  1.  Is it ok to run JRE 8 on a production server? It’s heavily firewalled and 
SOLR, Zookeeper nor anything else on these servers is available off the virtual 
network so it seems ok, but I try not to run very old versions of any software.
  2.  Is there a way to fix this and keep the installation out-of-the-box or at 
least almost out of the box?

___
Michael Friscia
Office of Communications
Yale School of Medicine
(203) 737-7932 - office
(203) 931-5381 - mobile
http://web.yale.edu



Re: Issue with delete

2019-09-09 Thread Jayadevan Maymala
On Mon, Sep 9, 2019 at 11:11 AM Jörn Franke  wrote:

> Do you commit after running the delete?
>

The  "commit=true" part would take care of that?



> > On 09.09.2019 at 06:59, Jayadevan Maymala <jayade...@ftltechsys.com> wrote:
> >
> > Hello All,
> >
> > I have a 3-node Solr cluster using a 3-node Zoookeeper system. Solr
> Version
> > is 7.3.0. We have batch deletes which were working a few days ago. All
> of a
> > sudden, they stopped working (I did run a yum update on the client
> machine
> > - not sure if it did anything to the Guzzle client). The delete is sent
> via
> > GuzzleHttp client from Lumen (php Microservices) framework. The delete
> > request reaches the Solr servers all right - here is from the log -
> > (qtp434091818-167594) [c:paymetryproducts s:shard1 r:core_node4
> > x:paymetryproducts_shard1_replica_n2] o.a.s.u.p.LogUpdateProcessorFactory
> > [paymetryproducts_shard1_replica_n2]  webapp=/solr path=/update
> >
> params={stream.body=category_id:"*5812b8c81874e142b86fbb0e*"&commit=true&wt=json}{deleteByQuery=category_id:"5812b8c81874e142b86fbb0e"
> > (-1644171174075170816),commit=} 0 3695
> > I tried setting both 'json' and 'xml' wt types. A dump of the response on
> > the client gives me only this -
> >
> > (
> >[stream:GuzzleHttp\Psr7\Stream:private] => Resource id #403
> >[size:GuzzleHttp\Psr7\Stream:private] =>
> >[seekable:GuzzleHttp\Psr7\Stream:private] => 1
> >[readable:GuzzleHttp\Psr7\Stream:private] => 1
> >[writable:GuzzleHttp\Psr7\Stream:private] => 1
> >[uri:GuzzleHttp\Psr7\Stream:private] => php://temp
> >[customMetadata:GuzzleHttp\Psr7\Stream:private] => Array
> >(
> >)
> >
> > )
> >
> > If I execute the delete from Solr Admin panel, it works.  The query I am
> > executing from Admin to check if the data was deleted is this  (I am
> > forwarding Solr port to local machine).
> > http://127.0.0.1:8993/solr/paymetryproducts/select?q=category_id%20:%22
> > *5812b8c81874e142b86fbb0e*%22
> >
> > Regards,
> > Jayadevan
>


Re: Suggestion Needed: Exclude documents that are already served / viewed by a customer

2019-09-09 Thread Doss
Hi Experts,

We are migrating our entire search platform from SPHINX to SOLR, and we want to
do this without any flaws, so any suggestion would be greatly appreciated.

Thanks!


On Fri, Sep 6, 2019 at 11:13 AM Doss  wrote:

> Dear Experts,
>
> For a matchmaking portal, we have one requirement wherein, if a customer
> viewed the complete details of a bride or groom, then we have to exclude that
> profile id from further search results. Currently, along with other details
> we are storing the viewed profile ids in a field (multivalued field)
> against that bride or groom's details.
>
> Eg., if A viewed B, then in B's document under the field saw_me we will
> add A's id
>
> while searching, lets say, the currently searching members id is 123456
> then we will fire a query like
>
> fq=-saw_me:(123456)
>
> Problem #1: The saw_me field value is growing like anything.
> Problem #2: Removal of ids which are deleted from the base. Right now we
> are doing this job as follows
> Query #1: fq=saw_me:(123456)&fl=DocId  // get all document ids
> which have the deleted id as part of the saw_me field.
> Query #2: {"DocId":"234567","saw_me":{"remove":"123456"}}
> // loop through the results returned by the 1st query and fire the update
> query one by one
>
> We feel that this method of handling is not that optimum, so we need
> expert advice. Please guide.
>


Re: Block Join Queries parsed incorrectly

2019-09-09 Thread MUNENDRA S N
This change was done in Solr 7.2. SOLR-11501
(https://issues.apache.org/jira/browse/SOLR-11501) has the details about
the change.
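
If it helps, my understanding of that change (please verify against the JIRA)
is that the leading {!parent ...} local-params in q are now only interpreted by
the lucene query parser, so block-join requests need to override the handler's
defType, roughly:

  q={!parent which="doc_type:parent"}child_field:foo
  defType=lucene

(or keep the block-join in an fq, which is still parsed by the lucene parser by
default)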

Regards,
Munendra S N



On Mon, Sep 9, 2019 at 1:16 PM Enna Raerinne (TAU) wrote:

> Hi!
>
> I've been using block join queries with Solr version 7.1 and with request
> handler where defType is edismax and everything has worked fine. I recently
> updated my Solr to 8.1 and updated the luceneMatchVersion also to 8.1 in
> solrconfig.xml. However, now my block join queries don't work anymore and
> when I debugged the issue it seems that my block join queries are not
> parsed correctly when edismax is used as default parser and if
> luceneMatchVersion is 8.1. I searched everywhere I could think of, but
> didn't find out why the parsing has been changed. Is it intentional or a
> bug or am I using Solr the wrong way?
>
> Thanks,
> Enna Raerinne
>


Block Join Queries parsed incorrectly

2019-09-09 Thread Enna Raerinne (TAU)
Hi!

I've been using block join queries with Solr version 7.1 and with request 
handler where defType is edismax and everything has worked fine. I recently 
updated my Solr to 8.1 and updated the luceneMatchVersion also to 8.1 in 
solrconfig.xml. However, now my block join queries don't work anymore and when 
I debugged the issue it seems that my block join queries are not parsed 
correctly when edismax is used as default parser and if luceneMatchVersion is 
8.1. I searched everywhere I could think of, but didn't find out why the 
parsing has been changed. Is it intentional or a bug or am I using Solr the 
wrong way?

Thanks,
Enna Raerinne


Re: upgrading from solr4 to solr8 searches taking 4 to 10 times as long to return

2019-09-09 Thread Toke Eskildsen
On Sun, 2019-09-08 at 21:10 +0200, Günter Hipler wrote:
> I have seen you have done a lot of work at the end of version 7 for 
> version 8 but was not sure if it is related to this issue.

It is directly related to DocValues, but it is unclear if that is
Russell's challenge. Our setup is at the far end with 300M docs / 900GB
segments. With Solr 7, performance dropped by a factor of 10 for
standard searches (as we populated documents from DocValues
instead of stored fields) and by a factor of 100 for worst-case exports.


Long story short

* Solr 4-6 has random access DocValues
* Solr 7 has iterator based (all previous blocks must be visited to get
to the block containing the needed value)
* Solr 8 has iterator based with skip lists (go directly to the needed
block)


Looking back in the list, I see that you had performance problems with
faceting on Solr 6 too. I'll make a note to read up on that and respond
in that thread, to avoid hi-jacking this one. It probably won't be this
week as Real Work is heating up.

- Toke Eskildsen, Royal Danish Library