Error when configuring reclaimDeletesWeight in TieredMergePolicyFactory

2019-01-24 Thread Zheng Lin Edwin Yeo
Hi,

I am using Solr 7.5.0, and I came across this presentation (
https://www.slideshare.net/sematext/solr-search-engine-optimize-is-not-bad-for-you)
on Solr Search Engine: Optimize Is (Not) Bad for You.
Slide 59 touches on the settings for reclaimDeletesWeight.

I have tried to follow their example and configured the following
TieredMergePolicyFactory in my solrconfig.xml.

  10
  10
  10
  10
  5000
  0.1
  2048
  2.0
  10.0
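
The XML element names did not survive above; a TieredMergePolicyFactory
block of this shape would normally be written like the following in
solrconfig.xml (a sketch only; the element names and the mapping of the
values are assumptions based on TieredMergePolicy's setters, not the
poster's exact file):

    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
      <double name="maxMergedSegmentMB">5000</double>
      <double name="reclaimDeletesWeight">10.0</double>
    </mergePolicyFactory>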


However, when I load in the configuration, I get the following error.

Caused by: java.lang.RuntimeException: No setter corrresponding to
'reclaimDeletesWeight' in org.apache.lucene.index.TieredMergePolicy
at 
org.apache.solr.util.SolrPluginUtils.findSetter(SolrPluginUtils.java:1051)
at 
org.apache.solr.util.SolrPluginUtils.invokeSetters(SolrPluginUtils.java:1011)
at 
org.apache.solr.util.SolrPluginUtils.invokeSetters(SolrPluginUtils.java:1000)
at 
org.apache.solr.index.MergePolicyFactoryArgs.invokeSetters(MergePolicyFactoryArgs.java:58)
at 
org.apache.solr.index.SimpleMergePolicyFactory.getMergePolicy(SimpleMergePolicyFactory.java:38)
at 
org.apache.solr.update.SolrIndexConfig.buildMergePolicy(SolrIndexConfig.java:281)
at 
org.apache.solr.update.SolrIndexConfig.toIndexWriterConfig(SolrIndexConfig.java:230)
at 
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:125)
at 
org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:97)
at 
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:257)
at 
org.apache.solr.update.DefaultSolrCoreState.changeWriter(DefaultSolrCoreState.java:220)
at 
org.apache.solr.update.DefaultSolrCoreState.newIndexWriter(DefaultSolrCoreState.java:229)
at org.apache.solr.core.SolrCore.reload(SolrCore.java:669)
at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1393)
... 47 more


What could be the reason for this error?

Regards,
Edwin


Question about IndexSearcher.search()

2019-01-24 Thread NDelt
Hello.
I'm trying to make a sample search application using Lucene.

The search() method of the IndexSearcher class searches documents with a
query and returns a TopDocs instance. The TopDocs instance includes an
array of ScoreDoc instances. My questions are:

1. Will the query be tokenized during a search?
2. If so, does a ScoreDoc instance in the array hold only one token's hit?
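
A minimal sketch of the flow being asked about, assuming a StandardAnalyzer
and an in-memory index (field names and contents here are illustrative, not
from the original mail):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.*;
    import org.apache.lucene.index.*;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.*;
    import org.apache.lucene.store.RAMDirectory;

    public class SearchSketch {
      public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        StandardAnalyzer analyzer = new StandardAnalyzer();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
          Document doc = new Document();
          doc.add(new TextField("body", "hello lucene world", Field.Store.YES));
          writer.addDocument(doc);
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
          IndexSearcher searcher = new IndexSearcher(reader);
          // The analyzer tokenizes the query string here, at parse time,
          // not inside IndexSearcher.search() itself.
          Query query = new QueryParser("body", analyzer).parse("hello world");
          TopDocs topDocs = searcher.search(query, 10);
          // Each ScoreDoc is one matching document (docId + score),
          // not a per-token hit.
          for (ScoreDoc sd : topDocs.scoreDocs) {
            System.out.println(sd.doc + " -> " + sd.score);
          }
        }
      }
    }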


Re: API to convert a SolrInputDocument to JSON

2019-01-24 Thread Shawn Heisey

On 1/24/2019 5:06 PM, Pushkar Raste wrote:

Maybe my question wasn't clear. By issues I meant: will the SolrJ client for
7.x work to index documents in Solr 4.10, or vice versa?


With HttpSolrClient, I would generally expect very good compatibility 
from 7.x to 4.x.  I have done it with no problems.


It would be a bad idea to try such a wide version gap with CloudSolrClient.

For all clients, when the versions are different, it is better to use a 
newer client.


Thanks,
Shawn


Re: API to convert a SolrInputDocument to JSON

2019-01-24 Thread Pushkar Raste
Maybe my question wasn't clear. By issues I meant: will the SolrJ client for
7.x work to index documents in Solr 4.10, or vice versa?

I am OK to use HttpSolrClient

On Wed, Jan 23, 2019 at 9:33 PM Erick Erickson 
wrote:

> Walter:
>
> Don't know if it helps, but have you looked at:
> https://issues.apache.org/jira/browse/SOLR-445
>
> I have _not_ worked with this personally in prod SolrCloud systems, so
> I can't say much more
> than it exists. It's only available in Solr 6.1+
>
> Best,
> Erick
>
> On Wed, Jan 23, 2019 at 5:55 PM Pushkar Raste 
> wrote:
> >
> > You mean I can use SolrJ 7.x for indexing documents to both Solr 4
> > and Solr 7, as well as the SolrInputDocument class from SolrJ 7.x?
> >
> > Wouldn’t there be issues if there are any backwards-incompatible changes?
> >
> > On Wed, Jan 23, 2019 at 8:09 PM Shawn Heisey 
> wrote:
> >
> > > On 1/23/2019 5:49 PM, Pushkar Raste wrote:
> > > > Thanks for the quick response Shawn. It is migrating ion from Solr
> 4.10
> > > > master/slave to Solr Cloud 7.x
> > >
> > > In that case, use SolrJ 7.x, with CloudSolrClient to talk to the new
> > > version and HttpSolrClient to talk to the old version. Use the same
> > > SolrInputDocument objects for both.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
>
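
For what it's worth, a minimal sketch of Shawn's suggestion with SolrJ 7.x
(host names, the ZooKeeper address and the collection name are placeholders):

    import java.util.Collections;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ReindexSketch {
      public static void main(String[] args) throws Exception {
        // SolrJ 7.x talking to the old 4.10 master/slave install over HTTP
        HttpSolrClient oldSolr =
            new HttpSolrClient.Builder("http://oldhost:8983/solr/mycollection").build();
        // ... and to the new 7.x SolrCloud via ZooKeeper
        CloudSolrClient cloud = new CloudSolrClient.Builder(
            Collections.singletonList("zkhost:2181"), Optional.empty()).build();
        cloud.setDefaultCollection("mycollection");

        // The same SolrInputDocument object works against both clients
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        doc.addField("title", "hello");

        oldSolr.add(doc);
        oldSolr.commit();
        cloud.add(doc);
        cloud.commit();

        oldSolr.close();
        cloud.close();
      }
    }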


Re: CDCR "all" collections

2019-01-24 Thread Erick Erickson
Bram:

Hmmm You can't do that OOB right now, but it might not be a hard thing to add.

The current configuration allows the source collection to have a
different name than the target collection, so if you could make the
assumption that the two collections always have the same name, it might
be trivial.

WARNING! this is something that just occurred to me. I have NOT
thought it through,
but if it works it'd be very cool ;)

How brave do you feel? This _might_ be totally trivial. I'm looking at
the current trunk, but
in CdcrReplicationManager, line 97 looks like this:

String targetCollection = params.get(CdcrParams.TARGET_COLLECTION_PARAM);

It _might_ (and again, I have NOT explored this in detail) be as
simple as adding
after that line:

if (targetCollection == null) {
  targetCollection = params.get(CdcrParams.SOURCE_COLLECTION_PARAM);
}

or similar. Then leave

<str name="target">collection1</str>

out of the solrconfig file.
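
For context, that element lives in the CDCR request handler configuration;
a minimal sketch of the relevant stanza (the zkHost value and collection
names are placeholders):

    <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
      <lst name="replica">
        <str name="zkHost">tgt-zk:2181</str>
        <str name="source">collection1</str>
        <!-- the element the change above would make optional -->
        <str name="target">collection1</str>
      </lst>
    </requestHandler>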

While the code change is trivial, the work is in verifying that it
works, and I'm afraid I don't personally have the time to do that
verification, but I'd be glad to commit it if someone else does and
submits a patch, including at least one unit test.

The tricky parts would be ensuring nothing bad happens if, for
instance, the target collection never got created, making sure the
tlogs didn't grow, that kind of thing.

Best,
Erick

On Thu, Jan 24, 2019 at 3:51 AM Bram Van Dam  wrote:
>
> Hey folks,
>
> Is there any way to set up CDCR for *all* collections, including any
> newly created ones? Having to modify the solrconfig in ZK every time a
> collection is added is a bit of a pain, especially because I'm assuming
> it requires a restart to activate the config?
>
> Basically if I have DC Src and DC Tgt, I want every collection from Src
> to be replicated to Tgt. Even when I create a new collection on Src.
>
> Thanks,
>
>  - Bram


Re: [SPAM] Re: Per-field slop param in eDisMax

2019-01-24 Thread David Hastings
Also the order matters, it would be a different result set than
"a tnf"~2

On Thu, Jan 24, 2019 at 10:53 AM David Hastings <
hastings.recurs...@gmail.com> wrote:

> it allows two words or less to be matched in a phrase in-between "tnf" and
> "a"
> so it will match
> "tnf a"
> "tnf aword1 a"
> "tnf aword1 aword2 a"
>
> On Thu, Jan 24, 2019 at 10:45 AM Danilo Tomasoni 
> wrote:
>
>> And what does
>>
>> q: f2:"tnf α"~2
>>
>> f.f2.qf:  titles study_brief_title
>>
>>
>> mean with edismax?
>>
>>
>> it returns different results from
>>
>> q: f2:"tnf α"
>>
>>
>> On 24/01/19 14:51, Elizabeth Haubert wrote:
>> > To do this you specify the slop on each field when you specify the
>> > pf/pf2/pf3 parameters:
>> > pf:fieldA~2 fieldB~5
>> >
>> > I'll try to add an example to the documentation here:
>> >
>> https://lucene.apache.org/solr/guide/7_6/the-extended-dismax-query-parser.html#using-slop
>> >
>> > Elizabeth
>> >
>> > On Wed, Jan 23, 2019 at 10:30 PM Yasufumi Mizoguchi <
>> yasufumi0...@gmail.com>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> I am struggling to set per-field slop param in eDisMax query parser
>> with
>> >> Solr 6.0 and 7.6.
>> >> What I want to do with eDisMax is similar to the following in the default
>> query
>> >> parser.
>> >>
>> >> * Query string : "aaa bbb"
>> >> * Target fields : fieldA(TextField), fieldB(TextField)
>> >>
>> >> q=fieldA:"aaa bbb"~2 OR fieldB:"aaa bbb"~5
>> >>
>> >> Anyone have good ideas?
>> >>
>> >> Thanks,
>> >> Yasufumi.
>> >>
>> --
>> Danilo Tomasoni
>> COSBI
>>


Re: [SPAM] Re: Per-field slop param in eDisMax

2019-01-24 Thread David Hastings
it allows two words or less to be matched in a phrase in-between "tnf" and
"a"
so it will match
"tnf a"
"tnf aword1 a"
"tnf aword1 aword2 a"

On Thu, Jan 24, 2019 at 10:45 AM Danilo Tomasoni  wrote:

> And what does
>
> q: f2:"tnf α"~2
>
> f.f2.qf:  titles study_brief_title
>
>
> mean with edismax?
>
>
> it returns different results from
>
> q: f2:"tnf α"
>
>
> On 24/01/19 14:51, Elizabeth Haubert wrote:
> > To do this you specify the slop on each field when you specify the
> > pf/pf2/pf3 parameters:
> > pf:fieldA~2 fieldB~5
> >
> > I'll try to add an example to the documentation here:
> >
> https://lucene.apache.org/solr/guide/7_6/the-extended-dismax-query-parser.html#using-slop
> >
> > Elizabeth
> >
> > On Wed, Jan 23, 2019 at 10:30 PM Yasufumi Mizoguchi <
> yasufumi0...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> I am struggling to set per-field slop param in eDisMax query parser with
> >> Solr 6.0 and 7.6.
> >> What I want to do with eDisMax is similar to the following in the default
> query
> >> parser.
> >>
> >> * Query string : "aaa bbb"
> >> * Target fields : fieldA(TextField), fieldB(TextField)
> >>
> >> q=fieldA:"aaa bbb"~2 OR fieldB:"aaa bbb"~5
> >>
> >> Anyone have good ideas?
> >>
> >> Thanks,
> >> Yasufumi.
> >>
> --
> Danilo Tomasoni
> COSBI
>


Re: [SPAM] Re: Per-field slop param in eDisMax

2019-01-24 Thread Danilo Tomasoni

And what does

q: f2:"tnf α"~2

f.f2.qf:  titles study_brief_title


mean with edismax?


it returns different results from

q: f2:"tnf α"


On 24/01/19 14:51, Elizabeth Haubert wrote:

To do this you specify the slop on each field when you specify the
pf/pf2/pf3 parameters:
pf:fieldA~2 fieldB~5

I'll try to add an example to the documentation here:
https://lucene.apache.org/solr/guide/7_6/the-extended-dismax-query-parser.html#using-slop

Elizabeth

On Wed, Jan 23, 2019 at 10:30 PM Yasufumi Mizoguchi 
wrote:


Hi,

I am struggling to set per-field slop param in eDisMax query parser with
Solr 6.0 and 7.6.
What I want to do with eDisMax is similar to the following in the default query
parser.

* Query string : "aaa bbb"
* Target fields : fieldA(TextField), fieldB(TextField)

q=fieldA:"aaa bbb"~2 OR fieldB:"aaa bbb"~5

Anyone have good ideas?

Thanks,
Yasufumi.


--
Danilo Tomasoni
COSBI




Re: Indexing in one collection affects the index in another collection

2019-01-24 Thread Zheng Lin Edwin Yeo
Hi Jan,

Thanks for your reply.

However, we are still getting a slow QTime of 517ms even after we set
hl=false&fl=null.

Below is the debug query:

  "debug":{
"rawquerystring":"cherry",
"querystring":"cherry",
"parsedquery":"searchFields_tcs:cherry",
"parsedquery_toString":"searchFields_tcs:cherry",
"explain":{
  "46226513":"\n14.227914 = weight(searchFields_tcs:cherry in
5747763) [SchemaSimilarity], result of:\n  14.227914 =
score(doc=5747763,freq=3.0 = termFreq=3.0\n), product of:\n
9.614556 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
(docFreq + 0.5)) from:\n  400.0 = docFreq\n  600.0 =
docCount\n1.4798305 = tfNorm, computed as (freq * (k1 + 1)) /
(freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
3.0 = termFreq=3.0\n  1.2 = parameter k1\n  0.75 = parameter
b\n  19.397041 = avgFieldLength\n  25.0 = fieldLength\n",
  "54088731":"\n13.937909 = weight(searchFields_tcs:cherry in
4840794) [SchemaSimilarity], result of:\n  13.937909 =
score(doc=4840794,freq=3.0 = termFreq=3.0\n), product of:\n
9.614556 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
(docFreq + 0.5)) from:\n  400.0 = docFreq\n  600.0 =
docCount\n1.4496675 = tfNorm, computed as (freq * (k1 + 1)) /
(freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
3.0 = termFreq=3.0\n  1.2 = parameter k1\n  0.75 = parameter
b\n  19.397041 = avgFieldLength\n  27.0 = fieldLength\n",
"QParser":"LuceneQParser",
"timing":{
  "time":517.0,
  "prepare":{
"time":0.0,
"query":{
  "time":0.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":0.0}},
  "process":{
"time":516.0,
"query":{
  "time":15.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":500.0}

Regards,
Edwin


On Thu, 24 Jan 2019 at 22:43, Jan Høydahl  wrote:

> Looks like highlighting takes most of the time on the first query (680ms).
> Your config seems to ask for a lot of highlighting here, like 100 snippets
> of max 10 characters etc.
> Sounds to me like this might be a highlighting configuration problem. Try
> to disable highlighting (hl=false) and see if you get back your speed.
> Also, I see fl=* in your config, which is really asking for all fields.
> Are you sure you want that? That may also be slow. Try to ask for just the
> fields you will be using.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 24 Jan 2019, at 14:59, Zheng Lin Edwin Yeo wrote:
> >
> > Thanks for your reply.
> >
> > Below are what you have requested about our Solr setup, configurations
> > files ,schema and results of debug queries:
> >
> > Looking forward to your advice and support on our problem.
> >
> > 1. System configurations
> > OS: Windows 10 Pro 64 bit
> > System Memory: 32GB
> > CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8 Logical
> > Processor(s)
> > HDD: 3.0 TB (free 2.1 TB)  SATA
> >
> > 2. solrconfig.xml of customers and policies collection, and solr.in.cmd
> > which can be downloaded from the following link:
> >
> https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing
> >
> > 3. The debug queries from both collections
> >
> > *3.1. Debug Query From Policies ( which is Slow)*
> >
> >  "debug":{
> >
> >"rawquerystring":"sherry",
> >
> >"querystring":"sherry",
> >
> >"parsedquery":"searchFields_tcs:sherry",
> >
> >"parsedquery_toString":"searchFields_tcs:sherry",
> >
> >"explain":{
> >
> >  "31702988":"\n14.540428 = weight(searchFields_tcs:sherry in
> > 3097315) [SchemaSimilarity], result of:\n  14.540428 =
> > score(doc=3097315,freq=5.0 = termFreq=5.0\n), product of:\n
> > 8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
> > (docFreq + 0.5)) from:\n  812.0 = docFreq\n  600.0 =
> > docCount\n1.6324438 = tfNorm, computed as (freq * (k1 + 1)) /
> > (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
> > 5.0 = termFreq=5.0\n  1.2 = parameter k1\n  0.75 = parameter
> > b\n  19.397041 = avgFieldLength\n  31.0 = fieldLength\n”,..
> >
> >"QParser":"LuceneQParser",
> >
> >"timing":{
> >
> >  "time":681.0,
> >
> >  "prepare":{
> >
> >"time":0.0,
> >
> >"query":{
> >
> >  "time":0.0},
> >
> >"facet":{
> >
> >  "time":0.0},
> >
> >

Re: _version_ field missing in schema?

2019-01-24 Thread Aleksandar Dimitrov
Finally, since you are trying to really tweak the schema and general
configuration right from the start, you may find some of my
presentations useful, as they show the minimal configuration. Not
perfect for your needs, as I do skip _version, but as an additional
data point. The recent one is:
https://www.slideshare.net/arafalov/rapid-solr-schema-development-phone-directory
and the Git repo is at:
https://github.com/arafalov/solr-presentation-2018-may . This one may
be useful as well:
https://www.slideshare.net/arafalov/from-content-to-search-speeddating-apache-solr-apachecon-2018-116330553


Thanks for the pointers. I've finally managed to get my schema to work ☺


Cheers,
Aleks



Regards,
   Alex.

On Wed, Jan 23, 2019, 5:50 AM Aleksandar Dimitrov <
a.dimit...@seidemann-web.com wrote:


Hi Alex,

thanks for your answer. I took the lines directly from the
managed-schema, deleted the managed-schema, and pasted those lines into
my schema.xml.

If I have other errors in the schema.xml (such as a missing field
type), solr complains about those until I fix them. So I would guess
that the schema is at least *read*, but unsure if it is in fact used.
I've not used solr before.

I cannot use the admin UI, at least not while the core with the faulty
schema is used.

I wanted to use schema.xml because it allows for version control, and
because it's easier for me to just use xml to define my schema. Is
there a preferred approach? I don't (want to) use solr cloud, as for
our use case a single instance of solr is more than enough.

Thanks for your help,
Aleks

Alexandre Rafalovitch  writes:

> What do you mean schema.xml from managed-schema? schema.xml is old
> non-managed approach. If you have both, schema.xml will be ignored.
>
> I suspect you are not running with the schema you think you do. You
> can check that with API or in Admin UI if you get that far.
>
> Regards,
> Alex
>
> On Tue, Jan 22, 2019, 11:39 AM Aleksandar Dimitrov <
> a.dimit...@seidemann-web.com wrote:
>
>> Hi,
>>
>> I'm using solr 7.5, in my schema.xml I have this, which I took
>> from the managed-schema:
>>
>>   
>>   
>>   >   stored="false" />
>>   >   docValues="true" />
>>
>> However, on startup, solr complains:
>>
>>  Caused by: org.apache.solr.common.SolrException: _version_ field
>>  must exist in schema and be searchable (indexed or docValues) and
>>  retrievable(stored or docValues) and not multiValued (_version_
>>  does not exist)
>>   at
>>
>>
org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:69)
>>
>>   ~[solr-core-7.5.0.jar:7.5.0
>>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>>   2018-09-18 13:07:55]
>>   at
>>   org.apache.solr.update.VersionInfo.<init>(VersionInfo.java:95)
>>   ~[solr-core-7.5.0.jar:7.5.0
>>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>>   2018-09-18 13:07:55]
>>   at
>>   org.apache.solr.update.UpdateLog.init(UpdateLog.java:404)
>>   ~[solr-core-7.5.0.jar:7.5.0
>>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>>   2018-09-18 13:07:55]
>>   at
>>
 org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:161)
>>   ~[solr-core-7.5.0.jar:7.5.0
>>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>>   2018-09-18 13:07:55]
>>   at
>>
 org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:116)
>>   ~[solr-core-7.5.0.jar:7.5.0
>>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>>   2018-09-18 13:07:55]
>>   at
>>
>>
org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:119)
>>
>>   ~[solr-core-7.5.0.jar:7.5.0
>>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>>   2018-09-18 13:07:55]
>>   at
>>
>> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>>   Method) ~[?:?]
>>   at
>>
>>
jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>
>>   ~[?:?]
>>   at
>>
>>
jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>
>>   ~[?:?]
>>   at
>>   java.lang.reflect.Constructor.newInstance(Constructor.java:488)
>>   ~[?:?]
>>   at
>>   org.apache.solr.core.SolrCore.createInstance(SolrCore.java:799)
>>   ~[solr-core-7.5.0.jar:7.5.0
>>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>>   2018-09-18 13:07:55]
>>   at
>>
 org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:861)
>>   ~[solr-core-7.5.0.jar:7.5.0
>>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>>   2018-09-18 13:07:55]
>>   at
>>
 org.apache.solr.core.SolrCore.initUpdateHandler(SolrCore.java:1114)
>>   ~[solr-core-7.5.0.jar:7.5.0
>>   b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi -
>>   2018-09-18 13:07:55]
>>   at
>>   org.apache.solr.core.SolrCore.<init>(SolrCore.java:984)
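
For reference, the check quoted above is satisfied by the _version_
definition used in the 7.x default configset, assuming a plong fieldType
declared with docValues="true" (a sketch, not necessarily the poster's
exact schema):

    <field name="_version_" type="plong" indexed="false" stored="false"/>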

Re: _version_ field missing in schema?

2019-01-24 Thread Aleksandar Dimitrov

Shawn Heisey  writes:


On 1/23/2019 3:49 AM, Aleksandar Dimitrov wrote:

Hi Alex,

thanks for your answer. I took the lines directly from the
managed-schema, deleted the managed-schema, and pasted those lines into
my schema.xml.


Unless you have changed the solrconfig.xml to refer to the classic
schema, the file named schema.xml is not used.


Yup, that was the mistake. I had to use

 <schemaFactory class="ClassicIndexSchemaFactory"/>

in my solrconfig, and then it worked. I think the classic schema
factory should be enough for our use case.

Thanks!
Aleks

With the standard schema factory, on core startup, if schema.xml is
found, it is copied to managed-schema and then renamed to a backup
filename. This would also happen on reload, I believe.

Recommendation: unless you're using the classic schema, never use the
schema.xml file. Only work with managed-schema.

Thanks,
Shawn




Re: Indexing in one collection affects the index in another collection

2019-01-24 Thread Jan Høydahl
Looks like highlighting takes most of the time on the first query (680ms).
Your config seems to ask for a lot of highlighting here, like 100 snippets
of max 10 characters etc.
Sounds to me like this might be a highlighting configuration problem. Try to
disable highlighting (hl=false) and see if you get back your speed.
Also, I see fl=* in your config, which is really asking for all fields. Are
you sure you want that? That may also be slow. Try to ask for just the
fields you will be using.
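
A sketch of the kind of request this suggests, using the collection and
query term from this thread (the fl list is illustrative):

    https://localhost:8983/edm/policies/select?q=sherry&hl=false&fl=id,score&debugQuery=true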

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 24 Jan 2019, at 14:59, Zheng Lin Edwin Yeo wrote:
> 
> Thanks for your reply.
> 
> Below are what you have requested about our Solr setup, configurations
> files ,schema and results of debug queries:
> 
> Looking forward to your advice and support on our problem.
> 
> 1. System configurations
> OS: Windows 10 Pro 64 bit
> System Memory: 32GB
> CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8 Logical
> Processor(s)
> HDD: 3.0 TB (free 2.1 TB)  SATA
> 
> 2. solrconfig.xml of customers and policies collection, and solr.in.cmd
> which can be downloaded from the following link:
> https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing
> 
> 3. The debug queries from both collections
> 
> *3.1. Debug Query From Policies ( which is Slow)*
> 
>  "debug":{
> 
>"rawquerystring":"sherry",
> 
>"querystring":"sherry",
> 
>"parsedquery":"searchFields_tcs:sherry",
> 
>"parsedquery_toString":"searchFields_tcs:sherry",
> 
>"explain":{
> 
>  "31702988":"\n14.540428 = weight(searchFields_tcs:sherry in
> 3097315) [SchemaSimilarity], result of:\n  14.540428 =
> score(doc=3097315,freq=5.0 = termFreq=5.0\n), product of:\n
> 8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
> (docFreq + 0.5)) from:\n  812.0 = docFreq\n  600.0 =
> docCount\n1.6324438 = tfNorm, computed as (freq * (k1 + 1)) /
> (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
> 5.0 = termFreq=5.0\n  1.2 = parameter k1\n  0.75 = parameter
> b\n  19.397041 = avgFieldLength\n  31.0 = fieldLength\n”,..
> 
>"QParser":"LuceneQParser",
> 
>"timing":{
> 
>  "time":681.0,
> 
>  "prepare":{
> 
>"time":0.0,
> 
>"query":{
> 
>  "time":0.0},
> 
>"facet":{
> 
>  "time":0.0},
> 
>"facet_module":{
> 
>  "time":0.0},
> 
>"mlt":{
> 
>  "time":0.0},
> 
>"highlight":{
> 
>  "time":0.0},
> 
>"stats":{
> 
>  "time":0.0},
> 
>"expand":{
> 
>  "time":0.0},
> 
>"terms":{
> 
>  "time":0.0},
> 
>"debug":{
> 
>  "time":0.0}},
> 
>  "process":{
> 
>"time":680.0,
> 
>"query":{
> 
>  "time":19.0},
> 
>"facet":{
> 
>  "time":0.0},
> 
>"facet_module":{
> 
>  "time":0.0},
> 
>"mlt":{
> 
>  "time":0.0},
> 
>"highlight":{
> 
>  "time":651.0},
> 
>"stats":{
> 
>  "time":0.0},
> 
>"expand":{
> 
>  "time":0.0},
> 
>"terms":{
> 
>  "time":0.0},
> 
>"debug":{
> 
>  "time":8.0}},
> 
>  "loadFieldValues":{
> 
>"time":12.0
> 
> 
> 
> *3.2. Debug Query From Customers (which is fast because we index it after
> indexing Policies):*
> 
> 
> 
>  "debug":{
> 
>"rawquerystring":"sherry",
> 
>"querystring":"sherry",
> 
>"parsedquery":"searchFields_tcs:sherry",
> 
>"parsedquery_toString":"searchFields_tcs:sherry",
> 
>"explain":{
> 
>  "S7900271B":"\n13.191501 = weight(searchFields_tcs:sherry in
> 2453665) [SchemaSimilarity], result of:\n  13.191501 =
> score(doc=2453665,freq=3.0 = termFreq=3.0\n), product of:\n9.08604
> = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq +
> 0.5)) from:\n  428.0 = docFreq\n  3784142.0 = docCount\n
> 1.4518428 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
> b + b * fieldLength / avgFieldLength)) from:\n  3.0 =
> termFreq=3.0\n  1.2 = parameter k1\n  0.75 = parameter b\n
> 20.22558 = avgFieldLength\n  28.0 = fieldLength\n”, ..
> 
>"QParser":"LuceneQParser",
> 
>"timing":{
> 
>  "time":38.0,
> 
>  "prepare":{
> 
>"time":1.0,
> 
>"query":{
> 
>  "time":1.0},
> 
>"facet":{
> 
>  "time":0.0},
> 
>"facet_module":{
> 
>  "time":0.0},
> 
>"mlt":{
> 
>  "time":0.0},
> 
>"highlight":{
> 
>  "time":0.0},
> 
>"stats":{
> 
>  "time":0.0},
> 
>"expand":{
> 
>  "time":0.0},
> 
>"terms":{
> 
>  "time":0.0},
> 
>"debug":{
> 
>  "time":0.0}},
> 
>  "process":{
> 
>"time":36.0,
> 
>"query":{
> 
>  "time":1.0},
> 
>"facet":{
> 
>  "time":0.0},
> 
>"facet_module":{
> 
>  

Re: Per-field slop param in eDisMax

2019-01-24 Thread Elizabeth Haubert
To do this you specify the slop on each field when you specify the
pf/pf2/pf3 parameters:
pf:fieldA~2 fieldB~5

I'll try to add an example to the documentation here:
https://lucene.apache.org/solr/guide/7_6/the-extended-dismax-query-parser.html#using-slop

Elizabeth

On Wed, Jan 23, 2019 at 10:30 PM Yasufumi Mizoguchi 
wrote:

> Hi,
>
> I am struggling to set per-field slop param in eDisMax query parser with
> Solr 6.0 and 7.6.
> What I want to do with eDisMax is similar to the following in the default query
> parser.
>
> * Query string : "aaa bbb"
> * Target fields : fieldA(TextField), fieldB(TextField)
>
> q=fieldA:"aaa bbb"~2 OR fieldB:"aaa bbb"~5
>
> Anyone have good ideas?
>
> Thanks,
> Yasufumi.
>
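
Putting Elizabeth's syntax together, a sketch of the full edismax request
(field names taken from Yasufumi's example; pf fields boost phrase matches
with the given per-field slop):

    q=aaa bbb
    &defType=edismax
    &qf=fieldA fieldB
    &pf=fieldA~2 fieldB~5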


Re: Indexing in one collection affects the index in another collection

2019-01-24 Thread Zheng Lin Edwin Yeo
Thanks for your reply.

Below are what you have requested about our Solr setup, configurations
files ,schema and results of debug queries:

Looking forward to your advice and support on our problem.

1. System configurations
OS: Windows 10 Pro 64 bit
System Memory: 32GB
CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8 Logical
Processor(s)
HDD: 3.0 TB (free 2.1 TB)  SATA

2. solrconfig.xml of customers and policies collection, and solr.in.cmd
which can be downloaded from the following link:
https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing

3. The debug queries from both collections

*3.1. Debug Query From Policies ( which is Slow)*

  "debug":{

"rawquerystring":"sherry",

"querystring":"sherry",

"parsedquery":"searchFields_tcs:sherry",

"parsedquery_toString":"searchFields_tcs:sherry",

"explain":{

  "31702988":"\n14.540428 = weight(searchFields_tcs:sherry in
3097315) [SchemaSimilarity], result of:\n  14.540428 =
score(doc=3097315,freq=5.0 = termFreq=5.0\n), product of:\n
8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
(docFreq + 0.5)) from:\n  812.0 = docFreq\n  600.0 =
docCount\n1.6324438 = tfNorm, computed as (freq * (k1 + 1)) /
(freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
5.0 = termFreq=5.0\n  1.2 = parameter k1\n  0.75 = parameter
b\n  19.397041 = avgFieldLength\n  31.0 = fieldLength\n”,..

"QParser":"LuceneQParser",

"timing":{

  "time":681.0,

  "prepare":{

"time":0.0,

"query":{

  "time":0.0},

"facet":{

  "time":0.0},

"facet_module":{

  "time":0.0},

"mlt":{

  "time":0.0},

"highlight":{

  "time":0.0},

"stats":{

  "time":0.0},

"expand":{

  "time":0.0},

"terms":{

  "time":0.0},

"debug":{

  "time":0.0}},

  "process":{

"time":680.0,

"query":{

  "time":19.0},

"facet":{

  "time":0.0},

"facet_module":{

  "time":0.0},

"mlt":{

  "time":0.0},

"highlight":{

  "time":651.0},

"stats":{

  "time":0.0},

"expand":{

  "time":0.0},

"terms":{

  "time":0.0},

"debug":{

  "time":8.0}},

  "loadFieldValues":{

"time":12.0



*3.2. Debug Query From Customers (which is fast because we index it after
indexing Policies):*



  "debug":{

"rawquerystring":"sherry",

"querystring":"sherry",

"parsedquery":"searchFields_tcs:sherry",

"parsedquery_toString":"searchFields_tcs:sherry",

"explain":{

  "S7900271B":"\n13.191501 = weight(searchFields_tcs:sherry in
2453665) [SchemaSimilarity], result of:\n  13.191501 =
score(doc=2453665,freq=3.0 = termFreq=3.0\n), product of:\n9.08604
= idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq +
0.5)) from:\n  428.0 = docFreq\n  3784142.0 = docCount\n
1.4518428 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
b + b * fieldLength / avgFieldLength)) from:\n  3.0 =
termFreq=3.0\n  1.2 = parameter k1\n  0.75 = parameter b\n
 20.22558 = avgFieldLength\n  28.0 = fieldLength\n”, ..

"QParser":"LuceneQParser",

"timing":{

  "time":38.0,

  "prepare":{

"time":1.0,

"query":{

  "time":1.0},

"facet":{

  "time":0.0},

"facet_module":{

  "time":0.0},

"mlt":{

  "time":0.0},

"highlight":{

  "time":0.0},

"stats":{

  "time":0.0},

"expand":{

  "time":0.0},

"terms":{

  "time":0.0},

"debug":{

  "time":0.0}},

  "process":{

"time":36.0,

"query":{

  "time":1.0},

"facet":{

  "time":0.0},

"facet_module":{

  "time":0.0},

"mlt":{

  "time":0.0},

"highlight":{

  "time":31.0},

"stats":{

  "time":0.0},

"expand":{

  "time":0.0},

"terms":{

  "time":0.0},

"debug":{

  "time":3.0}},

  "loadFieldValues":{

"time":13.0



Best Regards,
Edwin

On Thu, 24 Jan 2019 at 20:57, Jan Høydahl  wrote:

> It would be useful if you can disclose the machine configuration, OS,
> memory, settings etc, as well as solr config including solr.in.sh,
> solrconfig.xml etc, so we can see the whole picture
> of memory, GC, etc.
> You could also specify debugQuery=true on a slow search and check the
> timings section for clues. What QTime are you seeing on the slow queries in
> solr.log?
> If that does not reveal the reason, I'd connect to your solr instance with
> a tool like jVisualVM or similar, to inspect what takes time. Or better,
> hook up to DataDog, SPM or some other cloud 

Re: Indexing in one collection affects the index in another collection

2019-01-24 Thread Jan Høydahl
It would be useful if you can disclose the machine configuration, OS, memory, 
settings etc, as well as solr config including solr.in.sh,
solrconfig.xml etc, so we can see the whole picture of memory, GC, etc.
You could also specify debugQuery=true on a slow search and check the timings 
section for clues. What QTime are you seeing on the slow queries in solr.log? 
If that does not reveal the reason, I'd connect to your solr instance with a 
tool like jVisualVM or similar, to inspect what takes time. Or better, hook up 
to DataDog, SPM or some other cloud tool to get a full view of the system.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 24 Jan 2019, at 13:42, Zheng Lin Edwin Yeo wrote:
> 
> Hi Shawn,
> 
> Unfortunately your reply about memory may not be valid. Please refer to my
> explanation below of the strange behaviors (it is much more like a BUG than
> anything else that is explainable):
> 
> Note that we still have 18GB of free unused memory on the server.
> 
> 1. We indexed the first collection called customers (3.7 million records
> from CSV data), index size is 2.09GB. The search in customers for any
> keyword is returned within 50ms (QTime) for using highlight (unified
> highlighter, posting, light term vectors)
> 
> 2. Then we indexed the second collection called policies (6 million records
> from CSV data), index size is 2.55GB. The search in policies for any
> keyword is returned within 50ms (QTime) for using highlight (unified
> highlighter, posting, light term vectors)
> 
> 3. But now any search in customers for any keywords (not from cache) takes
> as high as 1200ms (QTime). But still policies search remains very fast
> (50ms).
> 
> 4. So we decided to run the force optimize command on customers collection (
> https://localhost:8983/edm/customers/update?optimize=true&maxSegments=1&waitSearcher=false),
> surprisingly after optimization the search on customers collection for any
> keywords become very fast again (less than 50ms). BUT strangely, the search
> in policies collection become very slow (around 1200ms) without any changes
> to the policies collection.
> 
> 5. Based on above result, we decided to run the force optimize command on
> policies collection (
> https://localhost:8983/edm/policies/update?optimize=true&maxSegments=1&waitSearcher=false).
> More surprisingly, after optimization the search on policies collection for
> any keywords become very fast again (less than 50ms). BUT more strangely,
> the search in customers collection again become very slow (around 1200ms)
> without any changes to the customers collection.
> 
> What a strange and unexpected behavior! If this is not a bug, how could you
> explain the above very strange behavior in Solr 7.5? Could it be a bug?
> 
> We would appreciate any support or help on our above situation.
> 
> Thank you.
> 
> Regards,
> Edwin
> 
> On Thu, 24 Jan 2019 at 16:14, Zheng Lin Edwin Yeo 
> wrote:
> 
>> Hi Shawn,
>> 
>>> If the two collections have data on the same server(s), I can see this
>>> happening.  More memory is consumed when there is additional data, and
>>> when Solr needs more memory, performance might be affected.  The
>>> solution is generally to install more memory in the server.
>> 
>> I have found that even after we delete the index in collection2, the query
>> QTime for collection1 still remains slow. It does not go back to its
>> previous fast speed before we index collection2.
>> 
>> Regards,
>> Edwin
>> 
>> 
>> On Thu, 24 Jan 2019 at 11:13, Zheng Lin Edwin Yeo 
>> wrote:
>> 
>>> Hi Shawn,
>>> 
>>> Thanks for your reply.
>>> 
>>> The log only shows a list of the following, and I don't see any other logs
>>> besides these.
>>> 
>>> 2019-01-24 02:47:57.925 INFO  (qtp2131952342-1330) [c:collectioin1
>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
>>> id=13245417
>>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1
>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
>>> id=13245430
>>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1
>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
>>> id=13245435
>>> 
>>> There is no change to the segments info, but the slowdown in the first
>>> collection is very drastic.
>>> Before the indexing of collection2, the collection1 query QTime are in
>>> the range of 4 to 50 ms. However, after indexing collection2, the
>>> collection1 query QTime increases to more than 1000 ms. The index are done
>>> in CSV format, and the size of the index is 3GB.
>>> 
>>> Regards,
>>> Edwin
>>> 
>>> 
>>> 
>>> On Thu, 24 Jan 2019 at 01:09, Shawn Heisey  wrote:
>>> 
 On 1/23/2019 10:01 AM, Zheng Lin Edwin Yeo wrote:
> I am using Solr 7.5.0, and currently I am facing an issue of when I am
> indexing in 

Re: Indexing in one collection affects the index in another collection

2019-01-24 Thread Zheng Lin Edwin Yeo
Hi Shawn,

Unfortunately your reply about memory may not be valid. Please refer to my
explanation below of the strange behaviors (it is much more like a BUG than
anything else that is explainable):

Note that we still have 18GB of free unused memory on the server.

1. We indexed the first collection called customers (3.7 million records
from CSV data), index size is 2.09GB. The search in customers for any
keyword is returned within 50ms (QTime) for using highlight (unified
highlighter, posting, light term vectors)

2. Then we indexed the second collection called policies (6 million records
from CSV data), index size is 2.55GB. The search in policies for any
keyword is returned within 50ms (QTime) for using highlight (unified
highlighter, posting, light term vectors)

3. But now any search in customers for any keywords (not from cache) takes
as high as 1200ms (QTime). But still policies search remains very fast
(50ms).

4. So we decided to run the force optimize command on customers collection (
https://localhost:8983/edm/customers/update?optimize=true&maxSegments=1&waitSearcher=false),
surprisingly after optimization the search on customers collection for any
keywords become very fast again (less than 50ms). BUT strangely, the search
in policies collection become very slow (around 1200ms) without any changes
to the policies collection.

5. Based on above result, we decided to run the force optimize command on
policies collection (
https://localhost:8983/edm/policies/update?optimize=true&maxSegments=1&waitSearcher=false).
More surprisingly, after optimization the search on policies collection for
any keywords become very fast again (less than 50ms). BUT more strangely,
the search in customers collection again become very slow (around 1200ms)
without any changes to the customers collection.

What a strange and unexpected behavior! If this is not a bug, how could you
explain the above very strange behavior in Solr 7.5? Could it be a bug?

We would appreciate any support or help on our above situation.

Thank you.

Regards,
Edwin

On Thu, 24 Jan 2019 at 16:14, Zheng Lin Edwin Yeo 
wrote:

> Hi Shawn,
>
> > If the two collections have data on the same server(s), I can see this
> > happening.  More memory is consumed when there is additional data, and
> > when Solr needs more memory, performance might be affected.  The
> > solution is generally to install more memory in the server.
>
> I have found that even after we delete the index in collection2, the query
> QTime for collection1 still remains slow. It does not go back to its
> previous fast speed before we index collection2.
>
> Regards,
> Edwin
>
>
> On Thu, 24 Jan 2019 at 11:13, Zheng Lin Edwin Yeo 
> wrote:
>
>> Hi Shawn,
>>
>> Thanks for your reply.
>>
>> The log only shows a list of the following, and I don't see any other logs
>> besides these.
>>
>> 2019-01-24 02:47:57.925 INFO  (qtp2131952342-1330) [c:collectioin1
>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
>> id=13245417
>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1
>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
>> id=13245430
>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1
>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
>> id=13245435
>>
>> There is no change to the segments info, but the slowdown in the first
>> collection is very drastic.
>> Before the indexing of collection2, the collection1 query QTime are in
>> the range of 4 to 50 ms. However, after indexing collection2, the
>> collection1 query QTime increases to more than 1000 ms. The index are done
>> in CSV format, and the size of the index is 3GB.
>>
>> Regards,
>> Edwin
>>
>>
>>
>> On Thu, 24 Jan 2019 at 01:09, Shawn Heisey  wrote:
>>
>>> On 1/23/2019 10:01 AM, Zheng Lin Edwin Yeo wrote:
>>> > I am using Solr 7.5.0, and currently I am facing an issue of when I am
>>> > indexing in collection2, the indexing affects the records in
>>> collection1.
>>> > Although the records are still intact, it seems that the settings of
>>> the
>>> > termVectors get wiped out, and the index size of collection1 reduced
>>> from
>>> > 3.3GB to 2.1GB after I do the indexing in collection2.
>>>
>>> This should not be possible.  Indexing in one collection should have
>>> absolutely no effect on another collection.
>>>
>>> If logging has been left at its default settings, the solr.log file
>>> should have enough info to show what actually happened.
>>>
>>> > Also, the search in
>>> > collection1, which was originally very fast, becomes very slow after the
>>> > indexing is done in collection2.
>>>
>>> If the two collections have data on the same server(s), I can see this
>>> happening.  More memory is consumed when there is additional data, and
>>> when Solr needs more memory, performance might be affected. 

Re: Solr dependencies with security issues (CVEs)

2019-01-24 Thread Jan Høydahl
Please see
https://wiki.apache.org/solr/SolrSecurity#Solr_and_Vulnerability_Scanning_Tools
for a list of CVEs that do NOT affect Solr.

As that page states, if you believe that one of the CVEs are really exploitable 
in Solr, then please attempt to describe why you believe Solr is vulnerable, 
and send a report to secur...@apache.org and/or
file a private JIRA issue. Do not explain a new vulnerability on open mailing 
lists.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 24 Jan 2019, at 13:10, Andreas Hubold wrote:
> 
> Hi,
> 
> in our project, we're checking JAR dependencies with the OWASP dependency 
> check [1] for security issues for which CVEs have been reported.
> 
> There are CVEs for some of Solr's third-party dependencies in version 7.6.0, 
> and I wonder if you have plans to update these to unaffected versions. I 
> don't know if these CVEs affect Solr, but even if they don't, IMHO it would
> be good to update them so that users don't need to analyze the reports in 
> detail.
> 
> This is what I found for solr-core Maven dependencies:
> 
> * protobuf-java-3.1.0.jar https://nvd.nist.gov/vuln/detail/CVE-2015-5237 
> (fixed since protobuf 3.4)
> * dom4j-1.6.1.jar https://nvd.nist.gov/vuln/detail/CVE-2018-1000632 (fixed in 
> dom4j 2.1.1)
> * hadoop-hdfs-2.7.4.jar https://nvd.nist.gov/vuln/detail/CVE-2017-15718 
> (fixed in hadoop 2.7.5)
> 
> What do you think?
> 
> Thanks,
> Andreas
> 
> [1] https://www.owasp.org/index.php/OWASP_Dependency_Check
> 



Solr dependencies with security issues (CVEs)

2019-01-24 Thread Andreas Hubold

Hi,

in our project, we're checking JAR dependencies with the OWASP 
dependency check [1] for security issues for which CVEs have been reported.


There are CVEs for some of Solr's third-party dependencies in version 
7.6.0, and I wonder if you have plans to update these to unaffected 
versions. I don't know if these CVEs affect Solr, but even if they
don't, IMHO it would be good to update them so that users don't need to 
analyze the reports in detail.


This is what I found for solr-core Maven dependencies:

* protobuf-java-3.1.0.jar https://nvd.nist.gov/vuln/detail/CVE-2015-5237 
(fixed since protobuf 3.4)
* dom4j-1.6.1.jar https://nvd.nist.gov/vuln/detail/CVE-2018-1000632 
(fixed in dom4j 2.1.1)
* hadoop-hdfs-2.7.4.jar https://nvd.nist.gov/vuln/detail/CVE-2017-15718 
(fixed in hadoop 2.7.5)


What do you think?

Thanks,
Andreas

[1] https://www.owasp.org/index.php/OWASP_Dependency_Check
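
For anyone reproducing this, a sketch of running the OWASP dependency-check
CLI against a Solr install's webapp libs (the scan path assumes a standard
Solr install layout, and the report directory is illustrative):

    dependency-check.sh --project solr-7.6.0 \
      --scan server/solr-webapp/webapp/WEB-INF/lib \
      --format HTML --out ./dependency-check-report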



CDCR "all" collections

2019-01-24 Thread Bram Van Dam
Hey folks,

Is there any way to set up CDCR for *all* collections, including any
newly created ones? Having to modify the solrconfig in ZK every time a
collection is added is a bit of a pain, especially because I'm assuming
it requires a restart to activate the config?

Basically if I have DC Src and DC Tgt, I want every collection from Src
to be replicated to Tgt. Even when I create a new collection on Src.

Thanks,

 - Bram


How to estimate Java Heap Requirement for solr.

2019-01-24 Thread Satya Nand kanodia
Hi,

I have a Solr instance with 6 cores. I have given it -Xms1024m -Xmx16g of
heap memory.


*Cores have the following number of documents in it.*
1. 86,31,043
2. 6,59,61,263
3. 4,55,31,492
4. 21,10,087
5. 1,14,477
6. 33,397


*I have the following cache configuration.*


  

 


My question is whether the heap size I have given is okay. If not, what
should the required heap size be?
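
The XML tags of the cache configuration above did not survive; for
reference, cache stanzas in solrconfig.xml normally look like this (the
classes and size values here are illustrative, not the poster's):

    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>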


Re: Indexing in one collection affects the index in another collection

2019-01-24 Thread Zheng Lin Edwin Yeo
Hi Shawn,

> If the two collections have data on the same server(s), I can see this
> happening.  More memory is consumed when there is additional data, and
> when Solr needs more memory, performance might be affected.  The
> solution is generally to install more memory in the server.

I have found that even after we delete the index in collection2, the query
QTime for collection1 still remains slow. It does not go back to its
previous fast speed before we index collection2.

Regards,
Edwin


On Thu, 24 Jan 2019 at 11:13, Zheng Lin Edwin Yeo 
wrote:

> Hi Shawn,
>
> Thanks for your reply.
>
> The log only shows a list of the following, and I don't see any other logs
> besides these.
>
> 2019-01-24 02:47:57.925 INFO  (qtp2131952342-1330) [c:collectioin1
> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
> id=13245417
> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1
> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
> id=13245430
> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1
> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
> id=13245435
>
> There is no change to the segments info, but the slowdown in the first
> collection is very drastic.
> Before the indexing of collection2, the collection1 query QTime are in the
> range of 4 to 50 ms. However, after indexing collection2, the collection1
> query QTime increases to more than 1000 ms. The index are done in CSV
> format, and the size of the index is 3GB.
>
> Regards,
> Edwin
>
>
>
> On Thu, 24 Jan 2019 at 01:09, Shawn Heisey  wrote:
>
>> On 1/23/2019 10:01 AM, Zheng Lin Edwin Yeo wrote:
>> > I am using Solr 7.5.0, and currently I am facing an issue of when I am
>> > indexing in collection2, the indexing affects the records in
>> collection1.
>> > Although the records are still intact, it seems that the settings of the
>> > termVectors get wiped out, and the index size of collection1 reduced from
>> > 3.3GB to 2.1GB after I do the indexing in collection2.
>>
>> This should not be possible.  Indexing in one collection should have
>> absolutely no effect on another collection.
>>
>> If logging has been left at its default settings, the solr.log file
>> should have enough info to show what actually happened.
>>
>> > Also, the search in
>> > collection1, which was originally very fast, becomes very slow after the
>> > indexing is done in collection2.
>>
>> If the two collections have data on the same server(s), I can see this
>> happening.  More memory is consumed when there is additional data, and
>> when Solr needs more memory, performance might be affected.  The
>> solution is generally to install more memory in the server.  If the
>> system is working, there should be no need to increase the heap size
>> when the memory size increases ... but there can be situations where the
>> heap is a little bit too small, where you WOULD want to increase the
>> heap size.
>>
>> Thanks,
>> Shawn
>>
>>