RE: Solr Index Size after reindex

2019-02-13 Thread Mathieu Menard
Hello Andrea,

I'm really sorry for the delay in my answer, but I needed more information before 
answering you.

Yes, 5.365.213 is the numDocs we got just after the sync, and yes, 4.537.651 is 
the numDocs we got on the staging server after the reindexing, and the 
colleague who performed the rsync confirms that it completed in full.

I don't see any uncompleted transactions, which normally means that the 
indexing is complete. That's why I don't understand the difference.

Kind Regards

Matthieu

-Original Message-
From: Andrea Gazzarini [mailto:a.gazzar...@sease.io] 
Sent: samedi 9 février 2019 16:56
To: solr-user@lucene.apache.org
Subject: Re: Solr Index Size after reindex

Yes, those numbers are different and that would explain the different size. I 
think you should be able to find some information in the Alfresco or Solr logs; 
there must be a reason for the missing content.
For example, are those numbers coming from two comparable snapshots? In other 
words, I imagine that at a given moment X you rsync-ed the two servers:

  * 5.365.213 is the numDocs you got just after the sync, isn't it?
  * 4.537.651 is the numDocs you got in the staging server after the
reindexing isn't it? Are you sure the whole reindexing is completed?

MaxDocs is the number of documents you have in the index, including the deleted 
docs not yet cleared by a merge. In the console you should also see the 
"Deleted docs" count, which should be equal to (maxdocs - numdocs).

Ciao

Andrea

On 08/02/2019 15:53, Mathieu Menard wrote:
>
> Hi Andrea,
>
> I've checked this information and here is the result:
>
>
>              PRODUCTION    STAGING
> numDocs      5.365.213     4.537.651
> MaxDoc       5.845.469     5.129.556
>
>
> It seems that there are more than 800,000 additional docs in PRODUCTION, which 
> would explain the larger index size. But there is one thing that 
> I don't understand: since we have copied the DB and the contentstore, 
> shouldn't the numDocs for the two environments be the same?
>
> Could you also explain the meaning of the maxDocs value, please?
>
> Thanks
>
> Matthieu
>
> *From:*Andrea Gazzarini [mailto:a.gazzar...@sease.io]
> *Sent:* vendredi 8 février 2019 14:54
> *To:* solr-user@lucene.apache.org
> *Subject:* Re: Solr Index Size after reindex
>
> Hi Mathieu,
> what about the docs in the two infrastructures? Do they have the same 
> numbers (numdocs / maxdocs)? Any meaningful message (error or not) in 
> log files?
>
> Andrea
>
> On 08/02/2019 14:19, Mathieu Menard wrote:
>
> Hello,
>
> I would like to have your point of view about an observation we
> have made on our two Alfresco installations (Production and Staging
> environments), and more specifically on the size of our Solr indexes
> in these two environments.
>
> Regularly we do an rsync between the Production and the Staging
> environments: we make a copy of Alfresco's DB and a copy of the
> entire contentstore, and after that we reindex all the Alfresco content.
>
> We have noticed that the production environment has 19 GB
> of indexes while staging has "only" 11 GB of indexes.
> We have some difficulty understanding this difference, because we
> assumed that index optimization is the same for a full
> reindex as for the normal use of Solr.
>
> I've compared the configuration of the two Solr instances and
> I don't see any differences. Could you help me better understand
> this phenomenon?
>
> Here you can find some information about our two environments; if
> you need more details, I will provide them as soon as possible:
>
>
>                     PRODUCTION                                          STAGING
> Alfresco version    5.1.1.4                                             5.1.1.4
> Solr Version
> Java version
> Linux Machine       See Staging_caracteristics.txt file in attachment   See Staging_caracteristics.txt file in attachment
>
>
> Please let me know if you need any other information; I will send it to
> you quickly.
>
> Kind Regards
>
> Matthieu
>


Re: Migrate from solr 5.3.1 to 7.5.0

2019-02-13 Thread ramyogi
Thanks Erick, I was waiting for your daytime hours to get your suggestions, because
you are very responsive in this great open source community.

1. Did you recompile and redistribute your custom component?
*Yes*
2. Did you take the solrconfig that came with 7.5 and modify it rather
than copy your 5.3 solrconfig?
No, we adjusted the existing 5.3.1 config, which we had used for a couple of
years, for Solr 7.5.0.
3. Did you reindex from scratch with 7x? Lucene/Solr guarantees only
one major version back-compat.
Yes, we have done a full reindex. No issues noticed with the reindex.

Everything works as expected in the /select request handler flow. I noticed one
thing that changed in Solr 7.5.0:
all shard requests are now routed back through the custom component flow instead
of the default components.

    // for distributed queries that don't include shards.qt, use the original path
    // as the default but operators need to update their luceneMatchVersion to enable
    // this behavior since it did not work this way prior to 5.1
    if (req.getCore().getSolrConfig().luceneMatchVersion.onOrAfter(Version.LUCENE_5_1_0)) {
      String reqPath = (String) req.getContext().get(PATH);
      if (!"/select".equals(reqPath)) {
        params.set(CommonParams.QT, reqPath);
      } // else if path is /select, then the qt gets passed thru if set
    } else {
      // this is the pre-5.1 behavior, which translates to
      // sending the shard request to /select
      params.remove(CommonParams.QT);
    }
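
If you want the shard sub-requests to go back to /select (the pre-5.1 behavior)
while still entering through the custom handler, one option is to pass shards.qt
explicitly on the request. A minimal SolrJ sketch, with the handler name as an
assumption:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ShardsQtExample {
      public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/test").build()) {
          SolrQuery q = new SolrQuery("*:*");
          q.setRequestHandler("/mycustomhandler"); // hypothetical custom handler name
          q.set("shards.qt", "/select");           // force shard sub-requests back to /select
          QueryResponse rsp = client.query(q);
          System.out.println("numFound=" + rsp.getResults().getNumFound());
        }
      }
    }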

Below is a portion of the debug log.
I enabled debug and noticed that, in the custom component handler flow, all
shard requests look like this:
"PARSE_QUERY":{
"http://<>_shard26_replica_n9/":{
  "QTime":"27",
  "ElapsedTime":"28",
  "RequestPurpose":"GET_TERM_STATS",
  "NumFound":"<>",
 
"Response":"{responseHeader={zkConnected=true,xqs=best,info={},status=0,QTime=27},response={numFound=

"EXECUTE_QUERY":{
"http://<>_shard3_replica_n27/":{
  "QTime":"26",
  "ElapsedTime":"26",
 
"RequestPurpose":"GET_TOP_IDS,GET_FIELDS,GET_DEBUG,SET_TERM_STATS",
  "NumFound":"<>",
 
"Response":"{responseHeader={zkConnected=true,xqs=best,info={},status=0,QTime=26},response={numFound=

*But when the default (/select) flow runs:*

"PARSE_QUERY":{
"http://<>_shard24_replica_n6/":{
  "QTime":"0",
  "ElapsedTime":"1",
  "RequestPurpose":"GET_TERM_STATS",
 
"Response":"{responseHeader={zkConnected=true,status=0,QTime=0,params={df=all,distrib=false,debug=[false,
timing, track],shards.purpose=3276
 }
 "EXECUTE_QUERY":{
"http://<>_shard30_replica_n1/":{
  "QTime":"17",
  "ElapsedTime":"19",
  "RequestPurpose":"GET_TOP_IDS,SET_TERM_STATS",
  "NumFound":"316605",
 
"Response":"{responseHeader={zkConnected=true,status=0,QTime=17,params={df=all,distrib=false,fl=[id,
score],shards.purpose=16388



Is there any documentation that explains all these request purposes, so I can
understand the flow better?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Migrate from solr 5.3.1 to 7.5.0

2019-02-13 Thread ramyogi
Thanks Jan, I am not sure how our DevOps team decided on this, but it was
working well without any issues with Solr 5.3.1 for a couple of years. I just
wanted to check whether Solr 7 mandates any changes.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Multiplicative Boosts broken since 7.3 (LUCENE-8099)

2019-02-13 Thread Burgmans, Tom
I'd like to bump this issue up, since it is a showstopper for us to upgrade 
from Solr 6. In https://issues.apache.org/jira/browse/SOLR-13126 I described a 
couple more use cases in which this bug appears. We see different scores in 
the EXPLAIN compared to the actual scores, and our analysis is that the EXPLAIN 
is in fact correct. It happens when a multiplicative boost is used (via the 
"boost" parameter) in combination with some function queries, like "query" and 
"field". 

One example (tested on Solr 7.5.0), when running: 

http://localhost:8983/solr/test/select?defType=edismax&fl=id,score,[explain 
style=text]&q=*:*&boost=sum(field(price),4)

then the expectation is that a document that doesn't have the price field gets 
a score of 4. The result however is: 

{
"id": "docid123576",
"score": 1.0,
"[explain]": "4.0 = product of:\n  1.0 = boost\n  4.0 = product of:\n
1.0 = *:*\n4.0 = sum(float(price)=0.0,const(4))\n"
}

EXPLAIN and score are not consistent.
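
For anyone who wants to reproduce this from SolrJ, here is a sketch of the same
request, assuming a core named "test" as above:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class BoostRepro {
      public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/test").build()) {
          SolrQuery q = new SolrQuery("*:*");
          q.set("defType", "edismax");
          q.set("boost", "sum(field(price),4)");       // multiplicative boost
          q.setFields("id", "score", "[explain style=text]");
          QueryResponse rsp = client.query(q);
          for (SolrDocument doc : rsp.getResults()) {
            // compare the returned score with the [explain] output
            System.out.println(doc.get("id") + " score=" + doc.get("score"));
          }
        }
      }
    }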

Best regards Tom


-Original Message-
From: Tobias Ibounig [mailto:t.ibou...@netconomy.net] 
Sent: dinsdag 22 januari 2019 10:14
To: solr-user@lucene.apache.org
Subject: Multiplicative Boosts broken since 7.3 (LUCENE-8099)

Hello,

As described in https://issues.apache.org/jira/browse/SOLR-13126,
multiplicative boosts (in certain conditions) seem to be broken since 7.3.
The error seems to have been introduced in
https://issues.apache.org/jira/browse/LUCENE-8099.
Reverting the Solr parts to the now-deprecated BoostingQuery again fixes the 
issue.
The filed issue contains a test case and a patch with the revert (for testing 
purposes, not really a clean fix).
We sadly couldn't find the actual issue, which seems to lie with the use of 
"FunctionScoreQuery" for boosting.

We were able to patch our 7.5 installation with the patch. As others might be 
affected as well, we hope this can be helpful in resolving this bug.

To all Solr/Lucene developers, thank you for your work. Looking through the code 
base gave me a new appreciation of your work.

Best Regards,
Tobias

PS: This issue was already posted by a colleague, "Inconsistent debugQuery 
score with multiplicative boost", but I wanted to create a new post with a 
clearer title.



Re: Incorrect shard placement during Collection creation in 7.6

2019-02-13 Thread Erick Erickson
I haven't verified, but this looks like a JIRA to me. Looks like some
of the create logic may have issues, see: SOLR-12944 and maybe link to
that JIRA?

Best,
Erick

On Wed, Feb 13, 2019 at 4:15 AM Bram Van Dam  wrote:
>
> > TL;DR; createNodeSet & shards combination is not being respected.
>
> Update: Upgraded to 7.7, no improvement sadly.


Re: mysterious nullpointerexception while adding documents

2019-02-13 Thread Erick Erickson
bq. I also tried with a plain solr installation (just unpack solr and copy
the index folder), and in this way it works.

Then it sounds like your production system was not installed properly
if it mysteriously fails there but succeeds on a new install.

If you upgraded your prod system, did you use the configs that come
with the distro you're using or copy the old configs from a previous
version? The former is preferred.

Best,
Erick

On Wed, Feb 13, 2019 at 2:58 AM Danilo Tomasoni  wrote:
>
> I changed the schema, but I deleted all the documents and tried a reindex.
>
> I also tried deleting the core and re-adding it.
>
> The autocommit is disabled because hard commits are controlled in the
> client-side.
>
> I also tried with a plain solr installation (just unpack solr and copy
> the index folder), and in this way it works.
>
> In my production environment, if I create a new core with exactly the
> same configuration it doesn't work (I mean, after a couple of POSTs the error
> I sent you appears in the logs and the core is stopped and reloaded).
>
> Any clue?
>
> Thank you for your suggestions
>
> Danilo
>
> On 12/02/19 18:09, Erick Erickson wrote:
> > bq. I disabled autocommit (both soft and hard), but used to work with a
> > previous version of the schema.
> >
> >
> > First, did you _change_ the schema without
> > 1> deleting all the docs in the index
> > 2> reindexing everything
> >
> > or better, indexing to a new collection and aliasing to it?
> >
> > If you changed the schema and continued indexing to the
> > collection _at any time in the past_ you may have
> > inconsistent merged segments. The only way to fix this
> > is to re-index, and I strongly recommend to a new
> > collection.
> >
> > Second:
> > do _not_ disable, commits and index for a long time.
> > 1> your tlog will grow forever until you do a hard commit
> > 2> certain internal Solr structures grow until a new searcher
> >  is opened, either soft commit or hard-commit-with-opensearcher-true
> >
> > https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >
> > Best,
> > Erick
> >
> > On Tue, Feb 12, 2019 at 1:03 AM MUNENDRA S.N  
> > wrote:
> >> Are you trying to set some field to null in the request?? Also, is that
> >> particular field numeric, doc valued enabled and stored set to false??
> >> Sharing more details would help here, specifically update request and
> >> schema for those fields.
> >>
> >> Regards,
> >> Munendra S N
> >>
> >>
> >> On Tue, Feb 12, 2019 at 2:24 PM Danilo Tomasoni  wrote:
> >>
> >>> Hello all,
> >>>
> >>> I get this error while uploading my documents with 'set' modifier in
> >>> json format.
> >>>
> >>> My solr version is 7.3.1.
> >>>
> >>> I disabled autocommit (both soft and hard), but used to work with a
> >>> previous version of the schema.
> >>>
> >>> Someone have any clue on what's going on here?
> >>>
> >>> I can't reproduce the issue indexing locally on another solr 7.3.1
> >>> instance, I get this error only in the production instance.
> >>>
> >>> Thank you
> >>>
> >>> Danilo
> >>>
> >>>
> >>>  This is the error I get from client 
> >>>
> >>>
> >>> {
> >>> "responseHeader":{
> >>>   "status":500,
> >>>   "QTime":33},
> >>> "error":{
> >>>   "trace":"
> >>>
> >>> java.lang.NullPointerException
> >>>   at org.apache.solr.update.UpdateLog.lookup(UpdateLog.java:971)
> >>>   at
> >>>
> >>> org.apache.solr.handler.component.RealTimeGetComponent.getInputDocumentFromTlog(RealTimeGetComponent.java:537)
> >>>   at
> >>>
> >>> org.apache.solr.handler.component.RealTimeGetComponent.getInputDocument(RealTimeGetComponent.java:617)
> >>>   at
> >>>
> >>> org.apache.solr.handler.component.RealTimeGetComponent.getInputDocument(RealTimeGetComponent.java:594)
> >>>   at
> >>>
> >>> org.apache.solr.update.processor.DistributedUpdateProcessor.getUpdatedDocument(DistributedUpdateProcessor.java:1330)
> >>>   at
> >>>
> >>> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1049)
> >>>   at
> >>>
> >>> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:633)
> >>>   at
> >>>
> >>> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
> >>>   at
> >>>
> >>> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> >>>   at
> >>>
> >>> org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
> >>>   at
> >>>
> >>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:501)
> >>>   at
> >>>
> >>> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:145)
> >>>   at
> >>>
> >>> 

Re: SOLR 7.5.0 (Migrate from 5.3.1 to 7.5.0)

2019-02-13 Thread Shawn Heisey

On 2/12/2019 9:25 PM, ramyogi wrote:

[test_shard20_replica_n38] PERFORMANCE WARNING: Overlapping
onDeckSearchers=6
2/12/2019, 1:45:39 PM
WARN true
x:test_shard20_replica_n38
DirectUpdateHandler2
Starting optimize... Reading and rewriting the entire index! Use with care.

Even though the index is optimized and the segment count is 1, it keeps triggering
"DirectUpdateHandler2
Starting optimize... Reading and rewriting the entire index! Use with care."


Solr will *NEVER* do an optimize on its own.  It is always triggered by 
outside action.  Something is telling Solr to do an optimize.



What do these warnings signify and how do we resolve them?
We configured the ZK hosts (3 hosts), created a CNAME, and use that in the Solr
configuration: zkHost=http://<>/


The "Overlapping onDeckSearchers" message is caused by doing commits 
with openSearcher set to true (true is the default) too frequently.  One 
commit is not yet complete when another one is triggered.  Based on how 
many overlapping searchers the log says there are, it seems that there 
are a lot of commits happening in quick succession.
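
If those commits come from your indexing client, one mitigation is to stop 
issuing explicit commits and rely on commitWithin instead. A rough SolrJ sketch; 
the collection name and ZooKeeper host are placeholders:

    import java.util.Collections;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    public class CommitWithinExample {
      public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("zk1:2181"), Optional.empty()).build()) {
          client.setDefaultCollection("test");
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "doc-1");
          UpdateRequest req = new UpdateRequest();
          req.add(doc);
          req.setCommitWithin(30000); // let Solr commit within 30 seconds
          req.process(client);
          // no explicit client.commit() here; frequent explicit commits with
          // openSearcher=true are what cause overlapping on-deck searchers
        }
      }
    }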


These messages are not a direct result of upgrading.  Version 5.3.1 can 
log them too.


Thanks,
Shawn


Re: Migrate from solr 5.3.1 to 7.5.0

2019-02-13 Thread Erick Erickson
1. Did you recompile and redistribute your custom component?
2. Did you take the solrconfig that came with 7.5 and modify it rather
than copy your 5.3 solrconfig?
3. Did you reindex from scratch with 7x? Lucene/Solr guarantees only
one major version back-compat.

Best,
Erick

On Wed, Feb 13, 2019 at 7:26 AM Jan Høydahl  wrote:
>
> You need to list all zk hosts in zkHost, that’s part of the design for ZK, 
> clients will open connections to all nodes and be able to fail over if 
> trouble.
>
> Where did you get the CNAME advice?
>
> Jan Høydahl
>
> > 13. feb. 2019 kl. 05:56 skrev ramyogi :
> >
> > We are migrating SOLR version, We used 3 ZK hosts that configured to SOLR as
> > ZK connection string: zookeeper.solrtest.net:2181/test-config
> > Ensemble size: 1
> > Ensemble mode: ensemble
> > zookeeper.solrtest.net:2181
> > oktrue
> > clientPort2181
> > zk_server_statefollower
> > zk_version3.4.5
> > zk_approximate_data_size9902464
> > zk_znode_count1734
> > zk_num_alive_connections43
> > serverId4
> > electionPort3888
> > quorumPort2888
> >
> > zookeeper.solrtest.net . is nothing but CNAME of all three ZKHost together.
> >
> > In solr admin console we see
> > Errors:
> > We do not have a leader
> >
> >
> > And we are using custom component before query goes to "query" component in
> > proper order execution but pagination is not working after some limit for
> > example :start:599 results comes but start:600 docs[] empty in the response.
> > Results counts are reduced for some queries fields(combined) if we use our
> > custom component order  but works if we use /select.
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


NYC Apache Lucene/Solr User Group: Call for Speakers

2019-02-13 Thread Carlos Valcarcel
Hello, everyone!

The New York Apache Lucene/Solr User Group is, for the first time, sending
out a call for speakers for 2019/20.

We have a great lineup of speakers for this year, but we also want to hear
more from the community. Now that we are over 1300 members, there is
nothing I would like more than to have some of our members step forward and
give a talk about a new technology they use along with Solr.

Hence the Call for Speakers!

Send me your abstracts, your titles, your bio. Send me links to where
you've presented (optional)! Send me anything you think relevant to search
(get it?)!

We are starting again and I’d like to have as much of the year laid out as
possible even spilling over into 2020 (can you believe it? 2020 already!).

Don't be formal! Send your ideas!

If you’d rather not present, are there any talks you'd like to hear?

Should we have a Solr training half-day to acquaint family and friends with
Solr?

Should we give away Kindles instead of physical books? (Actually we already
are!)

Pizza instead of sandwiches?

Paper or plastic?

Regular or decaf?

Your thoughts matter!

Let me know!

Carlos Valcarcel

NYC Apache Lucene/Solr User Group

Our 2019 Schedule



Re: Get details about server-side errors

2019-02-13 Thread Christopher Schultz
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Jason,

On 2/13/19 07:39, Jason Gerlowski wrote:
> Hey Chris,
> 
> Unfortunately I think you covered the main/only options above.
> 
> HTTP status code isn't the most useful, but it's worth pointing
> out that there are a few things you can do with it.  Some status
> codes are easy to identify and come up with a good message to
> display to your end user e.g. 403 codes.  But of course it doesn't
> do anything to help you disambiguate 400 error messages you get.
> 
> Error handling has always been one of SolrJ's weak spots.  One
> thing people have suggested before is adding some sort of enum to
> error responses that is less ambiguous and easier to interpret 
> programmatically, but it's never been picked up.  A bit more 
> information on SOLR-7170.  Feel free to vote for it or chime in
> there if you think that'd be an improvement.

I've added some comments and a proposed fix that meets *my* needs, but
I want to make sure that it will be useful for others (and not just my
specific use-case).

Thanks,
- -chris

> On Tue, Feb 12, 2019 at 5:09 PM Christopher Schultz 
>  wrote:
>> 
> Hello, everyone.
> 
> I'm trying to get some information about a (fairly) simple case
> when a user is searching using a wide-open query where they can
> type in anything they want, including field-names. Of course, it's
> possible that they will try to enter a field-name that does not
> exist and Solr will complain, like this:
> 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
>
> 
Error from server at http://localhost:8983/solr/users: undefined field
> bad_field
> 
> (This is what happens when I search my user database for
> "bad_field:foo" .)
> 
> What is the best way to discover what happened on the server --
> from a code perspective. I can certainly read the above as a human
> and see what the problem is. But my users won't understand
> (exactly) what that means, and I don't always have English-language
> speakers searching my user database.
> 
> Is there a way to check for "was the error a bad field name?" and 
> "what was the bad field name (or names) detected?"
> 
> I looked at javadoc and saw two hopefuls:
> 
> 1.   code -- unfortunately, this is the HTTP response code
> 
> 2.  metadata -- unfortunately, this just returns 
> {error-class=org.apache.solr.common.SolrException,root-error-class=org
.a
>
> 
pache.solr.common.SolrException},
> which is already obvious from the exception type.
> 
> Is there something in SolrJ that I'm overlooking, here, or am I 
> limited to what I can parse out of the exception's "getMessage"
> string?
> 
> Thanks, -chris
> 
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlxkPP8ACgkQHPApP6U8
pFiWkg/+Jd9kHBUc0dYPw9EqkiqjzKDc+/adtERK1TktD/GxYoJaXoKwkNeSt+5C
nOysBDwPoPBZELKmCQVDyyyIKrGfSYw2Pva0fDuMr1fKNoazf6I68/5BusjNf5iL
ETkPuCtGuV6fmETGnK9xLFKE41tTO2u32erWnCcxbBPC858qNhafYfO1UZ3lzjuj
kvuV81RESL4LQvbfx98FKxhgiHJGCV9maY4xFGQeNpI0nc3btnneAGfqUIBxJdhk
RT97PdMF1yZ37aLx4H4wUTtey8hAvJhHSpDg1fw+UDNoGXcefpTwh+KQMqK5D3Cg
QRLzdbzu2BR14saV2tkJ+lKbt0zvurYgOJ2J2CaCz2o44n0P82ll3hCnUCV8WfYW
G70iKi8+8y73jMCOYf5hPO3O5uUJXg3dpGjgaRHHzkoOks2A+3QEWlX0CWEyoO4U
Zg2avKpZNgHj6I5TxyiHD4EkhU3/e3GHbB4neUyvU36zpC6+g54a3CM7HoxWBTUn
NtU2C7jDHJozUnn1S3IGOIdwv5CJ7rJNfgp+m/BOw9xuF1g/Rt7QG68J5KK0/JQE
IL68zAQzWX/1KubIT3Ro5AD/2tR8CKXsCv72U8CdpjSQFFnV+6rFAvS2M7e1D6dm
Lj3yRS4EcKQEgYUKltyWGX2GqnLENGLOUa2wd3aiJY7kiOdNgrA=
=75gr
-END PGP SIGNATURE-


Re: Migrate from solr 5.3.1 to 7.5.0

2019-02-13 Thread Jan Høydahl
You need to list all ZK hosts in zkHost; that's part of the design for ZK: 
clients will open connections to all nodes and be able to fail over if there is trouble.
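
For what it's worth, the same principle applies to SolrJ clients; a sketch with
placeholder host names:

    import java.util.Arrays;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;

    public class ZkHostsExample {
      public static void main(String[] args) throws Exception {
        // list every ZooKeeper node plus the chroot, rather than a single CNAME
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Arrays.asList("zk1.example.com:2181",
                              "zk2.example.com:2181",
                              "zk3.example.com:2181"),
                Optional.of("/test-config")).build()) {
          client.connect(); // fails over between the listed nodes as needed
          System.out.println(client.getZkStateReader().getClusterState());
        }
      }
    }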

Where did you get the CNAME advice?

Jan Høydahl

> 13. feb. 2019 kl. 05:56 skrev ramyogi :
> 
> We are migrating SOLR version, We used 3 ZK hosts that configured to SOLR as 
> ZK connection string: zookeeper.solrtest.net:2181/test-config
> Ensemble size: 1
> Ensemble mode: ensemble
> zookeeper.solrtest.net:2181
> oktrue
> clientPort2181
> zk_server_statefollower
> zk_version3.4.5
> zk_approximate_data_size9902464
> zk_znode_count1734
> zk_num_alive_connections43
> serverId4
> electionPort3888
> quorumPort2888 
> 
> zookeeper.solrtest.net . is nothing but CNAME of all three ZKHost together.
> 
> In solr admin console we see 
> Errors:
> We do not have a leader
> 
> 
> And we are using custom component before query goes to "query" component in
> proper order execution but pagination is not working after some limit for
> example :start:599 results comes but start:600 docs[] empty in the response. 
> Results counts are reduced for some queries fields(combined) if we use our
> custom component order  but works if we use /select.
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: What's the deal with dataimporthandler overwriting indexes?

2019-02-13 Thread Joakim Hansson
Thank you all for helping me with this.
I have started implementing aliases and that seems like the proper way to
go.
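
For anyone following the thread, the alias swap itself is a one-liner through the
Collections API; a rough SolrJ sketch, with collection and alias names as
assumptions:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class AliasSwapExample {
      public static void main(String[] args) throws Exception {
        try (SolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
          // full-import into the new collection (e.g. products_v2) first, then flip
          // the alias; searches against "products" switch to the freshly built index
          CollectionAdminRequest.createAlias("products", "products_v2").process(client);
        }
      }
    }
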
Thanks again and all the best!



Den tis 12 feb. 2019 kl 18:16 skrev Elizabeth Haubert <
ehaub...@opensourceconnections.com>:

> I've run into this also; it is a key difference between a master-slave
> setup and a solrCloud setup.
>
> clean=true has always deleted the index on the first commit, but in older
> versions of Solr, the workaround was to disable replication until the full
> reindex had completed.
>
> This is a convenient practice for a number of reasons, especially for small
> indices.  It really isn't supported in SolrCloud, because of the difference
> in how writes are processed for Master/Slave vs. SolrCloud.  With a
> Master/Slave setup, all writes are going to the same location, so disabling
> replication lets you buffer them up all in one go.   With a SolrCloud
> setup,  the data is distributed across the nodes in the cluster.  So it
> would need to know to blow away at the 'master' node for each shard to
> support the 'clean', serve traffic from the slaves only for each shard,
> until the re-index completes, do the replications, and then resume normal
> operation.
>
> Note that in Solr 7.x if you revert to the master/slave setup, you need to
> disable polling at the slaves.  Disabling replication at the master will
> also cause index deletion at the slaves (SOLR-11938).
>
> Elizabeth
>
> On Tue, Feb 12, 2019 at 11:42 AM Vadim Ivanov <
> vadim.iva...@spb.ntk-intourist.ru> wrote:
>
> > Hi!
> > If clean=true then index will be replaced completely by the new import.
> > That is how it is supposed to work.
> > If you don't want to preemptively delete your index, set clean=false. And
> > set commit=true instead of optimize=true.
> > Are you sure about optimize? Do you really need it? Usually it's very
> > costly.
> > So, I'd try:
> > dataimport?command=full-import&clean=false&commit=true
> >
> > If nevertheless nothing imported, please check the log
> > --
> > Vadim
> >
> >
> >
> > > -Original Message-
> > > From: Joakim Hansson [mailto:joakim.hansso...@gmail.com]
> > > Sent: Tuesday, February 12, 2019 12:47 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: What's the deal with dataimporthandler overwriting indexes?
> > >
> > > Hi!
> > > We are currently upgrading from solr 6.2 master slave setup to solr 7.6
> > > running solrcloud.
> > > I dont know if I've missed something really trivial, but everytime I
> > start
> > > a full import (dataimport?command=full-import&clean=true&optimize=true)
> > > the
> > > old index gets overwritten by the new import.
> > >
> > > In 6.2 this wasn't really a problem since I could disable replication
> in
> > > the API on the master and enable it once the import was completed.
> > > With 7.6 and solrcloud we use NRT-shards and replicas since those are
> the
> > > only ones that support rule-based replica placement and whenever I
> start
> > a
> > > new import the old index is overwritten all over the solrcloud cluster.
> > >
> > > I have tried changing to clean=false, but that makes the import finish
> > > without adding any docs.
> > > Doesn't matter if I use soft or hard commits.
> > >
> > > I don't get the logic in this. Why would you ever want to delete an
> > > existing index before there is a new one in place? What is it I'm
> missing
> > > here?
> > >
> > > Please enlighten me.
> >
> >
>


Re: Terms Query Parser: filtering on null and strings with whitespace.

2019-02-13 Thread Mikhail Khludnev
Oh yeah, my pet peeve. This is the cure.
(*:* AND -department_name:[* TO *]) OR {!tag=department_name terms
f=department_name v='Kirurgisk avdeling'}
no comments.

On Wed, Feb 13, 2019 at 1:49 PM Andreas Lønes  wrote:

> I am experiencing some weird behaviour when using terms query parser where
> I am filtering on documents that has no value for a given field(null) and
> strings with whitespaces.
>
> I can filter on documents not having a value OR having some specific
> values for the field as long as the value does not have a
> whitespace(example 1). I can also filter on specific values with
> whitespace)(example 2).
> What I am not able to make work is to filter on documents not having a
> value OR certain terms with whitespace(example 3)
>
> These work:
> 1. (*:* AND -department_shortname:[* TO *]) OR
> {!tag=department_shortname terms f=department_shortname}BARN,KIR
> 2. {!tag=department_name terms f=department_name}Kirurgisk avdeling
> Does not work:
> 3. (*:* AND -department_name:[* TO *]) OR {!tag=department_name terms
> f=department_name}Kirurgisk avdeling
>
> The configuration of the fields if that is of any value:
>  stored="true" required="false" multiValued="false" />
>  stored="true" required="false" multiValued="false" />
>
> So far, the only solution I can come up with is to index some kind of
> value that represents null, but as of my understanding it is recommended
> that null is not indexed.
>
>
> Thanks,
> Andreas
>


-- 
Sincerely yours
Mikhail Khludnev


Re: SOLR and AWS comprehend

2019-02-13 Thread Jörn Franke
I guess you have to find out which product fits your needs better. The use case 
could be addressed by both solutions, but the description of the use 
case is very generic, so it is difficult to say.

> Am 13.02.2019 um 13:17 schrieb Gareth Baxendale :
> 
> This is perhaps more or an architecture question than dev code but
> appreciate collective thoughts!
> 
> We are using Solr to order records and to categorise them to allow users to
> search and find specific medical conditions. We have an opportunity to make
> use of Machine Learning to aid and improve the results. AWS Comprehend is
> the product we are looking at but there is a question over whether one
> should replace the other as they would compete or if in fact both should
> work together to provide the solution we are after.
> 
> Appreciate any insights people have.
> 
> Thanks Gareth
> 


Re: Get details about server-side errors

2019-02-13 Thread Jason Gerlowski
Hey Chris,

Unfortunately I think you covered the main/only options above.

HTTP status code isn't the most useful, but it's worth pointing out
that there are a few things you can do with it.  Some status codes are
easy to identify and come up with a good message to display to your
end user e.g. 403 codes.  But of course it doesn't do anything to help
you disambiguate 400 error messages you get.
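
Roughly, the most you can do today looks like this (a sketch; the
"undefined field" string match is a fragile assumption, not a stable API):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient.RemoteSolrException;

    public class ErrorHandlingSketch {
      public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/users").build()) {
          client.query(new SolrQuery("bad_field:foo"));
        } catch (RemoteSolrException e) {
          if (e.code() == 403) {
            System.err.println("Not authorized to search");
          } else if (e.code() == 400 && e.getMessage().contains("undefined field")) {
            // brittle: relies on parsing the server's message text
            System.err.println("Unknown field in query: " + e.getMessage());
          } else {
            System.err.println("Solr error " + e.code() + ": " + e.getMessage());
          }
        }
      }
    }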

Error handling has always been one of SolrJ's weak spots.  One thing
people have suggested before is adding some sort of enum to error
responses that is less ambiguous and easier to interpret
programmatically, but it's never been picked up.  A bit more
information on SOLR-7170.  Feel free to vote for it or chime in there
if you think that'd be an improvement.

But unfortunately there's nothing like this to help you out now, that
I know of at least.

Best,

Jason

On Tue, Feb 12, 2019 at 5:09 PM Christopher Schultz
 wrote:
>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Hello, everyone.
>
> I'm trying to get some information about a (fairly) simple case when a
> user is searching using a wide-open query where they can type in
> anything they want, including field-names. Of course, it's possible
> that they will try to enter a field-name that does not exist and Solr
> will complain, like this:
>
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> Error from server at http://localhost:8983/solr/users: undefined field
> bad_field
>
> (This is what happens when I search my user database for "bad_field:foo"
> .)
>
> What is the best way to discover what happened on the server -- from a
> code perspective. I can certainly read the above as a human and see
> what the problem is. But my users won't understand (exactly) what that
> means and I don't always have English-language speakers searching my user
> database.
>
> Is there a way to check for "was the error a bad field name?" and
> "what was the bad field name (or names) detected?"
>
> I looked at javadoc and saw two hopefuls:
>
> 1.   code -- unfortunately, this is the HTTP response code
>
> 2.  metadata -- unfortunately, this just returns
> {error-class=org.apache.solr.common.SolrException,root-error-class=org.a
> pache.solr.common.SolrException},
> which is already obvious from the exception type.
>
> Is there something in SolrJ that I'm overlooking, here, or am I
> limited to what I can parse out of the exception's "getMessage" string?
>
> Thanks,
> - -chris
> -BEGIN PGP SIGNATURE-
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAlxjRDIACgkQHPApP6U8
> pFihjBAAty32GuiOj8XnwJu55Y9tYWFoQOhNEEJEGmeh1mOv4fxj5D4Rh+7MXTJB
> 7APLZ5IlNjpGMQ5ygLpfFTrLIEljn/f/a8hRslH/g+H3p/y4EJgeyvbNHaQZdkuh
> HlKQ9Z/M6HK+1KGvVNB+9onU3hs7+Tct7TjWO/cZ031CPovDknsYTbOBoLW+tszS
> BrsR7up0s7AOWYNkXTu8i0tf6A6nkF8+YJvml2mxNvXUCZrhHh71eL3R+v1/zGun
> 6yYyGCPm5rO9Pkxq+It4Fo8pkvo3z6k65NAflMXsFcEwWaf/5OmzAjE+TrDdqfeQ
> InKDsXj3w6ZOHOEWN/lq8kK1alZUP0i8MQJHpAXzlPL213joP9mN2AeNk7airIXE
> hPPmUGKjOVlMDJg6ICJiPVibMjwLBiy68TQJj2DX+dMVeYTQSroPBw5VUJhrxinV
> +4y6podDJ6xs+27LxfI8DZ8nGAZP/tFYMCLNIdnhOg682PfaiD3ZiDDu5dJvm871
> 7N0EK3oCkoAmQ3l7xQNtz/0nDdI5TKSOtI3KBXTY72/8dfZlSoE4kwmBh56SrKQJ
> KNfT54Cj329p5qKoNBy1bKxw4GyUx0UbKQo8HyFqzK0gQHlH+23taq5IePhocW12
> uUMGSvVUnm/E+C5w3OGLJ96Y6a3aiNUORinkTJePz+sJoUbCIwY=
> =Ril5
> -END PGP SIGNATURE-


SOLR and AWS comprehend

2019-02-13 Thread Gareth Baxendale
This is perhaps more of an architecture question than a dev/code question, but I'd
appreciate collective thoughts!

We are using Solr to order records and to categorise them to allow users to
search and find specific medical conditions. We have an opportunity to make
use of Machine Learning to aid and improve the results. AWS Comprehend is
the product we are looking at but there is a question over whether one
should replace the other as they would compete or if in fact both should
work together to provide the solution we are after.

Appreciate any insights people have.

Thanks Gareth

-- 
Confidential information may be contained in this message.If you are not 
the intended recipient, any reading, printing, storage, disclosure, copying 
or any other action taken in respect of this e-mail is prohibited and may 
be unlawful. If you are not the intended recipient, please notify the 
sender immediately by using the reply function and then permanently delete 
what you have received.


Re: Incorrect shard placement during Collection creation in 7.6

2019-02-13 Thread Bram Van Dam
> TL;DR; createNodeSet & shards combination is not being respected.

Update: Upgraded to 7.7, no improvement sadly.


RE: Document Score seen in debug section and in main results section dont match

2019-02-13 Thread Tobias Ibounig
Hi Baloo,

> Is there and solution/workaround available for this issue?
> or going back to Solr 7.2.1 is the only solution - As per comments in above
> issue (https://issues.apache.org/jira/browse/LUCENE-8099 these changes are
> not there in 7.2.1)

The only workaround I know of is building Solr yourself with the attached 
patch, which we also tested with 7.5 in production.
https://issues.apache.org/jira/browse/SOLR-13126
"0001-use-deprecated-classes-to-fix-regression-introduced-.patch"

I was not able to rewrite the multiplicative boost queries in a way that did 
not trigger the regression.

Best regards



RE: Document Score seen in debug section and in main results section dont match

2019-02-13 Thread Tobias Ibounig
Hi Erick,

> You are saying that "X" doesn't work, in this case the scores are different 
> in the debug section. But this implies that there is a problem "Y" that 
> you're having. 

The issue is not that they don't match; the issue is that the calculated score 
is wrong. The debug score shows everything as it should be, but the 
calculated score only matches it if none or all of the multiplicative boost 
queries match.

The Jira issue has a misleading name. There is a test case attached that reproduces 
the issue.

Best regards


Re: Delete by id

2019-02-13 Thread Matt Pearce

Hi Dwane,

The error suggests that Solr is trying to add a document, rather than 
delete one, and is complaining that the DOC_ID is missing.


I tried each of your examples (without the smart quotes), and they all 
worked as expected, both from curl and the admin UI. There's an error in 
your longhand example, which should read

{ "delete": { "id": "123!12345" }}
However, even using your example, I didn't get a complaint about the 
field being missing.


Using curl, my command was:
curl -XPOST -H 'Content-type: application/json' 
http://localhost:8983/solr/testCollection/update -d '{ "delete": 
"123!12345" }'


Are you doing anything differently from that?

Thanks,
Matt


On 11/02/2019 23:24, Dwane Hall wrote:

Hey Solr community,

I'm having an issue deleting documents from my Solr index and am seeking some 
community advice when somebody gets a spare minute. It seems like a really 
simple problem: a requirement to delete a document by its id.

Here’s how my documents are mapped in solr

DOC_ID


My json format to delete the document (all looks correct according to 
https://lucene.apache.org/solr/guide/7_6/uploading-data-with-index-handlers.html
 “The JSON update format allows for a simple delete-by-id. The value of a 
delete can be an array which contains a list of zero or more specific document 
id’s (not a range) to be deleted. For example, a single document”)

Attempt 1 – “shorthand”
{“delete”:”123!12345”}

Attempt 2 – “longhand”
{“delete”:“DOC_ID”:”123!12345”}
{“delete”:{“DOC_ID”:”123!12345”}}

..the error is the same in all instances “org.apache.solr.common.SolrException: 
Document is missing mandatory uniqueKey field: DOC_ID”

Can anyone see any obvious details I’m overlooking?

I’ve tried all the update handlers below (both curl and through admin ui)

/update/
/update/json
/update/json/docs

My environment
Solr cloud 7.6
Single node

As always any advice would be greatly appreciated,

Thanks,

Dwane



--
Matt Pearce
Flax - Open Source Enterprise Search
www.flax.co.uk


Re: mysterious nullpointerexception while adding documents

2019-02-13 Thread Danilo Tomasoni

I changed the schema, but I deleted all the documents and tried a reindex.

I also tried deleting the core and re-adding it.

The autocommit is disabled because hard commits are controlled on the 
client side.


I also tried with a plain solr installation (just unpack solr and copy 
the index folder), and in this way it works.


In my production environment, if I create a new core with exactly the 
same configuration it doesn't work (I mean, after a couple of POSTs the error 
I sent you appears in the logs and the core is stopped and reloaded).


Any clue?

Thank you for your suggestions

Danilo

On 12/02/19 18:09, Erick Erickson wrote:

bq. I disabled autocommit (both soft and hard), but used to work with a
previous version of the schema.


First, did you _change_ the schema without
1> deleting all the docs in the index
2> reindexing everything

or better, indexing to a new collection and aliasing to it?

If you changed the schema and continued indexing to the
collection _at any time in the past_ you may have
inconsistent merged segments. The only way to fix this
is to re-index, and I strongly recommend to a new
collection.

Second:
do _not_ disable, commits and index for a long time.
1> your tlog will grow forever until you do a hard commit
2> certain internal Solr structures grow until a new searcher
 is opened, either soft commit or hard-commit-with-opensearcher-true

https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Tue, Feb 12, 2019 at 1:03 AM MUNENDRA S.N  wrote:

Are you trying to set some field to null in the request?? Also, is that
particular field numeric, doc valued enabled and stored set to false??
Sharing more details would help here, specifically update request and
schema for those fields.

Regards,
Munendra S N


On Tue, Feb 12, 2019 at 2:24 PM Danilo Tomasoni  wrote:


Hello all,

I get this error while uploading my documents with 'set' modifier in
json format.

My solr version is 7.3.1.

I disabled autocommit (both soft and hard), but used to work with a
previous version of the schema.

Someone have any clue on what's going on here?

I can't reproduce the issue indexing locally on another solr 7.3.1
instance, I get this error only in the production instance.

Thank you

Danilo


 This is the error I get from client 


{
"responseHeader":{
  "status":500,
  "QTime":33},
"error":{
  "trace":"

java.lang.NullPointerException
  at org.apache.solr.update.UpdateLog.lookup(UpdateLog.java:971)
  at

org.apache.solr.handler.component.RealTimeGetComponent.getInputDocumentFromTlog(RealTimeGetComponent.java:537)
  at

org.apache.solr.handler.component.RealTimeGetComponent.getInputDocument(RealTimeGetComponent.java:617)
  at

org.apache.solr.handler.component.RealTimeGetComponent.getInputDocument(RealTimeGetComponent.java:594)
  at

org.apache.solr.update.processor.DistributedUpdateProcessor.getUpdatedDocument(DistributedUpdateProcessor.java:1330)
  at

org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1049)
  at

org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:633)
  at

org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
  at

org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
  at

org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
  at

org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:501)
  at

org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:145)
  at

org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:121)
  at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:84)
  at

org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
  at

org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
  at

org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
  at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:711)
  at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:517)
  at

org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
  at

org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)
  at

org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)
  at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
  at


Terms Query Parser: filtering on null and strings with whitespace.

2019-02-13 Thread Andreas Lønes
I am experiencing some weird behaviour when using the terms query parser, where I am 
filtering on documents that have no value for a given field (null) and on strings 
with whitespace.

I can filter on documents not having a value OR having some specific values for 
the field, as long as the value does not contain whitespace (example 1). I can 
also filter on specific values with whitespace (example 2).
What I am not able to make work is filtering on documents not having a value OR 
on certain terms with whitespace (example 3).

These work:
1. (*:* AND -department_shortname:[* TO *]) OR {!tag=department_shortname 
terms f=department_shortname}BARN,KIR
2. {!tag=department_name terms f=department_name}Kirurgisk avdeling
Does not work:
3. (*:* AND -department_name:[* TO *]) OR {!tag=department_name terms 
f=department_name}Kirurgisk avdeling

The configuration of the fields if that is of any value:



So far, the only solution I can come up with is to index some kind of value 
that represents null, but to my understanding it is recommended that null values 
are not indexed.


Thanks,
Andreas


Incorrect shard placement during Collection creation in 7.6

2019-02-13 Thread Bram Van Dam
Hey folks,

TL;DR; createNodeSet & shards combination is not being respected.

I'm attempting to create a collection with multiple shards, but
apparently the value of createNodeSet is not being respected and shards
are being assigned to nodes seemingly at random.

createNodeSet.shuffle is set to false, so that's not the cause.
Furthermore, sometimes not all nodes in the request are used.

Here's my request, cleaned up for legibility. Note that the node names
are IP addresses but I've removed the first 3 octets for legibility.


admin/collections
?action=CREATE
&name=collectionName
&router.name=implicit
&shards=collectionName1,collectionName2,collectionName3,collectionName4,collectionName5,collectionName6
=1024
&collection.configName=some_config
&createNodeSet=171:8180_solr,172:8180_solr,173:8180_solr,177:8180_solr,179:8180_solr,179:8180_solr
&createNodeSet.shuffle=false
=true

Note that I'm creating a collection with 6 shards across 5 nodes.

Requested:
collectionName1: 171:8180_solr
collectionName2: 172:8180_solr
collectionName3: 173:8180_solr
collectionName4: 177:8180_solr
collectionName5: 179:8180_solr
collectionName6: 179:8180_solr

Actual:
collectionName1: 177:8180_solr
collectionName2: 172:8180_solr
collectionName3: 179:8180_solr
collectionName4: 173:8180_solr
collectionName5: 171:8180_solr
collectionName6: 171:8180_solr

Not a single shard ends up on the requested node.

Additionally, when the response comes back, it only contained
information about 5 of the 6 created cores (even though 6 were created).
Possibly because there are only 5 nodes?

Am I misunderstanding the way this is supposed to work? Or did I stumble
upon a bug? Should I attempt to create a collection without shards and
add them one at a time for better control?

Sidenote: having control over which shard lives where is a business
requirement, so leaving Solr to its own devices is, sadly, not an option
in this case :-(

Thanks a bunch,

 - Bram