Re: When will Solr 7.1 be released?

2017-09-25 Thread Erick Erickson
In a word, "no". Basically, whenever a committer feels there are
enough changes to warrant spinning a new version, they volunteer.
Nobody has stepped up to do that yet, although I expect it to happen in
the next 2-3 months - but that's only a guess.

Best,
Erick

On Mon, Sep 25, 2017 at 5:21 PM, Nawab Zada Asad Iqbal  wrote:
> Hi,
>
> How are the release dates decided for new versions, are they known in
> advance?
>
> Thanks
> Nawab


Re: Question on SOLR join query

2017-09-25 Thread Erick Erickson
First of all, Solr is a _search_ engine; it wasn't built to be an
RDBMS. Whenever I see this question (paraphrasing) "I've indexed my
tables and want to use Solr just like a DB" I cringe.

Join cost grows with the number of unique values in the join field, so
high-cardinality join fields perform worst. That is what "initial join
implementation is O(nterms)" is expressing: the work is proportional to
the number of terms in the field.

Have you considered either denormalizing the data or using block joins?
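
For illustration, a hedged SolrJ sketch contrasting the two query styles;
the core URL, the child search field ("street"), and the parent marker field
("doc_type") are hypothetical, while par_id comes from your description:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class JoinVsBlockJoin {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/people").build();

        // Query-time join: resolving it walks the terms of the join field,
        // which is why cost is O(nterms) and a unique (high-cardinality)
        // join key is the worst case.
        SolrQuery join = new SolrQuery("{!join from=par_id to=par_id}street:elm");
        QueryResponse joinRsp = solr.query(join);

        // Block join: parents and children indexed together as one block, so
        // child matches map to their parent by position, with no term walk.
        SolrQuery blockJoin = new SolrQuery("{!parent which=doc_type:parent}street:elm");
        QueryResponse blockRsp = solr.query(blockJoin);

        System.out.println(joinRsp.getResults().getNumFound());
        System.out.println(blockRsp.getResults().getNumFound());
        solr.close();
    }
}

Note that block joins require indexing each parent together with its children
as a single block, so they are a schema/indexing change, not just a query change.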

Best,
Erick

On Mon, Sep 25, 2017 at 6:59 PM, Jaimin Patel  wrote:
> I am facing a performance problem and could narrow it down to a join query
> that we are using. The join is on a unique field.
>
> We have a person profile stored in an RDB in a relational way, e.g. a person
> name table, an address table, etc. The Solr index is built from this RDB
> data; each child is stored as a separate document with the parent's unique id.
> At query time, the parent's unique id is joined with the same field in the
> child documents ({!join to=par_id from=par_id}) to allow searching with an
> AND condition for search terms involving child data.
>
> I have been reading about similar issues, and they say "the initial join
> implementation is O(nterms)". What does this mean? I could not find any
> reference explaining the meaning of O(num_terms_in_field).
>
>  Regards,
> Jai


Question on SOLR join query

2017-09-25 Thread Jaimin Patel
I am facing a performance problem and could narrow it down to a join query
that we are using. The join is on a unique field.

We have a person profile stored in an RDB in a relational way, e.g. a person
name table, an address table, etc. The Solr index is built from this RDB
data; each child is stored as a separate document with the parent's unique id.
At query time, the parent's unique id is joined with the same field in the
child documents ({!join to=par_id from=par_id}) to allow searching with an
AND condition for search terms involving child data.

I have been reading about similar issues, and they say "the initial join
implementation is O(nterms)". What does this mean? I could not find any
reference explaining the meaning of O(num_terms_in_field).

 Regards,
Jai


When will Solr 7.1 be released?

2017-09-25 Thread Nawab Zada Asad Iqbal
Hi,

How are the release dates decided for new versions, are they known in
advance?

Thanks
Nawab


Re: Contributors Group

2017-09-25 Thread Erick Erickson
Done.

On Mon, Sep 25, 2017 at 12:37 PM, Justin Baynton
 wrote:
> Hello There. Can you please add the following user to the contributors
> group:
>
> JustinBaynton
>
> Thank you!
>
> Justin


DocValues, Long and SolrJ

2017-09-25 Thread Phil Scadden
I ran into a problem with indexing documents, which I worked around by changing
the data type, but I am curious as to how the setup could be made to work.

Solr 6.5.1 - Field type Long, multivalued false, DocValues.

When indexing with SolrJ, I set the value of the field with:
Long accessLevel
...
accessLevel = query.val(1);
...
document.addField("access", accessLevel);

Solr fails to add the document with this message:

"cannot change DocValues type from SORTED_SET to NUMERIC for field"

??? So how do you configure a single-valued Long type?
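
For what it's worth, a minimal hedged SolrJ sketch of the indexing path
described above; the core URL, id, and access level are hypothetical, and it
assumes "access" is declared in the schema as a single-valued long field with
docValues, with no earlier documents having indexed "access" under a
different type:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexAccessLevel {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycore").build();
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        Long accessLevel = 3L;               // in the original code, read from the database
        doc.addField("access", accessLevel); // pass a single value, never a collection
        solr.add(doc);
        solr.commit();
        solr.close();
    }
}

The SORTED_SET-to-NUMERIC error generally means the index already holds the
field with a different docValues type (e.g. from an earlier multivalued string
definition), so a sketch like this only works against an index that never saw
the old definition.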


Contributors Group

2017-09-25 Thread Justin Baynton
Hello There. Can you please add the following user to the contributors
group:

JustinBaynton

Thank you!

Justin


Re: Possible memory leak with VersionBucket objects

2017-09-25 Thread Sundeep T
Sorry, I meant we are "not" running Solr in cloud mode

On Mon, Sep 25, 2017 at 1:29 PM, Sundeep T  wrote:

> Yes, but that issue seems specific to SolrCloud like I mentioned. We are
> running Solr in cloud mode and don't have Zookeeper configured
>
> Thanks
> Sundeep
>
> On Mon, Sep 25, 2017 at 12:52 PM, Steve Rowe  wrote:
>
>> Hi Sundeep,
>>
>> This looks to me like 
>> / , which was fixed in
>> Solr 7.0.
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>> > On Sep 25, 2017, at 2:42 PM, Sundeep T  wrote:
>> >
>> > Hello,
>> >
>> > We are running our solr 6.4.2 instance on a single node without
>> zookeeper. So, we are not using solr cloud. We have been ingesting about
>> 50k messages per second into this instance spread over 4 cores.
>> >
>> > When we looked at the heap dump, we see that there are around 385
>> million instances of VersionBucket objects taking about 8GB of memory. This
>> number seems to grow with the number of cores into which we are
>> ingesting data. PFA a screen cap of the heap recording.
>> >
>> > Browsing through the jira list we saw a similar issue -
>> https://issues.apache.org/jira/browse/SOLR-9803
>> >
>> > This issue was recently resolved by Erick. But it seems to be
>> specifically tied to SolrCloud mode and ZooKeeper. We are not using any of
>> these.
>> >
>> > So, we are thinking this could be another issue. Does anyone have ideas on
>> what this could be and whether there is a fix for it?
>> >
>> > Thanks
>> > Sundeep
>>
>>
>


Re: Possible memory leak with VersionBucket objects

2017-09-25 Thread Sundeep T
Yes, but that issue seems specific to SolrCloud like I mentioned. We are
running Solr in cloud mode and don't have Zookeeper configured

Thanks
Sundeep

On Mon, Sep 25, 2017 at 12:52 PM, Steve Rowe  wrote:

> Hi Sundeep,
>
> This looks to me like  /
> , which was fixed in
> Solr 7.0.
>
> --
> Steve
> www.lucidworks.com
>
> > On Sep 25, 2017, at 2:42 PM, Sundeep T  wrote:
> >
> > Hello,
> >
> > We are running our solr 6.4.2 instance on a single node without
> zookeeper. So, we are not using solr cloud. We have been ingesting about
> 50k messages per second into this instance spread over 4 cores.
> >
> > When we looked at the heap dump, we see that there are around 385
> million instances of VersionBucket objects taking about 8GB of memory. This
> number seems to grow with the number of cores into which we are
> ingesting data. PFA a screen cap of the heap recording.
> >
> > Browsing through the jira list we saw a similar issue -
> https://issues.apache.org/jira/browse/SOLR-9803
> >
> > This issue was recently resolved by Erick. But it seems to be
> specifically tied to SolrCloud mode and ZooKeeper. We are not using any of
> these.
> >
> > So, we are thinking this could be another issue. Does anyone have ideas on
> what this could be and whether there is a fix for it?
> >
> > Thanks
> > Sundeep
>
>


Re: Possible memory leak with VersionBucket objects

2017-09-25 Thread Steve Rowe
Hi Sundeep,

This looks to me like  / 
, which was fixed in Solr 7.0.

--
Steve
www.lucidworks.com

> On Sep 25, 2017, at 2:42 PM, Sundeep T  wrote:
> 
> Hello,
> 
> We are running our solr 6.4.2 instance on a single node without zookeeper. 
> So, we are not using solr cloud. We have been ingesting about 50k messages 
> per second into this instance spread over 4 cores. 
> 
> When we looked at the heap dump, we see that there are around 385
> million instances of VersionBucket objects taking about 8GB of memory. This
> number seems to grow with the number of cores into which we are ingesting
> data. PFA a screen cap of the heap recording.
> 
> Browsing through the jira list we saw a similar issue 
> -https://issues.apache.org/jira/browse/SOLR-9803
> 
> This issue was recently resolved by Erick. But it seems to be
> specifically tied to SolrCloud mode and ZooKeeper. We are not using any of
> these.
> 
> So, we are thinking this could be another issue. Does anyone have ideas on what
> this could be and whether there is a fix for it?
> 
> Thanks
> Sundeep



Re: Seeing very low ingestion performance for a single non-cloud Solr core

2017-09-25 Thread saiks
Hi All,

Thanks for the response.

Increasing hard/soft commit intervals did not help.
But changing the "text" field in the ingestion input from the same repeated
message to random messages of similar length gave a 60% performance improvement.

I'm now able to ingest 40k-45k messages per second, where earlier I got 26k.

Thanks a lot.





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Possible memory leak with VersionBucket objects

2017-09-25 Thread Sundeep T
Hello,

We are running our solr 6.4.2 instance on a single node without zookeeper.
So, we are not using solr cloud. We have been ingesting about 50k messages
per second into this instance spread over 4 cores.

When we looked at the heap dump, we see that there are around 385
million instances of VersionBucket objects taking about 8GB of memory. This
number seems to grow with the number of cores into which we are
ingesting data. PFA a screen cap of the heap recording.

Browsing through the jira list we saw a similar issue -
https://issues.apache.org/jira/browse/SOLR-9803

This issue was recently resolved by Erick. But it seems to be
specifically tied to SolrCloud mode and ZooKeeper. We are not using any of
these.

So, we are thinking this could be another issue. Does anyone have ideas on what
this could be and whether there is a fix for it?

Thanks
Sundeep


Re: Archiving site not working

2017-09-25 Thread Erick Erickson
Downloads and unzips fine for me. What specifically do you mean by "not
working"? Didn't download? Downloaded and didn't unzip?

Best,
Erick

On Mon, Sep 25, 2017 at 11:21 AM, FUENTES MARTINEZ Alejandro <
alejandro.fuentes.consul...@axa.com.mx> wrote:

> Hi,
>
>
>
> We want to download *solr version 4.3.1* but the archiving site doesn’t
> work (https://archive.apache.org/dist/lucene/solr/4.3.1/solr-4.3.1.zip)
>
>
>
> Is there a mirror site to download it?
>
>
>
> We appreciate your help.
>
>
>
> Best Regards.
>
> --
>
> Alejandro Fuentes Martínez
> AXA Technology Services - México


Archiving site not working

2017-09-25 Thread FUENTES MARTINEZ Alejandro
Hi,

We want to download solr version 4.3.1 but the archiving site doesn't work 
(https://archive.apache.org/dist/lucene/solr/4.3.1/solr-4.3.1.zip)

Is there a mirror site to download it?

We appreciate your help.

Best Regards.
--
Alejandro Fuentes Martínez
AXA Technology Services- México
Stability Team Member
Felix Cuevas 366 Piso 2 - A
Col. Tlacoquemecatl Del. Benito Juarez,
C.P.03200 México D.F.
Ext. 3183
Cel.: 55 2719 6425
alejandro.fuentes.consul...@axa.com.mx

RE: Solr fields for Microsoft files, image files, PDF, text files

2017-09-25 Thread Allison, Timothy B.
bq: How do I get a list of all valid field names based on the file type

bq: You don't. At least I've never found any. Plus various document formats 
will allow custom meta-data fields so there's no definitive list.

It would be trivial to add field counts per mime to tika-eval.  If you're 
interested in this, please open a ticket on Tika's JIRA.


Re: Solr fields for Microsoft files, image files, PDF, text files

2017-09-25 Thread Erik Hatcher
Phillip - you may want to start with the example/files configuration that ships
with Solr. It is specifically designed as a configuration (and UI!) for indexing
rich files, and it does a bit more than the other examples - it pulls out
acronyms, e-mail addresses, and URLs from text, and it does what you've asked
about, mapping content types to friendlier human types ("image" instead of the
whole gamut of image/* content-types).

Erik

> On Sep 24, 2017, at 10:55 PM, Phillip Wu  wrote:
> 
> 
> Hi,
> I'm starting out with Solr on a Windows box.
> 
> I want to index the following documents:
> doc;docx
> xls;xlsx
> ppt
> vsd
> 
> pdf
> txt
> 
> gif;jpeg;tiff
> 
> I understand that Solr uses Apache Tika to read these file types and return an
> XML stream back to Solr.
> For Tika image processing, I've loaded Tesseract.
> 
> To be able to search the documents, I need to define "fields" in a file 
> called meta-schema.
> 
> How do I get a list of all valid field names based on the file type? For 
> example *.doc, what "fields" exist so I choose what to store?
> 
> I'm assuming that, for *.doc files for example, there is metadata put into the
> file by Microsoft Word (e.g. author, date) plus "free form" text.
> 
> So where is the list of valid fields per file type?
> 
> Also how do I search the "free form" text for a word/pattern in the Solr 
> search tool?
> 
> 
> 
> 



Re: Solr fields for Microsoft files, image files, PDF, text files

2017-09-25 Thread Erick Erickson
bq: How do I get a list of all valid field names based on the file type

You don't. At least I've never found any. Plus various document
formats will allow custom meta-data fields so there's no definitive
list.

bq: Also how do I search the "free form" text for a word/pattern in
the Solr search tool?

You put the extracted text (as opposed to meta-data) into an analyzed
field and search that.



NOTE: Solr is a search engine. The closest thing to an OOB "Solr
Search Tool" is the admin UI, which isn't intended to be an end-user
facing app.

Here's some SolrJ code that'll let you explore the meta-data fields in
various document types:

https://lucidworks.com/2012/02/14/indexing-with-solrj/

You can pull out the RDBMS bits pretty easily.
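
In the spirit of that article, a minimal standalone hedged sketch that uses
Tika directly to print whatever metadata fields a given file actually exposes
(the class name is made up; pass the file path as the first argument):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class ListTikaFields {
    public static void main(String[] args) throws Exception {
        AutoDetectParser parser = new AutoDetectParser();
        Metadata metadata = new Metadata();
        BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no write limit
        try (InputStream stream = Files.newInputStream(Paths.get(args[0]))) {
            parser.parse(stream, handler, metadata);
        }
        for (String name : metadata.names()) {   // the metadata fields Tika found
            System.out.println(name + " = " + metadata.get(name));
        }
        System.out.println("--- extracted body text ---");
        System.out.println(handler.toString());
    }
}

Running it over a few representative files of each type gives a practical,
per-file answer to "what fields exist", even though no definitive list exists.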

Best,
Erick

On Sun, Sep 24, 2017 at 7:55 PM, Phillip Wu  wrote:
>
>  Hi,
> I'm starting out with Solr on a Windows box.
>
> I want to index the following documents:
> doc;docx
> xls;xlsx
> ppt
> vsd
>
> pdf
> txt
>
> gif;jpeg;tiff
>
> I understand that Solr uses Apache Tika to read these file types and return an
> XML stream back to Solr.
> For Tika image processing, I've loaded Tesseract.
>
> To be able to search the documents, I need to define "fields" in a file 
> called meta-schema.
>
> How do I get a list of all valid field names based on the file type? For 
> example *.doc, what "fields" exist so I choose what to store?
>
> I'm assuming that, for *.doc files for example, there is metadata put into the
> file by Microsoft Word (e.g. author, date) plus "free form" text.
>
> So where is the list of valid fields per file type?
>
> Also how do I search the "free form" text for a word/pattern in the Solr 
> search tool?
>
>
>
>


Re: Replication on startup takes a long time

2017-09-25 Thread Erick Erickson
Emir:

OK, thanks for pointing that out, that relieves me a lot!

Erick

On Mon, Sep 25, 2017 at 1:03 AM, Emir Arnautović
 wrote:
> Hi Erick,
> I don't think there are any bugs with searcher reopening - this is a
> scenario with a new slave:
>
> “But when I add a *new* slave pointing to the master…”
>
> So it is expected to have zero results until replication finishes.
>
> Regards,
> Emir
>
>> On 23 Sep 2017, at 19:21, Erick Erickson  wrote:
>>
>> First I'd like to say that I wish more people would take the time like
>> you have to fully describe the problem and your observations; it makes
>> it so much nicer than having half a dozen back-and-forths! Thanks!
>>
>> Just so it doesn't get buried in the rest of the response (I do tend
>> to go on...): I suspect you have a suggester configured. The
>> index-based suggesters read through your _entire_ index, all the
>> stored fields from all the documents and process them into an FST or
>> "sidecar" index. See:
>> https://lucidworks.com/2015/03/04/solr-suggester/. If this is true
>> they might be being built on the slaves whenever a replication
>> happens. Hmmm, if this is true, let us know. You can tell by removing
>> the suggester from the config and timing again. It seems like in the
>> master/slave config we should copy these down but don't know if it's
>> been tested.
>>
>> If they are being built on the slaves, you might try commenting out
>> all of the buildOn bits on the slave configurations. Frankly I
>> don't know if building the suggester structures on the master would
>> propagate them to the slave correctly if the slave doesn't build them,
>> but it would certainly be a fat clue if it changed the load time on
>> the slaves and we could look some more at options.
>>
>> Observation 1: Allocating 40G of memory for an index only 12G seems
>> like overkill. This isn't the root of your problem, but a 12G index
>> shouldn't need near 40G of JVM. In fact, due to MMapDirectory being
>> used (see Uwe Schindler's blog here:
>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html)
>> I'd guess you can get away with MUCH less memory, maybe as low as 8G
>> or so. The wildcard here would be the size of your caches, especially
>> your filterCache configured in solrconfig.xml. Like I mentioned, this
>> isn't the root of your replication issue, just sayin'.
>>
>> Observation 2: Hard commits (the <autoCommit> setting) are not a very
>> expensive operation with openSearcher=false. Again this isn't the root
>> of your problem but consider removing the number of docs limitation
>> and just making it time-based, say every minute. Long blog on the
>> topic here: 
>> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/.
>> You might be accumulating pretty large transaction logs (assuming you
>> haven't disabled them) to no good purpose. Given your observation that
>> the actual transmission of the index takes 2 minutes, this is probably
>> not something to worry about much, but is worth checking.
>>
>> Question 1:
>>
>> Solr should be doing nothing other than opening a new searcher, which
>> should be roughly the "autowarm" time on master plus (perhaps)
>> suggester build. Your observation that autowarming takes quite a bit
>> of time (evidenced by much shorter times when you set the counts to
>> zero) is a smoking gun that you're probably doing far too much
>> autowarming. HOWEVER, during this interval the replica should be
>> serving queries from the old searcher so something else is going on
>> here. Autowarming is actually pretty simple, perhaps this will help
>> you to keep in mind while tuning:
>>
>> The queryResultCache and filterCache are essentially maps where the
>> key is just the text of the clause (simplifying here). So for the
>> queryResultCache the key is the entire search request. For the
>> filterCache, the key is just the "fq" clause. autowarm count in each
>> just means the number of keys that are replayed when a new searcher is
>> opened. I usually start with a pretty small number, on the order of
>> 10-20. The purpose of them is just to keep from experiencing a delay
>> when the first few searches are performed after a searcher is opened.
>>
>> My bet: you won't notice a measurable difference when dropping the
>> atuowarm counts drastically in terms of query response, but you will
>> save the startup time. I also suspect you can reduce the size of the
>> caches drastically, but don't know what you have them set to, it's a
>> guess.
>>
>> As to what's happening such that you serve queries with zero counts,
>> my best guess at this point is that you are rebuilding
>> autosuggesters. We shouldn't be serving queries from the new
>> searcher during this interval, if confirmed we need to raise a JIRA.
>>
>> Question 2: see above, autosuggester?
>>
>> Question 3a: documents should become searchable on the slave when 1>
>> all the segments 

Re: Solr 5.5.2 - Custom Function Query update

2017-09-25 Thread Susheel Kumar
ignore solr version...

On Mon, Sep 25, 2017 at 11:21 AM, Susheel Kumar 
wrote:

> Check if your jar is present at solr-6.0.0/server/solr//lib/ or do
> a find under solr directory...
>
> On Mon, Sep 25, 2017 at 9:59 AM, Florian Le Vern  > wrote:
>
>> Hi,
>>
>> I added a custom Function Query in a jar library that is loaded from the
>> `solr/data/lib` folder (same level as the cores) with the solrconfig line:
>> <valueSourceParser name="..." class="blah.blah.solr.search.function.MyFuncValueParser" />
>>
>> I just updated this lib but after restarting Solr, it seems that it
>> still uses the previous version.
>> I also tried to delete the lib from the `solr/data/lib` folder without
>> changing the solrconfig but it was still working.
>>
>> Do you have any clues for updating a custom lib?
>>
>> Thanks in advance,
>> Florian
>>
>>
>


Re: Solr 5.5.2 - Custom Function Query update

2017-09-25 Thread Susheel Kumar
Check if your jar is present at solr-6.0.0/server/solr//lib/ or do a
find under solr directory...

On Mon, Sep 25, 2017 at 9:59 AM, Florian Le Vern 
wrote:

> Hi,
>
> I added a custom Function Query in a jar library that is loaded from the
> `solr/data/lib` folder (same level as the cores) with the solrconfig line:
> <valueSourceParser name="..." class="blah.blah.solr.search.function.MyFuncValueParser" />
>
> I just updated this lib but after restarting Solr, it seems that it
> still uses the previous version.
> I also tried to delete the lib from the `solr/data/lib` folder without
> changing the solrconfig but it was still working.
>
> Do you have any clues for updating a custom lib?
>
> Thanks in advance,
> Florian
>
>


Re: Solr 6 CDCR does not work

2017-09-25 Thread 浩伦 严
Hi Uwe,

It is 09/24/2017; I was recently setting up my own CDCR and ran into this
situation as well. I had done all the configuration, and the status looked fine
when I ran "START" and "STATUS", but nothing was pushed to the target data
center. So I googled and found this thread - it helped me a lot!

I think he means ENABLE autoCommit. I had tried disabling it, as well as soft
commit, and it did not work. But once I enabled them, the changes I made in my
source data center showed up in the target data center.

Hope my response helps you.

Jiani

Solr 5.5.2 - Custom Function Query update

2017-09-25 Thread Florian Le Vern

Hi,

I added a custom Function Query in a jar library that is loaded from the 
`solr/data/lib` folder (same level as the cores) with the solrconfig line:
<valueSourceParser name="..." class="blah.blah.solr.search.function.MyFuncValueParser" />


I just updated this lib but after restarting Solr, it seems that it 
still uses the previous version.
I also tried to delete the lib from the `solr/data/lib` folder without 
changing the solrconfig but it was still working.


Do you have any clues for updating a custom lib?

Thanks in advance,
Florian
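
For orientation, a hedged sketch of the kind of class the solrconfig line
above registers; the package matches the post, but the body is entirely
hypothetical (a toy function that doubles its numeric argument):

package blah.blah.solr.search.function;

import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.valuesource.ConstValueSource;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

public class MyFuncValueParser extends ValueSourceParser {
    @Override
    public ValueSource parse(FunctionQParser fp) throws SyntaxError {
        float arg = fp.parseFloat();            // read one numeric argument
        return new ConstValueSource(arg * 2f);  // toy behavior: a doubled constant
    }
}

When the jar holding such a class is replaced, Solr must be restarted and the
new jar must be the only copy on the classpath; a stale copy elsewhere (e.g.
under a core's lib/ directory) can shadow it, which would match the behavior
described above.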



Problem with live Solr cloud (6.6) backup using collection API

2017-09-25 Thread Vikas Mehra
Cluster has 1 zookeeper node and 3 solr nodes. There is only one collection
with 3 shards. Data is continuously indexed using SolrJ API. System is
running on AWS and I am taking backup on EFS (Elastic File System).

Observed behavior:
If indexing is not in progress and I take a backup of the cluster using the
collection API, the backup succeeds and restore works as expected.

snapshotscli.sh also works as expected: if I first take a snapshot of the index
while indexing is in progress and then take the backup, there is no error during
restore.

However, I get an error most of the time if I try to restore the collection from
a backup taken using the collection API while indexing was still in progress.
The error is always a missing segment, and I can see that the segment it is
trying to read during restore does not exist in the backup shard directory.

Also, is there a way to take a snapshot of Solr cloud using the collection API?
The user guide only documents taking a snapshot of a core using the collection
API.

2017-09-08 19:47:22.592 WARN
(parallelCoreAdminExecutor-5-thread-8-processing-n:ec2-34-201-149-27.compute-1.amazonaws.com:8983_solr
t1cloudbackuponefs-r2187461299681393 RESTORECORE) [   ] o.a.s.h.RestoreCore
Could not switch to restored index. Rolling back to the current index
org.apache.lucene.index.CorruptIndexException: Unexpected file read error
while reading index.
(resource=BufferedChecksumIndexInput(MMapIndexInput(path="/var/solr/data/t1cloud3_shard2_replica0/data/restore.20170908194722131/segments_y")))
at
org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:290)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:930)
at
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:118)
at
org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:93)
at
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:248)
at
org.apache.solr.update.DefaultSolrCoreState.changeWriter(DefaultSolrCoreState.java:211)
at
org.apache.solr.update.DefaultSolrCoreState.newIndexWriter(DefaultSolrCoreState.java:220)
at
org.apache.solr.update.DirectUpdateHandler2.newIndexWriter(DirectUpdateHandler2.java:726)
at org.apache.solr.handler.RestoreCore.doRestore(RestoreCore.java:108)
at
org.apache.solr.handler.admin.RestoreCoreOp.execute(RestoreCoreOp.java:65)
at
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:384)
at
org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:388)
at
org.apache.solr.handler.admin.CoreAdminHandler.lambda$handleRequestBody$0(CoreAdminHandler.java:182)
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.NoSuchFileException:
/var/solr/data/t1cloud3_shard2_replica0/data/restore.20170908194722131/_4m.si
at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at
sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:335)
at
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238)
at
org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:192)
at
org.apache.lucene.store.Directory.openChecksumInput(Directory.java:137)
at
org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat.read(Lucene62SegmentInfoFormat.java:89)
at
org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:357)
at
org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
... 17 more
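
For reference, a minimal hedged SolrJ sketch of the Collections API backup
call described above; the collection name comes from the paths in the log,
while the ZooKeeper address, backup name, and EFS mount path are hypothetical:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class BackupCollection {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("zk1:2181").build();
        CollectionAdminRequest.Backup backup =
                CollectionAdminRequest.backupCollection("t1cloud3", "nightly");
        backup.setLocation("/mnt/efs/solr-backups"); // must be visible to every node
        backup.process(client);                      // issues the BACKUP collection action
        client.close();
    }
}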


Re: Solr SQL: standalone mode

2017-09-25 Thread Joel Bernstein
It's the automatic node discovery provided by ZooKeeper. If you setup a
single node SolrCloud it will work fine.
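
For instance, a hedged sketch of querying a single-node SolrCloud through the
Solr JDBC driver that ships in solr-solrj; the ZooKeeper address, collection,
and field names are hypothetical:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SolrSqlExample {
    public static void main(String[] args) throws Exception {
        // A one-node cluster still needs ZooKeeper: the SQL handler resolves
        // the collection's nodes through it.
        String url = "jdbc:solr://localhost:2181?collection=mycollection";
        try (Connection con = DriverManager.getConnection(url);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT id, title FROM mycollection LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("id") + " " + rs.getString("title"));
            }
        }
    }
}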

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Sep 25, 2017 at 3:09 AM, Pavel Micka 
wrote:

> Glad to hear that. Btw: where is the limitation (that it's not possible to
> run the SQL in standalone)? Is it in the distribution algorithm itself, or
> is Solr just missing ZooKeeper storage? I am asking because if it's the
> second case, we can just install a single-node ZK + a single Solr and have a
> "non-distributed cloud" :-)
>
> Thanks,
> Pavel
>
> -Original Message-
> From: Joel Bernstein [mailto:joels...@gmail.com]
> Sent: Monday, September 25, 2017 3:04 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr SQL: standalone mode
>
> At Alfresco we are working on a version of Solr's SQL that works in
> non-Solr Cloud mode. The plan is to contribute this back to 7x branch.
> There will also be improvements to the SQL coverage committed back from
> Alfresco.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sun, Sep 24, 2017 at 6:04 PM, Pavel Micka 
> wrote:
>
> > Hi,
> >
> >
> > I read in the documentation that executing Solr SQL is possible only
> > in SolrCloud mode. The thing is that we unfortunately have some
> > installations which simply can't have multiple nodes (too small
> > instances). Is it somehow possible to work around this restriction, or
> > is there at least any plan to lift it?
> >
> >
> > Thanks,
> >
> >
> > Pavel
> >
>


Re: overwrite the parameter query in DIH

2017-09-25 Thread Mikhail Khludnev
Hello,

I don't fully understand the question but you might need to check this
https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#dih-request-parameters
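
In the spirit of that page, a hedged sketch of the usual pattern: reference a
request parameter from data-config.xml and supply it from SolrJ. The parameter
name (idlist), the URL, and the query shape are hypothetical:

// In data-config.xml, the entity query references the request parameter, e.g.:
//   query="db.collection.find({ id: { $in: [${dataimporter.request.idlist}] } })"
// The SolrJ side then passes that parameter along with the full-import command:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.ModifiableSolrParams;

public class DihFullImport {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/collectionname").build();
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("qt", "/dataimport");
        params.set("command", "full-import");
        params.set("idlist", "\"id1\", \"id2\"");  // read in data-config as
                                                   // ${dataimporter.request.idlist}
        QueryResponse response = solr.query(params);
        System.out.println(response);
        solr.close();
    }
}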


On Thu, Sep 21, 2017 at 6:17 PM, solr2020  wrote:

> Hi All,
>
> We are retrieving MongoDB data using the DataImport handler. We have a scenario
> where we have to override the MongoDB query configured in the data-config
> file.
> We have to do this override programmatically using SolrJ. For this we are
> using ModifiableSolrParams to set the parameters. Here is the code snippet
> used to create a dataimport HTTP request.
>
> String solrURL=
> "http://:/solr/collectionname";
> SolrClient solr = new HttpSolrClient.Builder(
> solrURL).build();
> ModifiableSolrParams params = new ModifiableSolrParams();
> params.set("qt", "/dataimport");
> params.set("command", "full-import");
> params.set("query=id:{ $in: ", idlist+ " }");
> QueryResponse response = solr.query(params);
>
> The expectation here is that it should use the query parameter value given in
> the code snippet instead of the query parameter configured in the
> data-config file.
>
> Is there a way to do this? Please suggest.
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>



-- 
Sincerely yours
Mikhail Khludnev


Re: How to resolve Overlapping onDeckSearchers=2

2017-09-25 Thread Emir Arnautović
Hi Rubi,
As you probably know, in order to make changes visible, Solr has to reopen the
searcher, and opening a searcher includes warming it up. What is happening in
your case is that a new commit arrives while the previous commit has not yet
produced a new searcher. What you can do:
- commit less frequently - it is good practice to do time-based commits on the
Solr side, but if you prefer to keep commits on the client side, you might
consider using the commitWithin option instead of explicit commits (see the
sketch below). Also, since you mentioned slaves, you could increase the polling
interval on the slaves to avoid overlapping searchers.
- use a cold searcher
- decrease autowarm time by decreasing autowarm counts or reducing the number
of autowarm queries
- increase the max allowed number of on-deck searchers and ignore the warning

I would recommend addressing the commit strategy and tuning the autowarm
counts/queries (decreasing counts and keeping a couple of good autowarm queries
does not have to have a negative impact on query time).
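
On the client side, a minimal hedged SolrJ sketch of commitWithin instead of
explicit commits; the core URL and document fields are hypothetical:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycore").build();
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        // Ask Solr to make the document visible within 60 seconds rather than
        // calling commit() after every batch; Solr coalesces these requests,
        // so searchers are reopened far less often and stop overlapping.
        solr.add(doc, 60000);
        solr.close();
    }
}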

HTH,
Emir

> On 25 Sep 2017, at 07:05, Rubi Hali  wrote:
> 
> Hi
> 
> I am using Solr 6.1. I am getting a continuous warning on both the master
> and the slaves: "Performance Warning: Overlapping onDeckSearchers=2"
>
> After the same warning appears on the slaves, replication starts throwing an
> "index fetch failed" error because a new searcher cannot be opened.
>
> I went through some online blogs and posts which say not to commit too
> frequently, or to increase the number of warming searchers.
>
> But increasing the warming searchers does not seem like a good idea, as that
> will impact my query performance.
>
> Commits are made explicitly by my application. I have disabled soft commit
> and autocommit in my solrconfig.xml by setting their values to -1.
>
> Please help if you have any good suggestions to resolve this.
>
> Thanks in advance!



How to resolve Overlapping onDeckSearchers=2

2017-09-25 Thread Rubi Hali
Hi

I am using Solr 6.1. I am getting a continuous warning on both the master and
the slaves: "Performance Warning: Overlapping onDeckSearchers=2"

After the same warning appears on the slaves, replication starts throwing an
"index fetch failed" error because a new searcher cannot be opened.

I went through some online blogs and posts which say not to commit too
frequently, or to increase the number of warming searchers.

But increasing the warming searchers does not seem like a good idea, as that
will impact my query performance.

Commits are made explicitly by my application. I have disabled soft commit and
autocommit in my solrconfig.xml by setting their values to -1.

Please help if you have any good suggestions to resolve this.

Thanks in advance!


Re: Replication on startup takes a long time

2017-09-25 Thread Emir Arnautović
Hi Erick,
I don't think there are any bugs with searcher reopening - this is a
scenario with a new slave:

“But when I add a *new* slave pointing to the master…”

So it is expected to have zero results until replication finishes.

Regards,
Emir

> On 23 Sep 2017, at 19:21, Erick Erickson  wrote:
> 
> First I'd like to say that I wish more people would take the time like
> you have to fully describe the problem and your observations; it makes
> it so much nicer than having half a dozen back-and-forths! Thanks!
>
> Just so it doesn't get buried in the rest of the response (I do tend
> to go on...): I suspect you have a suggester configured. The
> index-based suggesters read through your _entire_ index, all the
> stored fields from all the documents and process them into an FST or
> "sidecar" index. See:
> https://lucidworks.com/2015/03/04/solr-suggester/. If this is true
> they might be being built on the slaves whenever a replication
> happens. Hmmm, if this is true, let us know. You can tell by removing
> the suggester from the config and timing again. It seems like in the
> master/slave config we should copy these down but don't know if it's
> been tested.
> 
> If they are being built on the slaves, you might try commenting out
> all of the buildOn bits on the slave configurations. Frankly I
> don't know if building the suggester structures on the master would
> propagate them to the slave correctly if the slave doesn't build them,
> but it would certainly be a fat clue if it changed the load time on
> the slaves and we could look some more at options.
> 
> Observation 1: Allocating 40G of memory for an index only 12G seems
> like overkill. This isn't the root of your problem, but a 12G index
> shouldn't need near 40G of JVM. In fact, due to MMapDirectory being
> used (see Uwe Schindler's blog here:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html)
> I'd guess you can get away with MUCH less memory, maybe as low as 8G
> or so. The wildcard here would be the size of your caches, especially
> your filterCache configured in solrconfig.xml. Like I mentioned, this
> isn't the root of your replication issue, just sayin'.
> 
> Observation 2: Hard commits (the <autoCommit> setting) are not a very
> expensive operation with openSearcher=false. Again this isn't the root
> of your problem but consider removing the number of docs limitation
> and just making it time-based, say every minute. Long blog on the
> topic here: 
> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/.
> You might be accumulating pretty large transaction logs (assuming you
> haven't disabled them) to no good purpose. Given your observation that
> the actual transmission of the index takes 2 minutes, this is probably
> not something to worry about much, but is worth checking.
> 
> Question 1:
> 
> Solr should be doing nothing other than opening a new searcher, which
> should be roughly the "autowarm" time on master plus (perhaps)
> suggester build. Your observation that autowarming takes quite a bit
> of time (evidenced by much shorter times when you set the counts to
> zero) is a smoking gun that you're probably doing far too much
> autowarming. HOWEVER, during this interval the replica should be
> serving queries from the old searcher so something else is going on
> here. Autowarming is actually pretty simple, perhaps this will help
> you to keep in mind while tuning:
> 
> The queryResultCache and filterCache are essentially maps where the
> key is just the text of the clause (simplifying here). So for the
> queryResultCache the key is the entire search request. For the
> filterCache, the key is just the "fq" clause. autowarm count in each
> just means the number of keys that are replayed when a new searcher is
> opened. I usually start with a pretty small number, on the order of
> 10-20. The purpose of them is just to keep from experiencing a delay
> when the first few searches are performed after a searcher is opened.
> 
> My bet: you won't notice a measurable difference when dropping the
> atuowarm counts drastically in terms of query response, but you will
> save the startup time. I also suspect you can reduce the size of the
> caches drastically, but don't know what you have them set to, it's a
> guess.
> 
> As to what's happening such that you serve queries with zero counts,
> my best guess at this point is that you are rebuilding
> autosuggesters. We shouldn't be serving queries from the new
> searcher during this interval, if confirmed we need to raise a JIRA.
> 
> Question 2: see above, autosuggester?
> 
> Question 3a: documents should become searchable on the slave when 1>
> all the segments are copied, 2> autowarm is completed. As above, the
> fact that you get 0-hit responses isn't what _should_ be happening.
> 
> Autocommit settings are pretty irrelevant on the slave.
> 
> Question 3b: soft commit on the master shouldn't affect the 

RE: Solr SQL: standalone mode

2017-09-25 Thread Pavel Micka
Glad to hear that. Btw: where is the limitation (that it's not possible to run
the SQL in standalone)? Is it in the distribution algorithm itself, or is Solr
just missing ZooKeeper storage? I am asking because if it's the second case, we
can just install a single-node ZK + a single Solr and have a "non-distributed
cloud" :-)

Thanks,
Pavel

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Monday, September 25, 2017 3:04 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr SQL: standalone mode

At Alfresco we are working on a version of Solr's SQL that works in non-Solr 
Cloud mode. The plan is to contribute this back to 7x branch.
There will also be improvements to the SQL coverage committed back from 
Alfresco.

Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, Sep 24, 2017 at 6:04 PM, Pavel Micka 
wrote:

> Hi,
>
>
> I read in the documentation that executing Solr SQL is possible only
> in SolrCloud mode. The thing is that we unfortunately have some
> installations which simply can't have multiple nodes (too small
> instances). Is it somehow possible to work around this restriction, or
> is there at least any plan to lift it?
>
>
> Thanks,
>
>
> Pavel
>


Re: AEM SOLR integration

2017-09-25 Thread Tommaso Teofili
Integration can be done in AEM at different layers; however, my suggestion
would be to enable it at the repository (Oak) level [1] so that the usual AEM
search also takes ACLs into account.

[1] : http://jackrabbit.apache.org/oak/docs/query/solr.html

Il giorno ven 22 set 2017 alle ore 18:47 Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> ha scritto:

> Gunalan,
>
> I think this depends on your system environment.   It is a general
> "service discovery" issue.   On-premise, my organization uses f5 BigIP as a
> load balancer, and so we merely have f5 LTM direct traffic from one name to
> any of a number of Solr instances.   If they are all SolrCloud, it mostly
> just works.
>
> In AWS Cloud, the same thing could work with an Elastic Load Balancer
> (ELB) or Application Load Balancer (ALB), which is more flexible.
>
> AEM Solr Search appears to be for embedding search results into AEM,
> rather than to index AEM content in a structured manner.   These are two
> different but related features.   Which are you looking to do?
>
> Hope this helps,
>
> -Dan
>
> -Original Message-
> From: Gunalan V [mailto:visagan2...@gmail.com]
> Sent: Friday, September 22, 2017 7:32 AM
> To: solr-user@lucene.apache.org
> Subject: Re: AEM SOLR integration
>
> Thank You!
>
> I was looking for some suggestions in building the SOLR infrastructure.
>
> Like, what should each AEM instance point to? Perhaps one AEM instance to
> one SOLR cloud (with internal ZooKeeper) in all environments, or is there
> any specific architecture we need to follow when going with AEM?
>
>
>
> Thanks,
> GVK
>
> On Fri, Sep 22, 2017 at 02:58 Atita Arora  wrote:
>
> >
> > https://www.slideshare.net/DEEPAKKHETAWAT/basics-of-solr-and-solr-inte
> > gration-with-aem6-61150010
> >
> > This could probably help too along with the link Nicole shared.
> >
> > On Fri, Sep 22, 2017 at 12:28 PM, Nicole Bilić
> > 
> > wrote:
> >
> > > Hi,
> > >
> > > Maybe this could help you out http://www.aemsolrsearch.com/
> > >
> > > Regards,
> > > Nicole
> > >
> > > On Sep 22, 2017 05:41, "Gunalan V"  wrote:
> > >
> > > > Hello,
> > > >
> > > > I'm looking for suggestions on building the SOLR infrastructure, so
> > > > kindly let me know if anyone has integrated AEM (Adobe Experience
> > > > Manager) with SOLR?
> > > >
> > > >
> > > >
> > > > Thanks,
> > > > GVK
> > > >
> > >
> >
>