Re: [Possible Bug] 5.5.0 Startup script ignoring host parameter?

2016-03-31 Thread Bram Van Dam
On 30/03/16 16:45, Shawn Heisey wrote:
> The host parameter does not control binding to network interfaces.  It
> controls what hostname is published to zookeeper when running in cloud mode.

Oh I see. That wasn't clear from the documentation. Might be worth
adding such a parameter to the startup script, in that case.

But for now I'll just edit the config file, thanks for the tip!

 - Bram
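
For reference, a sketch of the config-file route: with the stock bin/solr
startup script, the published hostname can also be set in solr.in.sh (the
value here is illustrative):

    SOLR_HOST="solr1.example.com"

As noted above, this controls only the hostname published to ZooKeeper in
cloud mode, not which network interface Solr binds to.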



Complex Sort

2016-03-31 Thread ~$alpha`
I have a column in MySQL and I need to do a complex sort in Solr.

Consider the column below:

|175#40|173#17|174#13|134#11|17#8|95#4|64#3|116#3|343#0|

where 175 is the value to be matched and 40 is the score.

So if the logged-in user's value is 175, the document should get a score
of 40; if the value is 173, it should get a score of 17; and so on...

I have multiple such columns, and their sum computed in this manner is my
sorting expression.

How can I do this in Solr?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Complex-Sort-tp4267155.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Complex Sort

2016-03-31 Thread Emir Arnautovic

Hi,
Not sure if I fully understood your case, but here are some ideas:
- if you have a small number of ids you can have a score_%id% field that
can be used for sorting
- if the number of ids is large you can use sort-by-function to parse the
score data and find the right score
- if the number of results is small, calculate the score after returning
the results

Not sure why you have multiple score fields? Can you sum them in advance
and have one score per id? Or do you have multiple ids per request?
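
For the first idea, a minimal sketch (assuming a dynamic int field such as
score_*_i is populated at index time from the packed column):

    doc:   { "id":"123", "score_175_i":40, "score_173_i":17, ... }
    query: q=*:*&sort=score_175_i desc

where the application substitutes the logged-in user's value into the
field name.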


Regards,
Emir

On 31.03.2016 10:07, ~$alpha` wrote:

I have a column in MySQL and I need to do a complex sort in Solr.

Consider the column below:

|175#40|173#17|174#13|134#11|17#8|95#4|64#3|116#3|343#0|

where 175 is the value to be matched and 40 is the score.

So if the logged-in user's value is 175, the document should get a score
of 40; if the value is 173, it should get a score of 17; and so on...

I have multiple such columns, and their sum computed in this manner is my
sorting expression.

How can I do this in Solr?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Complex-Sort-tp4267155.html
Sent from the Solr - User mailing list archive at Nabble.com.


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: issue with 5.3.1 and index version

2016-03-31 Thread Shalin Shekhar Mangar
Lucene54 is the codec name. The luceneMatchVersion in solrconfig.xml is the
compatibility version which is used by some analyzers/tokenizers/token
filters to provide defaults or behaviour compatible with older versions. It
has no relation to the indexing codec being used and once an index has been
written to by a newer version of Lucene, going back to an old version is
not possible in most cases.
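
For illustration, the setting in question is the luceneMatchVersion element
in solrconfig.xml (value as described by the original poster):

    <luceneMatchVersion>5.3.1</luceneMatchVersion>

It tunes analyzer behaviour only; segments written by a 5.4.1 install still
use the current codec, hence the Lucene54 marker in segments_9.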

On Thu, Mar 31, 2016 at 4:00 AM, William Bell  wrote:

> When I index with 5.4.1 using a luceneMatchVersion of 5.3.1 in
> solrconfig.xml, the segments_9 file contains "Lucene54". Why? Is this a
> known bug?
>
> #strings segments_9
>
> segments
>
> Lucene54
>
> commitTimeMSec
>
> 1459374733276
>
>
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>



-- 
Regards,
Shalin Shekhar Mangar.


Facet by truncated date

2016-03-31 Thread Robert Brown

Hi,

Is it possible to facet by a date (solr.TrieDateField) but truncated to 
the day, or even the hour?


If not, are there any other options apart from storing that truncated 
data in another (string?) field?


Thanks,
Rob



Re: Facet by truncated date

2016-03-31 Thread Emir Arnautovic

Hi Robert,
You can use range faceting and set facet.range.gap to control how dates
are "truncated".


Regards,
Emir

On 31.03.2016 10:52, Robert Brown wrote:

Hi,

Is it possible to facet by a date (solr.TrieDateField) but truncated 
to the day, or even the hour?


If not, are there any other options apart from storing that truncated 
data in another (string?) field?


Thanks,
Rob




--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Facet by truncated date

2016-03-31 Thread Yago Riveiro
If you want to aggregate the data by the truncated date, I think the only
way to do it is to use another field with the truncated date.

You can use an update request processor to compute the truncated date
(https://wiki.apache.org/solr/UpdateRequestProcessor) or add the field at
indexing time.

date:"2016-03-31T12:00:00Z"

truncated_date_s:'2016-03-31' or truncated_date_i:20160331 (this should be
more memory efficient)

--

/Yago Riveiro

  


On Mar 31 2016, at 10:08 am, Emir Arnautovic
<emir.arnauto...@sematext.com> wrote:

> Hi Robert,
> You can use range faceting and set facet.range.gap to control how dates
> are "truncated".
>
> Regards,
> Emir
>
> On 31.03.2016 10:52, Robert Brown wrote:
>> Hi,
>>
>> Is it possible to facet by a date (solr.TrieDateField) but truncated
>> to the day, or even the hour?
>>
>> If not, are there any other options apart from storing that truncated
>> data in another (string?) field?
>>
>> Thanks,
>> Rob
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/



Re: Complex Sort

2016-03-31 Thread ~$alpha`
I am not sure how to use "Sort By Function" for this case.

|10#40|14#19|33#17|27#6|15#6|19#5|7#2|6#1|29#1|5#1|30#1|28#1|12#0|20#0|

Can you tell me how to fetch 40 when the input is 10?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Complex-Sort-tp4267155p4267165.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet by truncated date

2016-03-31 Thread Emir Arnautovic

Hi Yago,
Not sure if I misunderstood the case, but assuming you have a date field
called my_date you can facet the last 10 days by day using range faceting:

?facet.range=my_date&facet.range.start=NOW/DAY-10DAYS&facet.range.end=NOW/DAY+1DAY&facet.range.gap=+1DAY

Regards,
Emir

On 31.03.2016 11:14, Yago Riveiro wrote:

> If you want to aggregate the data by the truncated date, I think the only
> way to do it is to use another field with the truncated date.
>
> You can use an update request processor to compute the truncated date
> (https://wiki.apache.org/solr/UpdateRequestProcessor) or add the field at
> indexing time.
>
> date:"2016-03-31T12:00:00Z"
>
> truncated_date_s:'2016-03-31' or truncated_date_i:20160331 (this should be
> more memory efficient)
>
> --
> /Yago Riveiro
>
> On Mar 31 2016, at 10:08 am, Emir Arnautovic
> <emir.arnauto...@sematext.com> wrote:
>
>> Hi Robert,
>> You can use range faceting and set facet.range.gap to control how dates
>> are "truncated".
>>
>> Regards,
>> Emir
>>
>> On 31.03.2016 10:52, Robert Brown wrote:
>>> Hi,
>>>
>>> Is it possible to facet by a date (solr.TrieDateField) but truncated
>>> to the day, or even the hour?
>>>
>>> If not, are there any other options apart from storing that truncated
>>> data in another (string?) field?
>>>
>>> Thanks,
>>> Rob
>>
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/




--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Facet by truncated date

2016-03-31 Thread Robert Brown

Hi Emir,

What if I don't want to specify a range?  Or would I have to do year 0 
to NOW?


Thanks,
Rob


On 03/31/2016 10:26 AM, Emir Arnautovic wrote:

> Hi Yago,
> Not sure if I misunderstood the case, but assuming you have a date field
> called my_date you can facet the last 10 days by day using range faceting:
>
> ?facet.range=my_date&facet.range.start=NOW/DAY-10DAYS&facet.range.end=NOW/DAY+1DAY&facet.range.gap=+1DAY
>
> Regards,
> Emir
>
> On 31.03.2016 11:14, Yago Riveiro wrote:
>> If you want to aggregate the data by the truncated date, I think the only
>> way to do it is to use another field with the truncated date.
>>
>> You can use an update request processor to compute the truncated date
>> (https://wiki.apache.org/solr/UpdateRequestProcessor) or add the field at
>> indexing time.
>>
>> date:"2016-03-31T12:00:00Z"
>>
>> truncated_date_s:'2016-03-31' or truncated_date_i:20160331 (this should be
>> more memory efficient)
>>
>> --
>> /Yago Riveiro
>>
>> On Mar 31 2016, at 10:08 am, Emir Arnautovic
>> <emir.arnauto...@sematext.com> wrote:
>>
>>> Hi Robert,
>>> You can use range faceting and set facet.range.gap to control how dates
>>> are "truncated".
>>>
>>> Regards,
>>> Emir
>>>
>>> On 31.03.2016 10:52, Robert Brown wrote:
>>>> Hi,
>>>>
>>>> Is it possible to facet by a date (solr.TrieDateField) but truncated
>>>> to the day, or even the hour?
>>>>
>>>> If not, are there any other options apart from storing that truncated
>>>> data in another (string?) field?
>>>>
>>>> Thanks,
>>>> Rob
>>>
>>> --
>>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>>> Solr & Elasticsearch Support * http://sematext.com/








Re: Facet by truncated date

2016-03-31 Thread Yago Riveiro
Emir,

  

I assume that this query will create N ranges (one for each day) and give you
the counts; in that case it works indeed. I confess I have never used facet
ranges before.

What output will the range query give? The results per range, or the
truncated dates with the counts?
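
For what it's worth, a sketch of the shape range faceting returns in a JSON
response (field and values invented for illustration):

    "facet_counts": {
      "facet_ranges": {
        "my_date": {
          "counts": [
            "2016-03-21T00:00:00Z", 12,
            "2016-03-22T00:00:00Z", 7,
            "2016-03-23T00:00:00Z", 0 ],
          "gap": "+1DAY",
          "start": "2016-03-21T00:00:00Z",
          "end": "2016-04-01T00:00:00Z" } } }

so you get one count per gap-sized bucket, keyed by the start of each range,
i.e. the truncated dates with the counts.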

  
--

  

/Yago Riveiro

  


On Mar 31 2016, at 10:26 am, Emir Arnautovic
<emir.arnauto...@sematext.com> wrote:

> Hi Yago,
> Not sure if I misunderstood the case, but assuming you have a date field
> called my_date you can facet the last 10 days by day using range faceting:
>
> ?facet.range=my_date&facet.range.start=NOW/DAY-10DAYS&facet.range.end=NOW/DAY+1DAY&facet.range.gap=+1DAY
>
> Regards,
> Emir
>
> On 31.03.2016 11:14, Yago Riveiro wrote:
>> If you want to aggregate the data by the truncated date, I think the only
>> way to do it is to use another field with the truncated date.
>>
>> You can use an update request processor to compute the truncated date
>> (https://wiki.apache.org/solr/UpdateRequestProcessor) or add the field at
>> indexing time.
>>
>> date:"2016-03-31T12:00:00Z"
>>
>> truncated_date_s:'2016-03-31' or truncated_date_i:20160331 (this should be
>> more memory efficient)
>>
>> --
>> /Yago Riveiro
>>
>> On Mar 31 2016, at 10:08 am, Emir Arnautovic
>> <emir.arnauto...@sematext.com> wrote:
>>
>>> Hi Robert,
>>> You can use range faceting and set facet.range.gap to control how dates
>>> are "truncated".
>>>
>>> Regards,
>>> Emir
>>>
>>> On 31.03.2016 10:52, Robert Brown wrote:
>>>> Hi,
>>>>
>>>> Is it possible to facet by a date (solr.TrieDateField) but truncated
>>>> to the day, or even the hour?
>>>>
>>>> If not, are there any other options apart from storing that truncated
>>>> data in another (string?) field?
>>>>
>>>> Thanks,
>>>> Rob
>>>
>>> --
>>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>>> Solr & Elasticsearch Support * http://sematext.com/
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/



Re: Complex Sort

2016-03-31 Thread Emir Arnautovic

You would have to write a custom function for that.
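
For illustration, a rough sketch of such a function (untested; class,
function, and field names are assumptions, not an existing plugin). It
would be registered in solrconfig.xml as

    <valueSourceParser name="packedscore" class="com.example.PackedScoreParser"/>

and used as sort=packedscore(scores_s,'175') desc, assuming the packed
"|id#score|..." string is in a single-valued indexed string field:

    import java.io.IOException;
    import java.util.Map;
    import org.apache.lucene.index.LeafReaderContext;
    import org.apache.lucene.queries.function.FunctionValues;
    import org.apache.lucene.queries.function.ValueSource;
    import org.apache.lucene.queries.function.docvalues.FloatDocValues;
    import org.apache.solr.search.FunctionQParser;
    import org.apache.solr.search.SyntaxError;
    import org.apache.solr.search.ValueSourceParser;

    public class PackedScoreParser extends ValueSourceParser {
      @Override
      public ValueSource parse(FunctionQParser fp) throws SyntaxError {
        final ValueSource packed = fp.parseValueSource(); // the packed string field
        final String id = fp.parseArg();                  // e.g. "175"
        return new ValueSource() {
          @Override
          public FunctionValues getValues(Map context, LeafReaderContext ctx)
              throws IOException {
            final FunctionValues vals = packed.getValues(context, ctx);
            return new FloatDocValues(this) {
              @Override
              public float floatVal(int doc) {
                // "|175#40|173#17|..." -> 40.0f when id is "175"
                String s = vals.strVal(doc);
                if (s == null) return 0f;
                for (String pair : s.split("\\|")) {
                  int h = pair.indexOf('#');
                  if (h > 0 && pair.substring(0, h).equals(id)) {
                    return Float.parseFloat(pair.substring(h + 1));
                  }
                }
                return 0f;
              }
            };
          }
          // identity equals/hashCode: fine for a sketch, not for caching
          @Override
          public boolean equals(Object o) { return o == this; }
          @Override
          public int hashCode() { return id.hashCode(); }
          @Override
          public String description() { return "packedscore(" + id + ")"; }
        };
      }
    }

Summing several such columns could then be expressed as
sort=sum(packedscore(col1_s,'175'),packedscore(col2_s,'175')) desc.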

On 31.03.2016 11:24, ~$alpha` wrote:

I am not sure how to use "Sort By Function" for this case.

|10#40|14#19|33#17|27#6|15#6|19#5|7#2|6#1|29#1|5#1|30#1|28#1|12#0|20#0|

Can you tell me how to fetch 40 when the input is 10?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Complex-Sort-tp4267155p4267165.html
Sent from the Solr - User mailing list archive at Nabble.com.


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Facet by truncated date

2016-03-31 Thread Emir Arnautovic

Hi Rob,
Range is mandatory, and you should limit it since a huge range would create
too many buckets. I agree it would be great if it could use min/max values
from the query as start/end, but that is not how it works at the moment.


Regards,
Emir

On 31.03.2016 11:32, Robert Brown wrote:

> Hi Emir,
>
> What if I don't want to specify a range? Or would I have to do year 0
> to NOW?
>
> Thanks,
> Rob
>
> On 03/31/2016 10:26 AM, Emir Arnautovic wrote:
>> Hi Yago,
>> Not sure if I misunderstood the case, but assuming you have a date field
>> called my_date you can facet the last 10 days by day using range faceting:
>>
>> ?facet.range=my_date&facet.range.start=NOW/DAY-10DAYS&facet.range.end=NOW/DAY+1DAY&facet.range.gap=+1DAY
>>
>> Regards,
>> Emir
>>
>> On 31.03.2016 11:14, Yago Riveiro wrote:
>>> If you want to aggregate the data by the truncated date, I think the only
>>> way to do it is to use another field with the truncated date.
>>>
>>> You can use an update request processor to compute the truncated date
>>> (https://wiki.apache.org/solr/UpdateRequestProcessor) or add the field at
>>> indexing time.
>>>
>>> date:"2016-03-31T12:00:00Z"
>>>
>>> truncated_date_s:'2016-03-31' or truncated_date_i:20160331 (this should be
>>> more memory efficient)
>>>
>>> --
>>> /Yago Riveiro
>>>
>>> On Mar 31 2016, at 10:08 am, Emir Arnautovic
>>> <emir.arnauto...@sematext.com> wrote:
>>>
>>>> Hi Robert,
>>>> You can use range faceting and set facet.range.gap to control how dates
>>>> are "truncated".
>>>>
>>>> Regards,
>>>> Emir
>>>>
>>>> On 31.03.2016 10:52, Robert Brown wrote:
>>>>> Hi,
>>>>>
>>>>> Is it possible to facet by a date (solr.TrieDateField) but truncated
>>>>> to the day, or even the hour?
>>>>>
>>>>> If not, are there any other options apart from storing that truncated
>>>>> data in another (string?) field?
>>>>>
>>>>> Thanks,
>>>>> Rob
>>>>
>>>> --
>>>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>>>> Solr & Elasticsearch Support * http://sematext.com/








--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re[2]: [possible bug]: [child] - ChildDocTransformerFactory returns top level documents nested under middle level documents when queried for the middle level ones

2016-03-31 Thread Alisa Z .
 Thanks, Anshum! 

This definitely brings the result I wanted. 

It is just that the description from the ChildDocTransformerFactory docs ("This
transformer returns all descendants of each parent document in a flat list
nested inside the parent document.") is a bit misleading...

One should never stop experimenting :) 


>Wednesday, March 30, 2016, 15:19 -04:00 from Anshum Gupta :
>
>I'm not the best person to comment on this so perhaps someone could chime
>in as well, but can you try using a wildcard for your childFilter?
>Something like: childFilter=type_s:doc.enriched.text.*
>
>You could also possibly enrich the document with depth information and use
>that for filtering out.
>
>On Wed, Mar 30, 2016 at 11:34 AM, Alisa Z. < prol...@mail.ru > wrote:
>
>>  I think I am observing an unexpected behavior of
>> ChildDocTransformerFactory.
>>
>> The query is like this:
>>
>> /select?q={!parent which="type_s:doc.enriched.text"}type_s:doc.enriched.text.entities
>> +text_t:pjm +type_t:Company +relevance_tf:[0.7%20TO%20*]
>> &fl=*,[child parentFilter=type_s:doc.enriched.text limit=1000]
>>
>> The levels of hierarchy are shown in the  type_s field.  So I am querying
>> on some descendants and returning some ancestors that are somewhere in the
>> middle of the hierarchy. I also want to get all the nested documents
>> below  that middle level.
>>
>> Here is the result:
>>
>> [response excerpt -- XML tags stripped by the mail archive; type_s values
>> and comments preserved]
>>
>> type_s: doc.enriched.text    // this is the level I wanted to get to
>>                              // and then go down from it
>> ...
>> 13565
>>
>> type_s: doc.enriched         // This is a document from 1 level up, the
>>                              // parent of the current
>>                              // type_s:doc.enriched.text document --
>>                              // why is it here?
>> 22024
>>
>> type_s: doc.original         // This is an "uncle"
>> 26698
>>
>> type_s: doc                  // and this is a grandparent!!!
>>
>>
>> And so on, bringing the whole tree up and down all under my middle-level
>> document.
>> I really hope this is not the expected behavior.
>>
>> I appreciate your help in advance.
>>
>> --
>> Alisa Zhila
>
>
>
>
>-- 
>Anshum Gupta



most popular collate spellcheck

2016-03-31 Thread michael solomon
Hi,
Is it possible to return the most popular collation?
i.e.:
spellcheck.q=prditive analytiycs
spellcheck.maxCollations=5
spellcheck.count=0

response [XML tags stripped by the archive]:

  correctlySpelled: false
  collations:
    positive analytic
    positive analytics
    predictive analytics
    primitive analytics
    punitive analytic

I want the collations to be ordered by numFound, and obviously
"predictive analytics" has more results than "positive analytic".
Thanks,
Michael


Re: Load Resource from within Solr Plugin

2016-03-31 Thread Max Bridgewater
Hi Folks,

Thanks for all the great suggestions. I will try them and see which one works
best.
@Hoss: The WEB-INF folder is just in my dev environment. I have a local
Solr instance and I point it to target/WEB-INF. A simple, convenient
setup for development purposes.

Much appreciated.

Max.

On Wed, Mar 30, 2016 at 4:24 PM, Rajesh Hazari 
wrote:

> Max,
> Have you looked in External file field which is reload on every hard
> commit,
> only disadvantage of this is the file (personal-words.txt) has to be placed
> in all data folders in each solr core,
> for which we have a bash script to do this job.
>
>
> https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes
>
> Ignore this if this does not meets your requirement.
>
> *Rajesh**.*
>
> On Wed, Mar 30, 2016 at 1:21 PM, Chris Hostetter  >
> wrote:
>
> > : <lib dir=".../search-webapp/target/WEB-INF/lib"
> > :  regex=".*\.jar" />
> >
> > 1) as a general rule, if you have a <lib/> declaration which includes
> > "WEB-INF" you are probably doing something wrong.
> >
> > Maybe not in this case -- maybe "search-webapp/target" is a completely
> > distinct java application and you are just re-using its jars.  But 9
> > times out of 10, when people have a WEB-INF path they are trying to load
> > jars from, it's because they *first* added their jars to Solr's WEB-INF
> > directory, and then when that didn't work they added the path to the
> > WEB-INF dir as a <lib/> ... but now you've got those classes being loaded
> > twice, and you've multiplied all of your problems.
> >
> > 2) let's ignore the fact that your path has WEB-INF in it, and just
> > assume it's some path somewhere on disk that has nothing to
> > do with solr, and you want to load those jars.
> >
> > great -- solr will do that for you, and all of those classes will be
> > available to plugins.
> >
> > Now if you want to explicitly do something classloader related, you do
> > *not* want to be using Thread.currentThread().getContextClassLoader() ...
> > because the threads that execute everything in Solr are a pool of worker
> > threads that is created before solr ever has a chance to parse your
> > <lib/> directive.
> >
> > You want to ensure anything you do related to a Classloader uses the
> > ClassLoader Solr sets up for plugins -- that's available from the
> > SolrResourceLoader.
> >
> > You can always get the SolrResourceLoader via
> > SolrCore.getSolrResourceLoader().  from there you can getClassLoader() if
> > you really need some hairy custom stuff -- or if you are just trying to
> > load a simple resource file as an InputStream, use openResource(String
> > name) ... that will start by checking for it in the conf dir, and will
> > fallback to your jar -- so you can have a default resource file shipped
> > with your plugin, but allow users to override it in their collection
> > configs.
> >
> >
> > -Hoss
> > http://www.lucidworks.com/
> >
>
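
A minimal sketch of the openResource() approach Hoss describes (plugin
context is illustrative; "personal-words.txt" borrowed from the earlier
suggestion):

    import java.io.IOException;
    import java.io.InputStream;
    import org.apache.solr.core.SolrCore;

    // e.g. inside SolrCoreAware.inform(SolrCore core)
    try (InputStream in =
        core.getResourceLoader().openResource("personal-words.txt")) {
      // the conf dir is checked first, then the plugin jar, so users can
      // override the packaged default in their collection config
    } catch (IOException e) {
      // handle a missing or unreadable resource
    }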


index time boost nested documents JSON format

2016-03-31 Thread michael solomon
Hi,
how can I apply index-time boosts to nested documents in JSON format?
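
For reference, a sketch of the document shape in question (whether boosts
apply to child documents is exactly what's being asked; the per-field boost
syntax shown is the one documented for flat JSON updates, and
_childDocuments_ is the nested-document key):

    [ { "id": "1",
        "title": { "value": "parent title", "boost": 2.0 },
        "_childDocuments_": [
          { "id": "1-1", "title": "child title" }
        ] } ]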


How to reference search term(s) in searchHandler

2016-03-31 Thread John Bickerstaff
I believe I want to set up a search handler with a function query to avoid
needing to code it.

The function query does some weighting by checking the "title" field for
whatever the user entered as their search term (named myCurrentSearchTerm
below)

To test this out in the Admin UI, I have the following under the edismax
"pop-out"

bf: product(query($titleQuery),1)

the "titleQuery" was set up in the Raw Query Parameters field thus:

titleQuery=title:(myCurrentSearchTerm)

I imagine there must be a way to duplicate this in a searchHandler inside
the solrconfig.xml file, but I'm not sure how the syntax would work.  In
particular, what is the magic incantation to reference what is coming in as
the user's search term?

If, for example, my URL contained [q=gastrointestinal] how can I reference
that inside the searchHandler XML?

I have something like this in mind - can anyone on the list tell me if this
is wrong and if so, what would be right?

I'm only guessing that *$q* might represent the incoming search terms...  I
can't find any examples online to guide me.



 
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">20</int>
    <str name="defType">edismax</str>
    <str name="df">text</str>
    <str name="bf">product(query(title:($q)),1)</str>
  </lst>
</requestHandler>
 


Is it even possible to directly reference the search terms inside a
searchHandler like this?

Do I have the wrong idea here and should be trying a different approach?


Thanks...


issues using BlendedInfixLookupFactory in solr5.5

2016-03-31 Thread xavi jmlucjav
Hi,

I have been working with
AnalyzingInfixLookupFactory/BlendedInfixLookupFactory in 5.5.0, and I have
a number of questions/comments, hopefully I get some insight into this:

- Docs not complete/up-to-date:
- the blenderType param does not accept the 'linear' value; it did in 5.3. I
commented it out as it's the default.
- it should be mentioned that contextField must be a stored field (see the
config sketch after this list)
- if the field used is whitespace-tokenized, and you search for 'one t',
the suggestions are sorted by weight, not score. So if you give a constant
weight to all docs, you might get this:
1. one four two
2. one two four
  Would taking the score into account (something not done yet but which could
be done according to something I saw in code/jira) return 2,1 instead of 1,2?
My guess is it would, correct?
- what would we need to return the score too? Could it be done easily?
along with the payload or something.
- would it be possible to make BlendedInfixLookupFactory allow for some
fuzziness a la FuzzyLookupFactory?
- when building a big suggester, it can take a long time; you just send a
request with suggest.build=true and wait. Is there any way to monitor the
progress of this? I did not find one.
- for weightExpression, one typical use case would be to provide the user's
lat/lon to weight the suggestions by proximity; is this somehow feasible?
What would be needed?
- does SolrCloud fully support suggesters? If so, does each shard build
its own suggester, and does it work just like a normal distributed search?
- I filed SOLR-8928 (suggest.cfq does not work with
DocumentExpressionDictionaryFactory/weightExpression) as I found that combo
not working.
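
For reference, a minimal solrconfig.xml sketch of the kind of suggester
being discussed (suggester and field names are illustrative, not from the
original setup):

    <searchComponent name="suggest" class="solr.SuggestComponent">
      <lst name="suggester">
        <str name="name">mySuggester</str>
        <str name="lookupImpl">BlendedInfixLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">title</str>
        <str name="weightField">weight</str>
        <str name="contextField">cat</str>  <!-- must be stored -->
        <str name="suggestAnalyzerFieldType">text_general</str>
      </lst>
    </searchComponent>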

regards
xavi


Re: Question about caching

2016-03-31 Thread Erick Erickson
A couple of quick things to add:

The queryResultCache was historically built for paging, it's quite common
for its utilization to be low, but... it's not a very expensive cache.
Basically, it's a map where each entry (up to "size") is a key that is the
query and value that is queryResultWindowSize (from solrconfig.xml)
internal lucene doc IDs (ints). The idea is if the user comes back asking
for the next page, and assuming there've been no commits, then the search
does not have to be re-executed. Bottom line is it's rare that tweaking
this makes much difference to the user, and it's pretty small anyway.
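
For concreteness, the two knobs mentioned live in solrconfig.xml and look
roughly like this (sizes are the stock example defaults, not
recommendations):

    <queryResultWindowSize>20</queryResultWindowSize>
    <queryResultCache class="solr.LRUCache"
                      size="512" initialSize="512"
                      autowarmCount="0"/>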

Your filterCache is where "fq" clauses (and sometimes faceting) are kept and
is quite small, although it's still getting a fair hit rate. Each entry is
again a map where the key is the fq clause and the value is a bitset
maxDocs long (i.e. maxDocs/8 bytes). And there can be up to "size" of them.

The screenshot where you see Solr using up all the available physical
memory is misleading, Lucene uses MMapDirectory under the covers and maps
all of the index bits it needs into virtual memory, see Uwe's excellent
blog:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html.

In general you want to allocate as little memory to the JVM as possible.
Unfortunately it's very hard to estimate that in the abstract, see:
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
so you have to experiment.

Best,
Erick

On Thu, Mar 31, 2016 at 6:53 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Bastien,
> There are several things I noticed on the first look:
> * it is recommended to run Solr standalone with its embedded Jetty
> * you have quite a lot of deleted documents (deletes or document updates)
> and purging them can reduce index size, resulting in more of the index
> fitting in memory
> * your heap is way too big, and it is recommended to use the same value for
> min and max size. You should set the heap under 32GB in order to be able to
> use compressed oops, but also to leave enough RAM for file caches. A larger
> heap also suffers from longer pauses during major GC, so it should not be
> much larger than observed under worst-case load.
> * It is expected for the heap to grow if there is spare space. What matters
> more is how the system behaves when the heap is close to full. Do you run
> just Solr on this Tomcat or is some client app also running on it?
> * the cache numbers are for a system that is not used much, so there can be
> different patterns:
> ** the query cache indicates it is not likely that the same query will be
> executed twice, and maybe there is no point in having a query cache -
> really low hit ratio
> ** the filter cache indicates filters are not used much - maybe there is
> nothing you can do about it, but you should check if there are query parts
> that can be moved to filters.
> ** there are no warmup queries - you should warm up your caches
> ** there are no warmup queries - you should warmup your caches
>
> What you need to do is run Solr on Jetty, set up some monitoring tool, run
> tests and tune Solr heap and caches. One such tool for monitoring is our
> SPM (http://sematext.com/spm).
>
> HTH,
> Emir
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On 31.03.2016 15:25, Bastien Latard - MDPI AG wrote:
>
> Dear Solr experts :),
>
> I read this very interesting post 'Understanding and tuning your Solr
> caches'!
> This is the only good document that I was able to find after searching for
> 1 day!
>
> I was using Solr for 2 years without knowing in detail what it was
> caching... (because I did not need to understand it before).
> I had to take a look since I needed to restart (regularly) my tomcat in
> order to improve performances...
>
> But I now have 2 questions:
> 1) How can I know how much RAM my Solr is using in reality (especially
> for caching)?
> 2) Could you have a quick look into the following images and tell me if
> I'm doing something wrong?
>
> Note: my index contains 66 million articles with several text fields
> stored.
>
> My solr contains several cores (all together are ~80GB big), but almost
> only the one below is used.
>
> I have the feeling that a lot of data is always stored in RAM... and
> getting bigger and bigger all the time...
>
> [screenshots not included in the archive: htop output; GC lines from
> "$ sudo tail -f /var/log/tomcat7/catalina.out | grep GC" taken right after
> a restart and again a few minutes later; and Solr admin performance stats]
>
> Kind regards,
> Bastien Latard
> Web engineer
> --
> MDPI AG
> Postfach, CH-4005 Basel, Switzerland
> Office: Klybeckstrasse 64, CH-4057
> Tel. +41 61 683 77 35
> Fax: +41 61 302 89 18
> E-mail: latard@mdpi.com
> http://www.mdpi.com/
>
>
>
>
>
>
>


Re: Question about caching

2016-03-31 Thread Shawn Heisey
On 3/31/2016 7:25 AM, Bastien Latard - MDPI AG wrote:
> I read this very interesting post 'Understanding and tuning your Solr
> caches
> ' !
> This is the only good document that I was able to find after searching
> for 1 day!
>
> /I was using Solr for 2 years without knowing in details what it was
> caching...(because I did not need to understand it before).//
> //I had to take a look since I needed to restart (regularly) my tomcat
> in order to improve performances.../
>
> But I now have 2 questions:
> 1) *How can I know how much RAM is my solr using**in real*(especially
> for caching)?
> 2) Could you have a quick look into the following images and tell me
> if I'm doing something wrong?

Getting a comprehensive breakdown of how Solr is using memory is NOT
straightforward, requiring special tools (some of which are included
with Java) and an understanding of how to use them.  It *IS* fairly easy
to find out how much TOTAL memory is in use.

Some low-level groundwork is being done that will hopefully make it easy
to include a memory usage breakdown in a future version of the admin
UI.  Don't quote me on this -- I'm guessing.  I would love to see that
functionality.

FYI -- your Solr caches aren't big enough to be huge memory consumers. 
Probably only a *maximum* of a couple hundred MB for your largest core,
but very likely less.

Your "htop" screenshot shows that Solr/Tomcat was consuming
approximately 5GB of real heap memory at the  moment you took the
screenshot -- the RES size minus the SHR size.  Looking at the VIRT
size, I can see that your total index size (all cores) is in the
neighborhood of 130GB.

In your third screenshot, I can see that your max heap is 40GB.  It is
highly unlikely that you will need a heap this large.  I'm betting that
the reason your performance gets better after a restart is that you're
freeing up a large amount of heap memory.  Over time, Solr/Tomcat will
use the entire 40GB that you've given it as the max heap, leaving you
only about 24GB of RAM to cache about 130GB of index data, and also most
likely causing some extremely long GC pauses.

I think that if you lower the max heap, you will no longer need to
restart.  Drop the max heap to 8GB and see if Solr will still work
properly.  If it doesn't, try 12GB, and then 16GB.  This wiki article
contains a small amount of info about choosing the proper heap size:

https://wiki.apache.org/solr/SolrPerformanceProblems#How_much_heap_space_do_I_need.3F

You should also look into GC tuning.  Upgrading to Solr 5.x would give
you GC tuning out of the box.  If you want to keep running your custom
Solr 4.x version in Tomcat, my personal wiki page has tuning info:

http://wiki.apache.org/solr/ShawnHeisey

Thanks,
Shawn



Solr 5.5.0 : SolrException: fieldType 'booleans' not found in the schema

2016-03-31 Thread Girish Tavag
Hi,

I am new to Solr; I started using it only today. When I wanted to
create a DIH, I got the below error:

SolrException: fieldType 'booleans' not found in the schema

What does this mean, and how do I resolve it?

Regards,
GNT


Re: Solr 5.5.0 : SolrException: fieldType 'booleans' not found in the schema

2016-03-31 Thread Binoy Dalal
Somewhere in your schema you've defined a field with its type as "booleans".
You should check if you've made a typo somewhere by adding that extra s
after boolean.
Otherwise, if it is a separate field type that you're looking to add, define
a new fieldType called booleans.

All the info to help you with this can be found here:
https://cwiki.apache.org/confluence/display/solr/Documents,+Fields,+and+Schema+Design

I highly recommend that you go through the documentation before starting.

On Fri, 1 Apr 2016, 00:34 Girish Tavag,  wrote:

> Hi,
>
> I am new to Solr; I started using it only today. When I wanted to
> create a DIH, I got the below error:
>
> SolrException: fieldType 'booleans' not found in the schema
>
> What does this mean, and how do I resolve it?
>
> Regards,
> GNT
>
-- 
Regards,
Binoy Dalal


Re: Solr 5.5.0 : SolrException: fieldType 'booleans' not found in the schema

2016-03-31 Thread Girish Tavag
Hi Binoy,

I copied the entire schema.xml file from the working example provided by
Solr itself. I'm able to run the Solr-provided DIH example successfully.
How could this be a problem?

On Fri, Apr 1, 2016 at 12:39 AM, Binoy Dalal  wrote:

> Somewhere in your schema you've defined a field with its type as "booleans".
> You should check if you've made a typo somewhere by adding that extra s
> after boolean.
> Otherwise, if it is a separate field type that you're looking to add, define
> a new fieldType called booleans.
>
> All the info to help you with this can be found here:
>
> https://cwiki.apache.org/confluence/display/solr/Documents,+Fields,+and+Schema+Design
>
> I highly recommend that you go through the documentation before starting.
>
> On Fri, 1 Apr 2016, 00:34 Girish Tavag,  wrote:
>
> > Hi,
> >
> > I am new to Solr; I started using it only today. When I wanted to
> > create a DIH, I got the below error:
> >
> > SolrException: fieldType 'booleans' not found in the schema
> >
> > What does this mean, and how do I resolve it?
> >
> > Regards,
> > GNT
> >
> --
> Regards,
> Binoy Dalal
>


Re: Solr 5.5.0 : SolrException: fieldType 'booleans' not found in the schema

2016-03-31 Thread Shawn Heisey
On 3/31/2016 1:24 PM, Girish Tavag wrote:
>  I copied the entire file schema.xml from the working example provided by
> solr itself. Solr provided dih example i'm able to run successfully .How
> could this be a problem?

This info is exactly the same as what Binoy told you, except that I am
including example config info.

In your schema, you've got at least one field definition that looks like
this -- with the type attribute set to "booleans":

   <field name="somefield" type="booleans" indexed="true" stored="true"/>

But you do NOT have a fieldType definition named "booleans" in your
schema, which would look something like this:

   <fieldType name="booleans" class="solr.BoolField" sortMissingLast="true"/>
The example schemas included with Solr have a fieldType named
"boolean".  For this reason, if you were to simply replace "booleans"
with "boolean" in your schema, I think this problem will go away.

Note:  If you are running Solr in cloud mode, the active config will be
stored in the zookeeper database.  Editing the schema on the disk won't
be enough, you'll need to upload your changes to zookeeper.

Thanks,
Shawn



Re: Performance potential for updating (reindexing) documents

2016-03-31 Thread Shawn Heisey
On 3/24/2016 11:57 AM, tedsolr wrote:
> My post was scant on details. The numbers I gave for collection sizes are
> projections for the future. I am in the midst of an upgrade that will be
> completed within a few weeks. My concern is that I may not be able to
> produce the throughput necessary to index an entire collection quickly
> enough (3 to 4 hours) for a large customer (100M docs).

I can fully rebuild one of my indexes, with 146 million docs, in 8-10
hours.  This is fairly inefficient indexing -- six large shards (not
cloud), each one running the dataimport handler, importing from MySQL. 
I suspect I could probably get two or three times this rate (and maybe
more) on the same hardware if I wrote a SolrJ application that uses
multiple threads for each Solr shard.

I know from experiments that the MySQL server can push over 100 million
rows to a SolrJ program in less than an hour, including constructing
SolrInputDocument objects.  That experiment just left out the
"client.add(docs);" line.  The bottleneck is definitely Solr.

Each machine holds three large shards (half the index), is running Solr
4.x (5.x upgrade is in the works), and has 64GB RAM with an 8GB heap.
Each shard is approximately 24.4 million docs and 28GB.  These machines
also hold another sharded index in the same Solr install, but it's quite
a lot smaller.

Thanks,
Shawn



make document with more matches rank higher with edismax parser?

2016-03-31 Thread Derek Poh

Hi

Correct me if I am wrong: my understanding of the edismax parser is that it
uses the max score of the matches in a doc.

How do I make docs with more matches rank higher with edismax?

These 2 docs are from the same query result, and this is their order in
the result.


P_ProductId: 1116393488
P_CatConcatKeyword: Bancos del poder
P_NewShortDescription: Accione el banco, 10,400mAh, 5.0V DC entran
P_VeryShortDescription: Accione el banco

score: 0.83850163

P_ProductId: 1124048475
P_CatConcatKeyword: Bancos del poder
P_NewShortDescription: Banco del poder con el altavoz
P_VeryShortDescription: Banco del poder

score: 0.83850163

q=Bancos del poder
qf=P_CatConcatKeyword^3.0 P_NewShortDescription^2.0 
P_NewVeryShortDescription^1.0
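
For context, one possibly relevant knob (a sketch, not a confirmed fix for
this case): the edismax tie parameter blends non-maximum field matches into
the score, score = max + tie * (sum of the other matching fields' scores),
so a doc matching in more fields can rank higher:

    q=Bancos del poder&defType=edismax&tie=0.3
    &qf=P_CatConcatKeyword^3.0 P_NewShortDescription^2.0 P_NewVeryShortDescription^1.0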


From the debug info, both docs' max-score match is from the
P_CatConcatKeyword field. Debug info for both docs is attached.
Comparing the field matches between the two, the 2nd doc has more fields
with matches. How can I make the 2nd doc rank higher based on this?


--
CONFIDENTIALITY NOTICE 

This e-mail (including any attachments) may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail (including any attachments) from your computer, and you must not use, disclose to anyone else or copy this e-mail (including any attachments), whether in whole or in part. 


This e-mail and any reply to it may be monitored for security, legal,
regulatory compliance and/or other appropriate reasons.

[attachment: debug/explain output]

1124048475

0.83850163 = (MATCH) sum of:
  0.004233816 = (MATCH) sum of:
0.0019395099 = (MATCH) max of:
  8.000289E-9 = (MATCH) weight(spp_keyword:banc^1.0E-5 in 6088628) 
[DefaultSimilarity], result of:
8.000289E-9 = score(doc=6088628,freq=1.0), product of:
  1.74163E-9 = queryWeight, product of:
1.0E-5 = boost
9.187129 = idf(docFreq=1868, maxDocs=6717914)
1.8957282E-5 = queryNorm
  4.5935645 = fieldWeight in 6088628, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
9.187129 = idf(docFreq=1868, maxDocs=6717914)
0.5 = fieldNorm(doc=6088628)
  5.8594847E-4 = (MATCH) weight(P_NewShortDescription:banco in 6088628) 
[DefaultSimilarity], result of:
5.8594847E-4 = score(doc=6088628,freq=1.0), product of:
  1.0539445E-4 = queryWeight, product of:
5.559576 = idf(docFreq=70312, maxDocs=6717914)
1.8957282E-5 = queryNorm
  5.559576 = fieldWeight in 6088628, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
5.559576 = idf(docFreq=70312, maxDocs=6717914)
1.0 = fieldNorm(doc=6088628)
  0.0012108017 = (MATCH) weight(P_VeryShortDescription:banco^2.0 in 
6088628) [DefaultSimilarity], result of:
0.0012108017 = score(doc=6088628,freq=1.0), product of:
  2.1425923E-4 = queryWeight, product of:
2.0 = boost
5.6511064 = idf(docFreq=64162, maxDocs=6717914)
1.8957282E-5 = queryNorm
  5.6511064 = fieldWeight in 6088628, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
5.6511064 = idf(docFreq=64162, maxDocs=6717914)
1.0 = fieldNorm(doc=6088628)
  0.0019395099 = (MATCH) weight(P_CatConcatKeyword:banco^3.0 in 6088628) 
[DefaultSimilarity], result of:
0.0019395099 = score(doc=6088628,freq=1.0), product of:
  3.3211973E-4 = queryWeight, product of:
3.0 = boost
5.8397913 = idf(docFreq=53129, maxDocs=6717914)
1.8957282E-5 = queryNorm
  5.8397913 = fieldWeight in 6088628, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
5.8397913 = idf(docFreq=53129, maxDocs=6717914)
1.0 = fieldNorm(doc=6088628)
4.8392292E-4 = (MATCH) max of:
  3.6249184E-9 = (MATCH) weight(spp_keyword:del^1.0E-5 in 6088628) 
[DefaultSimilarity], result of:
3.6249184E-9 = score(doc=6088628,freq=1.0), product of:
  1.1723361E-9 = queryWeight, product of:
1.0E-5 = boost
6.184094 = idf(docFreq=37653, maxDocs=6717914)
1.8957282E-5 = queryNorm
  3.092047 = fieldWeight in 6088628, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
6.184094 = idf(docFreq=37653, maxDocs=6717914)
0.5 = fieldNorm(doc=6088628)
  4.699589E-5 = (MATCH) weight(P_NewShortDescription:del in 6088628) 
[DefaultSimilarity], result of:
4.699589E-5 = score(doc=6088628,freq=1.0), product of:
  2.9848188E-5 = queryWeight, product of:
1.5744972 = idf(docFreq=3782103, maxDocs=6717914)
1.8957282E-5 = queryNorm
  1.5744972 = fieldWeight in 6088628, product of:
1.0 = tf(freq=1.0), w

Re: Solr 5.5.0 : SolrException: fieldType 'booleans' not found in the schema

2016-03-31 Thread Jack Krupansky
Exactly which file did you copy? Please give the specific directory.

-- Jack Krupansky

On Thu, Mar 31, 2016 at 3:24 PM, Girish Tavag 
wrote:

> Hi Binoy,
>
> I copied the entire schema.xml file from the working example provided by
> Solr itself. I'm able to run the Solr-provided DIH example successfully.
> How could this be a problem?
>
> On Fri, Apr 1, 2016 at 12:39 AM, Binoy Dalal 
> wrote:
>
> > Somewhere in your schema you've defined a field with its type as "booleans".
> > You should check if you've made a typo somewhere by adding that extra s
> > after boolean.
> > Otherwise, if it is a separate field type that you're looking to add,
> > define a new fieldType called booleans.
> >
> > All the info to help you with this can be found here:
> >
> > https://cwiki.apache.org/confluence/display/solr/Documents,+Fields,+and+Schema+Design
> >
> > I highly recommend that you go through the documentation before starting.
> >
> > On Fri, 1 Apr 2016, 00:34 Girish Tavag, 
> wrote:
> >
> > > Hi,
> > >
> > > I am new to Solr; I started using it only today. When I wanted to
> > > create a DIH, I got the below error:
> > >
> > > SolrException: fieldType 'booleans' not found in the schema
> > >
> > > What does this mean, and how do I resolve it?
> > >
> > > Regards,
> > > GNT
> > >
> > --
> > Regards,
> > Binoy Dalal
> >
>


update log not in ACTIVE or REPLAY state

2016-03-31 Thread michael dürr
Hello,

when I launch my two nodes in SolrCloud, I always get the following error
at node2:

PeerSync: core=portal_shard1_replica2 url=http://127.0.1.1:8984/solr
ERROR, update log not in ACTIVE or REPLAY state.
FSUpdateLog{state=BUFFERING, tlog=null}

Actually, I cannot observe any problems, but before going to production
I wanted to know why I get this error.

I'm running two nodes (node1 and node2) in a SolrCloud cluster (5.4.1).
node1 is started with embedded zookeeper and listens on port 8983. node2
listens on port 8984 and registers with the embedded zookeeper of node1 at
port 9983.
I have one collection "portal" (1 shard, 2 replicas), where each node
serves one replica.
The settings for commit on both nodes are:



<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

and

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>

Can you give me some advice on how to get rid of this error? Should I
simply ignore it?

Thanks,
Michael


How to create mappings between multiple Solr docs

2016-03-31 Thread vivekaltruist
I am using Solr 5.4 and have a Solr schema containing 5 fields. Now, I am
indexing two types of docs:

Doc1:
  field1
  field2
  field3

Doc2:
  field1
  field4
  field5

Here field1 contains a unique ID and it is common to both doc types. There
are more docs of type Doc2 than Doc1. Since field2 contains a large amount
of text, I cannot duplicate field2 for each Doc2. Now I want to also
retrieve field2 while querying for docs of type Doc2 using field1. Is it
possible to do so?
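
For reference, a sketch of the closest built-in mechanism (Solr's join query
parser; the query below is illustrative). It matches across documents on
field1 -- e.g. find Doc2-type docs by some criterion, then return the docs
(such as Doc1 with its field2) whose field1 matches theirs -- though it
returns only the fields of the joined-to documents, not a merged view:

    q={!join from=field1 to=field1}field4:some_value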

Stackoverflow Discussion link:
http://stackoverflow.com/questions/35742869/how-to-create-mappings-between-multiple-solr-docs



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-create-mappings-between-multiple-Solr-docs-tp4267418.html
Sent from the Solr - User mailing list archive at Nabble.com.