Re: Issue in 5.5.3 with lucene localParams with type

2016-11-28 Thread William Bell
Bump...

Thoughts?

It seems that {!lucene type=} would just override the lucene qp -
but in 5.5.3 something changed.

On Mon, Nov 28, 2016 at 1:15 PM, William Bell  wrote:

> In Solr 5.4.1 this used to work:
>
> fl={!lucene%20type=payloadQueryParser v='hosp_quality_spec_boost:PS628'}
>
> 24.227154,
>
> The only way I can get payloads to work is:
>
> fl={!payloadQueryParser v='hosp_quality_spec_boost:PS628'}
>
> 0.125,
>
> But the right values only come back in #2. It should be .125.
>
> Why is type not working anymore for queryParser?
>
> <queryParser name="payloadQueryParser" class="hg.payload.PayloadQParserPlugin"/>
>
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Starting SolrCloud

2016-11-28 Thread Erick Erickson
You need to find the solr.log file and examine it. What this usually
means is that something's wrong with, say, your Solr configs. You should
see a more informative message in the Solr log; usually it's a stack trace.
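
Judging by the "Archiving ... log files to /opt/solr/server/logs/archived"
lines in the output quoted below, the log in this install is most likely
/opt/solr/server/logs/solr.log (an inference from those paths, not something
confirmed in the thread). Something like

grep -n "ERROR\|Caused by" /opt/solr/server/logs/solr.log | tail -n 40

should surface the stack trace explaining why the CoreContainer didn't
initialize.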

You say that your start "seems to complete successfully". That implies
that you were prompted for things like
how many Solr instances you wanted to start, a base configset, the
name of your collection and the like. Did
all that occur?


Best,
Erick



On Mon, Nov 28, 2016 at 8:07 PM, James Muerle  wrote:
> Hello,
>
> I am very new to Solr, and I'm excited to get it up and running on amazon
> ec2 for some prototypical testing. So, I've installed solr (and java) on
> one ec2 instance, and I've installed zookeeper on another. After starting
> the zookeeper server on the default port of 2181, I run this on the solr
> instance: "opt/solr/bin/solr start -c -z <zk-host>.us-west-2.compute.amazonaws.com/solr",
> which seems to complete successfully:
>
> Archiving 1 old GC log files to /opt/solr/server/logs/archived
> Archiving 1 console log files to /opt/solr/server/logs/archived
> Rotating solr logs, keeping a max of 9 generations
> Waiting up to 180 seconds to see Solr running on port 8983 [|]
> Started Solr server on port 8983 (pid=13038). Happy searching!
>
> But then when I run "/opt/solr/bin/solr status", I get this output:
>
> Found 1 Solr nodes:
>
> Solr process 13038 running on port 8983
>
> ERROR: Failed to get system information from http://localhost:8983/solr due
> to: org.apache.http.client.ClientProtocolException: Expected JSON response
> from server but received: 
> 
> 
> Error 500 Server Error
> 
> HTTP ERROR 500
> Problem accessing /solr/admin/info/system. Reason:
> Server Error. Caused by: org.apache.solr.common.SolrException: Error processing the
> request. CoreContainer is either not initialized or shutting down.
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> at org.eclipse.jetty.server.Server.handle(Server.java:518)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
> at
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> at
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
> at java.lang.Thread.run(Thread.java:745)
> 
>
> 
> 
>
> Typically, this indicates a problem with the Solr server; check the Solr
> server logs for more information.
>
>
> I don't quite understand what things could be causing this problem, so I'm
> really at a loss at the moment. If you need any additional information, I'd
> be glad to provide it.
>
> Thanks for reading!
> James


Starting SolrCloud

2016-11-28 Thread James Muerle
Hello,

I am very new to Solr, and I'm excited to get it up and running on amazon
ec2 for some prototypical testing. So, I've installed solr (and java) on
one ec2 instance, and I've installed zookeeper on another. After starting
the zookeeper server on the default port of 2181, I run this on the solr
instance: "opt/solr/bin/solr start -c -z <zk-host>.us-west-2.compute.amazonaws.com/solr",
which seems to complete successfully:

Archiving 1 old GC log files to /opt/solr/server/logs/archived
Archiving 1 console log files to /opt/solr/server/logs/archived
Rotating solr logs, keeping a max of 9 generations
Waiting up to 180 seconds to see Solr running on port 8983 [|]
Started Solr server on port 8983 (pid=13038). Happy searching!

But then when I run "/opt/solr/bin/solr status", I get this output:

Found 1 Solr nodes:

Solr process 13038 running on port 8983

ERROR: Failed to get system information from http://localhost:8983/solr due
to: org.apache.http.client.ClientProtocolException: Expected JSON response
from server but received: 


Error 500 Server Error

HTTP ERROR 500
Problem accessing /solr/admin/info/system. Reason:
Server Error. Caused by: org.apache.solr.common.SolrException: Error processing the
request. CoreContainer is either not initialized or shutting down.
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:518)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
at java.lang.Thread.run(Thread.java:745)





Typically, this indicates a problem with the Solr server; check the Solr
server logs for more information.


I don't quite understand what things could be causing this problem, so I'm
really at a loss at the moment. If you need any additional information, I'd
be glad to provide it.

Thanks for reading!
James


Re: Break up a supplier's documents (products) from dominating search result.

2016-11-28 Thread Alexandre Rafalovitch
You can use expand and it will provide several documents per group
(but in a different data structure in the response).

Then it is up to you how to sequence or interleave the results in your
UI. You do need to deal with edge-cases like what happens if you say 3
products per group, but then one group has only one and you don't have
enough items in a list, etc.
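
A rough sketch of that kind of request, assuming a supplier identifier field
named supplier_id and a collection named products (both placeholders, not
taken from this thread): collapse keeps one top product per supplier in the
main result list, and expand returns up to three more per supplier in a
separate "expanded" section of the response, which the UI can use to fill the
page when the main list comes up short.

curl -g "http://localhost:8983/solr/products/select?q=denim+fabric&fq={!collapse%20field=supplier_id}&expand=true&expand.rows=3&rows=40"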

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 29 November 2016 at 12:56, Derek Poh  wrote:
> Hi Walter
>
> You used field collapsing for your case as well?
>
> For my case the search result page is a listing of products. There is an option
> to select the number of products to display per page.
> Let's say 40 products per page is selected. A search result has 100 matching
> products but these products belong to only 20 suppliers. The page will only
> display 20 products (1 product per supplier).
> We still need to fill up the remaining 20 empty product slots.
> How can I handle this scenario?
>
>
> On 11/29/2016 8:26 AM, Walter Underwood wrote:
>>
>> We had a similar feature in the Ultraseek search engine. One of our
>> customers
>> was a magazine publisher, and they wanted the best hit from each magazine
>> on the first page.
>>
>> I expect that field collapsing would work for this.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>
>>> On Nov 28, 2016, at 4:19 PM, Derek Poh  wrote:
>>>
>>> Alex
>>>
>>> Hope I understand what you meant by positive business requirements.
>>> With a few supplier's products dominating the first page of a search
>>> result, the sales will not be able to convince prospective or existing
>>> clients to sign up.
>>> They would like the results to feature other suppliers' products as well.
>>> In the extreme case, they were thinking of displaying the results in
>>> such an order:
>>> Supplier A product
>>> Supplier B product
>>> Supplier C product
>>> Supplier A product
>>> Supplier B product
>>> Supplier C product
>>> ...
>>>
>>> They are alright with implementing this logic to the first page only,
>>> and subsequent pages will be as per current logic, if it is not possible to
>>> implement it for the entire search result.
>>>
>>> Will take a look at Collapse and Expand to see if it can help.
>>>
>>> On 11/28/2016 6:04 PM, Alexandre Rafalovitch wrote:

 You have described your _negative_ business requirements, but not the
 _positive_ ones. So, it is hard to see what they want to happen. It is
 easy enough to promote or demote matches of a particular filter. But you
 want to partially limit them. On a first page? What about on the
 second?

 I suspect you would have to have a slightly different interface to do
 this effectively. And, most likely, using Collapse and Expand:

 https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
 .

 Regards,
 Alex.
 
 http://www.solr-start.com/ - Resources for Solr users, new and
 experienced


 On 28 November 2016 at 20:09, Derek Poh  wrote:
>
> Hi
>
> We have a business requirement to break up a supplier's products from
> dominating search results so as to allow other suppliers' products in the
> search result to have exposure.
> Business users are open to implementing this for the first page of the
> search result if it is not possible to apply to the entire search result.
>
> From the sample keywords users have provided, I also discovered that most of
> the time a supplier's products that are listed consecutively in the result
> all have the same score.
>
> Any advice/suggestions on how I can do it?
>
> Please let me know if more information is required. Thank you.
>
> Derek
>
> --
> CONFIDENTIALITY NOTICE
> This e-mail (including any attachments) may contain confidential and/or
> privileged information. If you are not the intended recipient or have
> received this e-mail in error, please inform the sender immediately and
> delete this e-mail (including any attachments) from your computer, and
> you
> must not use, disclose to anyone else or copy this e-mail (including
> any
> attachments), whether in whole or in part.
> This e-mail and any reply to it may be monitored for security, legal,
> regulatory compliance and/or other appropriate reasons.
>>>
>>>
>>> --
>>> CONFIDENTIALITY NOTICE
>>> This e-mail (including any attachments) may contain confidential and/or
>>> privileged information. If you are not the intended recipient or have
>>> received this e-mail in error, please inform the sender immediately and
>>> delete this e-mail (including any attachments) from your computer, and you
>>> must not use, disclose to anyone else or copy this 

Re: Break up a supplier's documents (products) from dominating search result.

2016-11-28 Thread Derek Poh

Is there a way where we do not have to change the page UI?

This is the search page for your reference.
http://www.globalsources.com/gsol/GeneralManager?hostname=www.globalsources.com_search=on=search%2FProductSearchResults_search=off==PRODUCT=en=new=denim+fabric=en_id=300149681_id=23844==t=N=ProdSearch=GetPoint=DoFreeTextSearch_search=on_search=off=grid 



On 11/29/2016 10:04 AM, Walter Underwood wrote:

We used something like field collapsing, but it wasn’t with Solr or Lucene.
They had not been invented at the time. This was a feature of the Ultraseek
engine from Infoseek, probably in 1997 or 1998.

With field collapsing, you provide a link to show more results from that source.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Nov 28, 2016, at 5:56 PM, Derek Poh  wrote:

Hi Walter

You used field collapsing for your case as well?

For my case the search result page is a listing of products. There is an option to
select the number of products to display per page.
Let's say 40 products per page is selected. A search result has 100 matching 
products but these products belong to only 20 suppliers. The page will only 
display 20 products (1 product per supplier).
We still need to fill up the remaining 20 empty product slots.
How can I handle this scenario?

On 11/29/2016 8:26 AM, Walter Underwood wrote:

We had a similar feature in the Ultraseek search engine. One of our customers
was a magazine publisher, and they wanted the best hit from each magazine
on the first page.

I expect that field collapsing would work for this.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Nov 28, 2016, at 4:19 PM, Derek Poh  wrote:

Alex

Hope I understand what you meant by positive business requirements.
With a few supplier's products dominating the first page of a search result, 
the sales will not be able to convince prospective or existing clients to sign
up.
They would like the results to feature other suppliers' products as well.
In the extreme case, they were thinking of displaying the results in such
an order:
Supplier A product
Supplier B product
Supplier C product
Supplier A product
Supplier B product
Supplier C product
...

They are alright with implementing this logic to the first page only,
and subsequent pages will be as per current logic, if it is not possible to
implement it for the entire search result.

Will take a look at Collapse and Expand to see if it can help.

On 11/28/2016 6:04 PM, Alexandre Rafalovitch wrote:

You have described your _negative_ business requirements, but not the
_positive_ ones. So, it is hard to see what they want to happen. It is
easy enough to promote or demote matches of a particular filter. But you
want to partially limit them. On a first page? What about on the
second?

I suspect you would have to have a slightly different interface to do
this effectively. And, most likely, using Collapse and Expand:
https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
.

Regards,
Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 28 November 2016 at 20:09, Derek Poh  wrote:

Hi

We have a business requirement to break up a supplier's products from
dominating search results so as to allow other suppliers' products in the
search result to have exposure.
Business users are open to implementing this for the first page of the
search result if it is not possible to apply to the entire search result.

From the sample keywords users have provided, I also discovered that most of
the time a supplier's products that are listed consecutively in the result
all have the same score.

Any advice/suggestions on how I can do it?

Please let me know if more information is required. Thank you.

Derek

--
CONFIDENTIALITY NOTICE
This e-mail (including any attachments) may contain confidential and/or
privileged information. If you are not the intended recipient or have
received this e-mail in error, please inform the sender immediately and
delete this e-mail (including any attachments) from your computer, and you
must not use, disclose to anyone else or copy this e-mail (including any
attachments), whether in whole or in part.
This e-mail and any reply to it may be monitored for security, legal,
regulatory compliance and/or other appropriate reasons.

--
CONFIDENTIALITY NOTICE
This e-mail (including any attachments) may contain confidential and/or 
privileged information. If you are not the intended recipient or have received 
this e-mail in error, please inform the sender immediately and delete this 
e-mail (including any attachments) from your computer, and you must not use, 
disclose to anyone else or copy this e-mail (including any attachments), 
whether in whole or in part.
This e-mail and any reply to it may be monitored for security, legal, 

Zookeeper connection lost in 5.5.3

2016-11-28 Thread Yago Riveiro
Hi, 

I upgraded my cluster to 5.5.3 and now I'm seeing a lot of these warnings.

Unable to read
/collections/collectionX/leader_initiated_recovery/shard9/core_node12 due
to: org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for
/collections/collectionX/leader_initiated_recovery/shard9/core_node12

Also one node lost connection with zookeeper and was ejected from the
cluster.

Any clue about how I can debug this? 



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Zookeeper-connection-lost-in-5-5-3-tp4307804.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to enable JMX to monitor Jetty

2016-11-28 Thread Yago Riveiro
Hi,

Rallavagu, is the jetty-jmx.xml file the stock one from the GitHub repository,
or something custom?

I modified the modules/http.mod file and I still can't see the Jetty MBeans ...



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-enable-JMX-to-monitor-Jetty-tp4278246p4307802.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Break up a supplier's documents (products) from dominating search result.

2016-11-28 Thread Walter Underwood
We used something like field collapsing, but it wasn’t with Solr or Lucene.
They had not been invented at the time. This was a feature of the Ultraseek
engine from Infoseek, probably in 1997 or 1998.

With field collapsing, you provide a link to show more results from that source.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Nov 28, 2016, at 5:56 PM, Derek Poh  wrote:
> 
> Hi Walter
> 
> You used field collapsing for your case as well?
> 
> For my case the search result page is a listing of products. There is an option
> to select the number of products to display per page.
> Let's say 40 products per page is selected. A search result has 100 matching 
> products but these products belong to only 20 suppliers. The page will only 
> display 20 products (1 product per supplier).
> We still need to fill up the remaining 20 empty product slots.
> How can I handle this scenario?
> 
> On 11/29/2016 8:26 AM, Walter Underwood wrote:
>> We had a similar feature in the Ultraseek search engine. One of our customers
>> was a magazine publisher, and they wanted the best hit from each magazine
>> on the first page.
>> 
>> I expect that field collapsing would work for this.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Nov 28, 2016, at 4:19 PM, Derek Poh  wrote:
>>> 
>>> Alex
>>> 
>>> Hope I understand what you meant by positive business requirements.
>>> With a few supplier's products dominating the first page of a search 
>>> result, the sales will not be able to convince prospective or existing
>>> clients to sign up.
>>> They would like the results to feature other suppliers' products as well.
>>> In the extreme case, they were thinking of displaying the results in
>>> such an order:
>>> Supplier A product
>>> Supplier B product
>>> Supplier C product
>>> Supplier A product
>>> Supplier B product
>>> Supplier C product
>>> ...
>>> 
>>> They are alright with implementing this logic to the first page only,
>>> and subsequent pages will be as per current logic, if it is not possible to
>>> implement it for the entire search result.
>>>
>>> Will take a look at Collapse and Expand to see if it can help.
>>> 
>>> On 11/28/2016 6:04 PM, Alexandre Rafalovitch wrote:
 You have described your _negative_ business requirements, but not the
 _positive_ ones. So, it is hard to see what they want to happen. It is
 easy enough to promote or demote matches of a particular filter. But you
 want to partially limit them. On a first page? What about on the
 second?
 
 I suspect you would have to have a slightly different interface to do
 this effectively. And, most likely, using Collapse and Expand:
 https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
 .
 
 Regards,
Alex.
 
 http://www.solr-start.com/ - Resources for Solr users, new and experienced
 
 
 On 28 November 2016 at 20:09, Derek Poh  wrote:
> Hi
> 
> We have a business requirement to break up a supplier's products from
> dominating search results so as to allow other suppliers' products in the
> search result to have exposure.
> Business users are open to implementing this for the first page of the
> search result if it is not possible to apply to the entire search result.
>
> From the sample keywords users have provided, I also discovered that most of
> the time a supplier's products that are listed consecutively in the result
> all have the same score.
>
> Any advice/suggestions on how I can do it?
>
> Please let me know if more information is required. Thank you.
> 
> Derek
> 
> --
> CONFIDENTIALITY NOTICE
> This e-mail (including any attachments) may contain confidential and/or
> privileged information. If you are not the intended recipient or have
> received this e-mail in error, please inform the sender immediately and
> delete this e-mail (including any attachments) from your computer, and you
> must not use, disclose to anyone else or copy this e-mail (including any
> attachments), whether in whole or in part.
> This e-mail and any reply to it may be monitored for security, legal,
> regulatory compliance and/or other appropriate reasons.
>>> 
>>> --
>>> CONFIDENTIALITY NOTICE
>>> This e-mail (including any attachments) may contain confidential and/or 
>>> privileged information. If you are not the intended recipient or have 
>>> received this e-mail in error, please inform the sender immediately and 
>>> delete this e-mail (including any attachments) from your computer, and you 
>>> must not use, disclose to anyone else or copy this e-mail (including any 
>>> attachments), whether in whole or in part.
>>> This e-mail and any reply to it 

Re: Break up a supplier's documents (products) from dominating search result.

2016-11-28 Thread Derek Poh

Hi Walter

You used field collapsing for your case as well?

For my case the search result page is a listing of products. There is an
option to select the number of products to display per page.
Let's say 40 products per page is selected. A search result has 100 
matching products but these products belong to only 20 suppliers. The 
page will only display 20 products (1 product per supplier).

We still need to fill up the remaining 20 empty product slots.
How can I handle this scenario?

On 11/29/2016 8:26 AM, Walter Underwood wrote:

We had a similar feature in the Ultraseek search engine. One of our customers
was a magazine publisher, and they wanted the best hit from each magazine
on the first page.

I expect that field collapsing would work for this.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Nov 28, 2016, at 4:19 PM, Derek Poh  wrote:

Alex

Hope I understand what you meant by positive business requirements.
With a few supplier's products dominating the first page of a search result, 
the sales will not be able to convince prospective or existing clients to sign
up.
They would like the results to feature other suppliers' products as well.
In the extreme case, they were thinking of displaying the results in such
an order:
Supplier A product
Supplier B product
Supplier C product
Supplier A product
Supplier B product
Supplier C product
...

They are alright with implementing this logic to the first page only,
and subsequent pages will be as per current logic, if it is not possible to
implement it for the entire search result.

Will take a look at Collapse and Expand to see if it can help.

On 11/28/2016 6:04 PM, Alexandre Rafalovitch wrote:

You have described your _negative_ business requirements, but not the
_positive_ ones. So, it is hard to see what they want to happen. It is
easy enough to promote or demote matches of a particular filter. But you
want to partially limit them. On a first page? What about on the
second?

I suspect you would have to have a slightly different interface to do
this effectively. And, most likely, using Collapse and Expand:
https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
.

Regards,
Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 28 November 2016 at 20:09, Derek Poh  wrote:

Hi

We have a business requirement to break up a supplier's products from
dominating search results so as to allow other suppliers' products in the
search result to have exposure.
Business users are open to implementing this for the first page of the
search result if it is not possible to apply to the entire search result.

From the sample keywords users have provided, I also discovered that most of
the time a supplier's products that are listed consecutively in the result
all have the same score.

Any advice/suggestions on how I can do it?

Please let me know if more information is required. Thank you.

Derek

--
CONFIDENTIALITY NOTICE
This e-mail (including any attachments) may contain confidential and/or
privileged information. If you are not the intended recipient or have
received this e-mail in error, please inform the sender immediately and
delete this e-mail (including any attachments) from your computer, and you
must not use, disclose to anyone else or copy this e-mail (including any
attachments), whether in whole or in part.
This e-mail and any reply to it may be monitored for security, legal,
regulatory compliance and/or other appropriate reasons.


--
CONFIDENTIALITY NOTICE
This e-mail (including any attachments) may contain confidential and/or 
privileged information. If you are not the intended recipient or have received 
this e-mail in error, please inform the sender immediately and delete this 
e-mail (including any attachments) from your computer, and you must not use, 
disclose to anyone else or copy this e-mail (including any attachments), 
whether in whole or in part.
This e-mail and any reply to it may be monitored for security, legal, 
regulatory compliance and/or other appropriate reasons.




--
CONFIDENTIALITY NOTICE 

This e-mail (including any attachments) may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail (including any attachments) from your computer, and you must not use, disclose to anyone else or copy this e-mail (including any attachments), whether in whole or in part. 


This e-mail and any reply to it may be monitored for security, legal, 
regulatory compliance and/or other appropriate reasons.



Re: Solr 6.3.0 SQL question

2016-11-28 Thread Damien Kamerman
Aggregated selects only work with lower-case collection names (and no
dashes). (A bug in StatsStream, I think.)

I assume 'SOLR-9077 Streaming expressions should support collection alias',
which is fixed in 6.4, is a workaround.
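
A hedged sketch of that workaround once on 6.4: create a lower-case alias for
the collection and run the aggregate against the alias. CREATEALIAS is the
standard Collections API call, but whether the /sql handler resolves the alias
on a given version is worth verifying first, so treat this as an illustration
rather than a confirmed fix.

curl "http://cordelia:9100/solr/admin/collections?action=CREATEALIAS&name=unclass&collections=UNCLASS"
curl --data-urlencode 'stmt=SELECT avg(TextSize) from unclass' "http://cordelia:9100/solr/unclass/sql?aggregationMode=map_reduce"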

On 29 November 2016 at 08:29, Kevin Risden  wrote:

> Is there a longer error/stack trace in your Solr server logs? I wonder if
> the real error is being masked.
>
> Kevin Risden
>
> On Mon, Nov 28, 2016 at 3:24 PM, Joe Obernberger <
> joseph.obernber...@gmail.com> wrote:
>
> > I'm running this query:
> >
> > curl --data-urlencode 'stmt=SELECT avg(TextSize) from UNCLASS'
> > http://cordelia:9100/solr/UNCLASS/sql?aggregationMode=map_reduce
> >
> > The error that I get back is:
> >
> > {"result-set":{"docs":[
> > {"EXCEPTION":"org.apache.solr.common.SolrException: Collection not
> found:
> > unclass","EOF":true,"RESPONSE_TIME":2}]}}
> >
> > TextSize is defined as:
> >  > indexed="true" stored="true"/>
> >
> > This query works fine:
> > curl --data-urlencode 'stmt=SELECT TextSize from UNCLASS'
> > http://cordelia:9100/solr/UNCLASS/sql?aggregationMode=map_reduce
> >
> > Any idea what I'm doing wrong?
> > Thank you!
> >
> > -Joe
> >
> >
>


Re: upconfig in zookeeper doesn't relfect changes

2016-11-28 Thread Sadheera Vithanage
Please ignore this; it worked.

On Tue, Nov 29, 2016 at 11:41 AM, Sadheera Vithanage 
wrote:

> Hi All,
>
> I am trying to edit the solrconfig.xml for my solrcloud setup, which is in
> the zookeeper as a configuration.
>
> Below are the steps I am following.
>
> /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd downconfig -confdir
> /var/solr/data/dir_name -confname MyConfig -z 100.100.100.102
>
> Update the solrconfig.xml and save.
>
> Upload the config back to the zookeeper.
>
> /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -confdir
> /var/solr/data/dir_name -confname MyConfig -z 100.100.100.102
>
>
> It doesn't show any errors. However, when I downconfig again I don't see my
> new changes.
>
> Any help is greatly appreciated.
>
>
> --
> Regards
>
> Sadheera Vithanage
>



-- 
Regards

Sadheera Vithanage


upconfig in zookeeper doesn't reflect changes

2016-11-28 Thread Sadheera Vithanage
Hi All,

I am trying to edit the solrconfig.xml for my solrcloud setup, which is in
the zookeeper as a configuration.

Below are the steps I am following.

/opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd downconfig -confdir
/var/solr/data/dir_name -confname MyConfig -z 100.100.100.102

Update the solrconfig.xml and save.

Upload the config back to the zookeeper.

/opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -confdir
/var/solr/data/dir_name -confname MyConfig -z 100.100.100.102


It doesn't show any errors. However, when I downconfig again I don't see my
new changes.

Any help is greatly appreciated.


-- 
Regards

Sadheera Vithanage


Re: Break up a supplier's documents (products) from dominating search result.

2016-11-28 Thread Walter Underwood
We had a similar feature in the Ultraseek search engine. One of our customers
was a magazine publisher, and they wanted the best hit from each magazine 
on the first page.

I expect that field collapsing would work for this.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Nov 28, 2016, at 4:19 PM, Derek Poh  wrote:
> 
> Alex
> 
> Hope I understand what you meant by positive business requirements.
> With a few supplier's products dominating the first page of a search result, 
> the sales will not be able to convince prospective or existing clients to sign
> up.
> They would like the results to feature other suppliers' products as well.
> In the extreme case, they were thinking of displaying the results in
> such an order:
> Supplier A product
> Supplier B product
> Supplier C product
> Supplier A product
> Supplier B product
> Supplier C product
> ...
> 
> They are alright with implementing this logic to the first page only,
> and subsequent pages will be as per current logic, if it is not possible to
> implement it for the entire search result.
>
> Will take a look at Collapse and Expand to see if it can help.
> 
> On 11/28/2016 6:04 PM, Alexandre Rafalovitch wrote:
>> You have described your _negative_ business requirements, but not the
>> _positive_ ones. So, it is hard to see what they want to happen. It is
>> easy enough to promote or demote matches of a particular filter. But you
>> want to partially limit them. On a first page? What about on the
>> second?
>> 
>> I suspect you would have to have a slightly different interface to do
>> this effectively. And, most likely, using Collapse and Expand:
>> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
>> .
>> 
>> Regards,
>>Alex.
>> 
>> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>> 
>> 
>> On 28 November 2016 at 20:09, Derek Poh  wrote:
>>> Hi
>>> 
>>> We have a business requirement to break up a supplier's products from
>>> dominating search results so as to allow other suppliers' products in the
>>> search result to have exposure.
>>> Business users are open to implementing this for the first page of the
>>> search result if it is not possible to apply to the entire search result.
>>>
>>> From the sample keywords users have provided, I also discovered that most of
>>> the time a supplier's products that are listed consecutively in the result
>>> all have the same score.
>>>
>>> Any advice/suggestions on how I can do it?
>>>
>>> Please let me know if more information is required. Thank you.
>>> 
>>> Derek
>>> 
>>> --
>>> CONFIDENTIALITY NOTICE
>>> This e-mail (including any attachments) may contain confidential and/or
>>> privileged information. If you are not the intended recipient or have
>>> received this e-mail in error, please inform the sender immediately and
>>> delete this e-mail (including any attachments) from your computer, and you
>>> must not use, disclose to anyone else or copy this e-mail (including any
>>> attachments), whether in whole or in part.
>>> This e-mail and any reply to it may be monitored for security, legal,
>>> regulatory compliance and/or other appropriate reasons.
>> 
> 
> 
> --
> CONFIDENTIALITY NOTICE 
> This e-mail (including any attachments) may contain confidential and/or 
> privileged information. If you are not the intended recipient or have 
> received this e-mail in error, please inform the sender immediately and 
> delete this e-mail (including any attachments) from your computer, and you 
> must not use, disclose to anyone else or copy this e-mail (including any 
> attachments), whether in whole or in part. 
> This e-mail and any reply to it may be monitored for security, legal, 
> regulatory compliance and/or other appropriate reasons.



Re: Break up a supplier's documents (products) from dominating search result.

2016-11-28 Thread Derek Poh

Alex

Hope I understand what you meant by positive business requirements.
With a few supplier's products dominating the first page of a search 
result, the sales will not be able to convince prospective or existing
clients to sign up.

They would like the results to feature other suppliers' products as well.
In the extreme case, they were thinking of displaying the results
in such an order:

Supplier A product
Supplier B product
Supplier C product
Supplier A product
Supplier B product
Supplier C product
...

They are alright with implementing this logic to the first page only,
and subsequent pages will be as per current logic, if it is not possible
to implement it for the entire search result.


Will take a look at Collapse and Expand to see if it can help.

On 11/28/2016 6:04 PM, Alexandre Rafalovitch wrote:

You have described your _negative_ business requirements, but not the
_positive_ ones. So, it is hard to see what they want to happen. It is
easy enough to promote or demote matches of a particular filter. But you
want to partially limit them. On a first page? What about on the
second?

I suspect you would have to have a slightly different interface to do
this effectively. And, most likely, using Collapse and Expand:
https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
.

Regards,
Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 28 November 2016 at 20:09, Derek Poh  wrote:

Hi

We have a business requirement to break up a supplier's products from
dominating search results so as to allow other suppliers' products in the
search result to have exposure.
Business users are open to implementing this for the first page of the
search result if it is not possible to apply to the entire search result.

From the sample keywords users have provided, I also discovered that most of
the time a supplier's products that are listed consecutively in the result
all have the same score.

Any advice/suggestions on how I can do it?

Please let me know if more information is required. Thank you.

Derek

--
CONFIDENTIALITY NOTICE
This e-mail (including any attachments) may contain confidential and/or
privileged information. If you are not the intended recipient or have
received this e-mail in error, please inform the sender immediately and
delete this e-mail (including any attachments) from your computer, and you
must not use, disclose to anyone else or copy this e-mail (including any
attachments), whether in whole or in part.
This e-mail and any reply to it may be monitored for security, legal,
regulatory compliance and/or other appropriate reasons.





--
CONFIDENTIALITY NOTICE 

This e-mail (including any attachments) may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail (including any attachments) from your computer, and you must not use, disclose to anyone else or copy this e-mail (including any attachments), whether in whole or in part. 


This e-mail and any reply to it may be monitored for security, legal, 
regulatory compliance and/or other appropriate reasons.

Re: Index time sorting and per index mergePolicyFactory

2016-11-28 Thread Erick Erickson
Wait, on the page you referenced there's this which appears to be
exactly what you want:


<mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
  <str name="sort">timestamp desc</str>
  <str name="wrapped.prefix">inner</str>
  <str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>
  <int name="inner.maxMergeAtOnce">10</int>
  <int name="inner.segmentsPerTier">10</int>
</mergePolicyFactory>

And since this is in solrconfig.xml, which is defined per core, you can
specify whatever you want for each core.

Also see SOLR-5730 and SOLR-8621

Best,
Erick

On Mon, Nov 28, 2016 at 1:47 PM, Dorian Hoxha  wrote:
> bump after 11 days
>
> On Thu, Nov 17, 2016 at 10:25 AM, Dorian Hoxha 
> wrote:
>
>> Hi,
>>
>> I know this is done in lucene, but I don't see it in solr (by searching +
>> docs on collections).
>>
>> I see https://cwiki.apache.org/confluence/display/solr/
>> IndexConfig+in+SolrConfig but it's not mentioned for index-time-sorting.
>>
>> So, is it possible and definable for each index ? I want to have some
>> sorted by 'x' field, some by 'y' field, and some staying as default.
>>
>> Thank You
>>


Re: Index time sorting and per index mergePolicyFactory

2016-11-28 Thread Dorian Hoxha
bump after 11 days

On Thu, Nov 17, 2016 at 10:25 AM, Dorian Hoxha 
wrote:

> Hi,
>
> I know this is done in lucene, but I don't see it in solr (by searching +
> docs on collections).
>
> I see https://cwiki.apache.org/confluence/display/solr/
> IndexConfig+in+SolrConfig but it's not mentioned for index-time-sorting.
>
> So, is it possible and definable for each index ? I want to have some
> sorted by 'x' field, some by 'y' field, and some staying as default.
>
> Thank You
>


Re: stream, features and train

2016-11-28 Thread Joe Obernberger
Thank you Joel - that was it; or rather a misunderstanding of how this
works on my end!


-Joe


On 11/26/2016 10:17 PM, Joel Bernstein wrote:

Hi,

It looks like the outcome field may not be correct or it may have missing
values. You'll need to populate this field for all records in the training
set.
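
A quick way to check for that is to count the training documents that have no
value in the outcome field at all (a sketch reusing the host, collection and
field names from the quoted mail; a numFound greater than zero means some
records still need the outcome populated):

curl "http://cressida:9100/solr/UNCLASS/select" --data-urlencode 'q=*:*' --data-urlencode 'fq=-Out:[* TO *]' --data-urlencode 'rows=0'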

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Nov 23, 2016 at 3:21 PM, Joe Obernberger <
joseph.obernber...@gmail.com> wrote:


Hi - I'm trying to experiment with the new train, features, model,
classify capabilities of Solr 6.3.0.  I'm following along on:
https://cwiki.apache.org/confluence/display/solr/Streaming+
Expressions#StreamingExpressions-StreamSources

When I execute:
features(UNCLASS,
 q="*:*",
 featureSet="JoeFeature1",
 field="Title",
 outcome="Out",
 numTerms=250)

Title is defined like:


Is this the correct syntax?  I'm getting an error:

{
   "result-set": {
 "docs": [
   {
 "EXCEPTION": "java.util.concurrent.ExecutionException:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://cressida:9100/solr/UNCLASS_shard2_replica2:
java.lang.NullPointerException\n\tat org.apache.solr.search.IGainTe
rmsQParserPlugin$IGainTermsCollector.collect(IGainTermsQParserPlugin.java:129)\n\tat
org.apache.lucene.search.MatchAllDocsQuery$1$1.score(MatchAllDocsQuery.java:56)\n\tat
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)\n\tat
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:669)\n\tat
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:473)\n\tat
org.apache.solr.search.SolrIndexSearcher.buildAndRunCollecto
rChain(SolrIndexSearcher.java:242)\n\tat org.apache.solr.search.SolrInd
exSearcher.getDocListNC(SolrIndexSearcher.java:1803)\n\tat
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1620)\n\tat
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:617)\n\tat
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:531)\n\tat
org.apache.solr.handler.component.SearchHandler.handleReques
tBody(SearchHandler.java:295)\n\tat org.apache.solr.handler.Reques
tHandlerBase.handleRequest(RequestHandlerBase.java:153)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2213)\n\tat
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:303)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilte
r(ServletHandler.java:1668)\n\tat org.eclipse.jetty.servlet.Serv
letHandler.doHandle(ServletHandler.java:581)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(
ContextHandler.java:1160)\n\tat org.eclipse.jetty.servlet.Serv
letHandler.doScope(ServletHandler.java:511)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(
ContextHandler.java:1092)\n\tat org.eclipse.jetty.server.handl
er.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.ha
ndle(ContextHandlerCollection.java:213)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(
HandlerCollection.java:119)\n\tat org.eclipse.jetty.server.handl
er.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.
succeeded(AbstractConnection.java:273)\n\tat org.eclipse.jetty.io
.FillInterest.fillable(FillInterest.java:95)\n\tat org.eclipse.jetty.io
.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
.produceAndRun(ExecuteProduceConsume.java:246)\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
.run(ExecuteProduceConsume.java:156)\n\tat org.eclipse.jetty.util.thread.
QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat
java.lang.Thread.run(Thread.java:745)\n",
 "EOF": true,
 "RESPONSE_TIME": 10
   }
 ]
   }
}

Thank you!

-Joe






Re: Solr 6.3.0 SQL question

2016-11-28 Thread Kevin Risden
Is there a longer error/stack trace in your Solr server logs? I wonder if
the real error is being masked.

Kevin Risden

On Mon, Nov 28, 2016 at 3:24 PM, Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> I'm running this query:
>
> curl --data-urlencode 'stmt=SELECT avg(TextSize) from UNCLASS'
> http://cordelia:9100/solr/UNCLASS/sql?aggregationMode=map_reduce
>
> The error that I get back is:
>
> {"result-set":{"docs":[
> {"EXCEPTION":"org.apache.solr.common.SolrException: Collection not found:
> unclass","EOF":true,"RESPONSE_TIME":2}]}}
>
> TextSize is defined as:
>  indexed="true" stored="true"/>
>
> This query works fine:
> curl --data-urlencode 'stmt=SELECT TextSize from UNCLASS'
> http://cordelia:9100/solr/UNCLASS/sql?aggregationMode=map_reduce
>
> Any idea what I'm doing wrong?
> Thank you!
>
> -Joe
>
>


Solr 6.3.0 SQL question

2016-11-28 Thread Joe Obernberger

I'm running this query:

curl --data-urlencode 'stmt=SELECT avg(TextSize) from UNCLASS' 
http://cordelia:9100/solr/UNCLASS/sql?aggregationMode=map_reduce


The error that I get back is:

{"result-set":{"docs":[
{"EXCEPTION":"org.apache.solr.common.SolrException: Collection not 
found: unclass","EOF":true,"RESPONSE_TIME":2}]}}


TextSize is defined as:
<field name="TextSize" ... multiValued="false" indexed="true" stored="true"/>


This query works fine:
curl --data-urlencode 'stmt=SELECT TextSize from UNCLASS' 
http://cordelia:9100/solr/UNCLASS/sql?aggregationMode=map_reduce


Any idea what I'm doing wrong?
Thank you!

-Joe



Re: ClassicIndexSchemaFactory with Solr 6.3

2016-11-28 Thread Cassandra Targett
I'm not seeing how the documentation is wrong here. It says:

"When a  is not explicitly declared in a
solrconfig.xml file, Solr implicitly uses a
ManagedIndexSchemaFactory"

IOW, managed schema is the default, and you may not find a
schemaFactory definition in your file. When a schemaFactory is not
defined, it is by default ManagedIndexSchemaFactory (see also
https://issues.apache.org/jira/browse/SOLR-8131).

The page then goes on to explain how to enable
ClassicIndexSchemaFactory if you choose. Take a look at the last
section, "Changing from Managed Schema to Manually Edited schema.xml".

Cassandra

On Sat, Nov 26, 2016 at 12:11 PM, Shawn Heisey  wrote:
> On 11/26/2016 10:58 AM, Furkan KAMACI wrote:
>> I'm trying Solr 6.3. I don't want to use Managed Schema. It was OK for
>> Solr 5.x. However solrconfig.xml of Solr 6.3 doesn't have a
>> ManagedIndexSchemaFactory definition. Documentation is wrong at this
>> point (
>> https://cwiki.apache.org/confluence/display/solr/Schema+Factory+Definition+in+SolrConfig
>> ) How can I use ClassicIndexSchemaFactory with Solr 6.3?
>
> I believe that the managed schema is default now if you don't specify
> the factory to use.  I checked basic_configs in 6.2.1 and that
> definition did not appear to be present.  You'll probably have to *add*
> the schema factory definition to the config.  It looks like it's a
> top-level element, under <config>.  It's only one line.
>
> Thanks,
> Shawn
>


Issue in 5.5.3 with lucene localParams with type

2016-11-28 Thread William Bell
In Solr 5.4.1 this used to work:

fl={!lucene%20type=payloadQueryParser v='hosp_quality_spec_boost:PS628'}

24.227154,

The only way I can get payloads to work is:

fl={!payloadQueryParser v='hosp_quality_spec_boost:PS628'}

0.125,

But the right values only come back in #2. It should be .125.

Why is type not working anymore for queryParser?





-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Scheduling Data Import Handler (DIH) Delta Imports

2016-11-28 Thread Walter Underwood
First, try to do it with something like Apache Camel. That moves the whole
database import process outside of Solr where it can be more easily controlled.

http://camel.apache.org/ 
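
If the job does stay on DIH, the "check status before triggering" idea from
the original mail can be a small cron job against the standard DIH commands.
A sketch, with the host and core name as placeholders:

#!/bin/sh
# Skip this run if a DIH import is already in progress on this core.
STATUS=$(curl -s "http://localhost:8983/solr/mycore/dataimport?command=status&wt=json")
echo "$STATUS" | grep -q '"busy"' && exit 0
# Otherwise kick off the next delta import without wiping the index first.
curl -s "http://localhost:8983/solr/mycore/dataimport?command=delta-import&clean=false" > /dev/null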

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Nov 28, 2016, at 12:03 PM, Jamie Jackson  wrote:
> 
> One last bump before I get crackin'...
> 
> On Mon, Nov 21, 2016 at 11:54 AM, Jamie Jackson 
> wrote:
> 
>> Hi Folks,
>> 
>> I have DIH cores that are being indexed by my Lucee application. That
>> works, but I'd like to make some improvements:
>> 
>> 
>>   - Make a standalone scheduler that's not part of a larger application.
>>   (FYI, I want to Dockerize the import-triggering service.)
>>   - Prevent import requests from stacking up. Some of my cores' delta
>>   imports run every 15 seconds, and they do so blindly/ignorantly. If there
>>   is contention, very occasionally, import jobs will run long and stack up,
>>   so I want to make the scheduler nicer/more intelligent. Maybe the service
>>   would check the import status to see if there's a job already running
>>   before requesting a new one.
>> 
>> I can write such a thing myself, but does anybody have a Linux or
>> cross-platform solution written already?
>> 
>> Thanks,
>> Jamie
>> 



Re: Scheduling Data Import Handler (DIH) Delta Imports

2016-11-28 Thread Jamie Jackson
One last bump before I get crackin'...

On Mon, Nov 21, 2016 at 11:54 AM, Jamie Jackson 
wrote:

> Hi Folks,
>
> I have DIH cores that are being indexed by my Lucee application. That
> works, but I'd like to make some improvements:
>
>
>- Make a standalone scheduler that's not part of a larger application.
>(FYI, I want to Dockerize the import-triggering service.)
>- Prevent import requests from stacking up. Some of my cores' delta
>imports run every 15 seconds, and they do so blindly/ignorantly. If there
>is contention, very occasionally, import jobs will run long and stack up,
>so I want to make the scheduler nicer/more intelligent. Maybe the service
>would check the import status to see if there's a job already running
>before requesting a new one.
>
> I can write such a thing myself, but does anybody have a Linux or
> cross-platform solution written already?
>
> Thanks,
> Jamie
>


Re: Search opening hours

2016-11-28 Thread David Smiley
Let's say you wanted to do ranges over some integer.  Simply convert those
integers to dates, such as
java.time.Instant.ofEpochSecond(myInteger).toString().  It's more efficient
to convert to seconds (as in this example) as a base instead of milliseconds,
because the internal date-oriented tree has 1000 leaves at the millisecond
level to aggregate to the next higher level (second).  Also keep in mind you
have to work within a signed Long space.
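
As a concrete illustration of the seconds-based trick (the field name
int_range, collection name mycollection, document id and values are all
made-up placeholders): an integer span of [90000 TO 100000] becomes a date
span by pushing each endpoint through Instant.ofEpochSecond, i.e.
90000 -> 1970-01-02T01:00:00Z and 100000 -> 1970-01-02T03:46:40Z, which can
then be indexed and queried as a DateRangeField:

curl "http://localhost:8983/solr/mycollection/update?commit=true" -H 'Content-Type: application/json' -d '[{"id":"1","int_range":"[1970-01-02T01:00:00Z TO 1970-01-02T03:46:40Z]"}]'
curl "http://localhost:8983/solr/mycollection/select" --data-urlencode 'q=*:*' --data-urlencode 'fq=int_range:"1970-01-02T02:00:00Z"'

The point query corresponds to the integer 93600, which falls inside the
indexed span, so the document matches.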

Longer term, hopefully someone will add a Solr adapter to Lucene's
new IntRangeField (and *RangeField variants) which is for general use.  I'm
not sure if LongRangeField would be faster than DateRangeField as the
approaches are internally quite different.  It probably would be.  The
other factor is index size, and I think those new range fields would
generally be leaner.

~ David

On Fri, Nov 25, 2016 at 4:18 PM O. Klein  wrote:

> Thank you for your reply David.
>
> Yes, I ended up using a DateRangeField. Down side is that it needs frequent
> updates. Luckily not an issue for my use case.
>
> BTW how could I abuse DateRangeField for non-date data?
>
>
>
>
> david.w.smi...@gmail.com wrote
> > I just saw this conversation now.  I didn't read every word but I have to
> > ask immediately: does DateRangeField address your needs?
> > https://cwiki.apache.org/confluence/display/solr/Working+with+Dates  It
> > was
> > introduced in 5.0.
> >
> > On Wed, Nov 16, 2016 at 4:59 AM O. Klein 
>
> > klein@
>
> >  wrote:
> >
> >> Above implementation was too slow, so wondering if Solr 6 with all its
> >> new
> >> features provides a better solution to tackle operating hours.
> Especially
> >> dealing with different timezones.
> >>
> >> Any thoughts?
> >>
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >>
> http://lucene.472066.n3.nabble.com/Search-opening-hours-tp4225250p4306073.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> > --
> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> > http://www.solrenterprisesearchserver.com
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Search-opening-hours-tp4225250p4307463.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Failure when trying to full sync, out of space ? Doesn't delete old segments before full sync?

2016-11-28 Thread Walter Underwood
> On Nov 28, 2016, at 9:38 AM, Shawn Heisey  wrote:
> 
> […] Typically
> a merge or optimize will only require double the space, but there are
> certain worst-case scenarios where it can require triple.  I do not know
> what causes the worst-case situation.

Worst case:

1. Disable merging.
2. Delete all the documents.
3. Add all the documents.
4. Enable merging.

After step 3, you have two copies of everything, one deleted copy and one new 
copy.
The merge makes a third copy.

This can happen with any search engine that uses the same kind of segment
management as Lucene. We learned about this with Ultraseek, back in the
mid-1990’s.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Re: Failure when trying to full sync, out of space ? Doesn't delete old segments before full sync?

2016-11-28 Thread Michael Joyner



On 11/28/2016 12:26 PM, Erick Erickson wrote:

Well, such checks could be put in, but they don't get past the basic problem.



And all this masks your real problem; you didn't have enough disk
space to optimize in the first place. Even during regular indexing w/o
optimizing, Lucene segment merging can always theoretically merge all
your segments at once. Therefore you always need at _least_ as much
free space on your disks as all your indexes occupy to be sure you
won't hit a disk-full problem. The rest would be band-aids. Although I
suppose refusing to even start if there wasn't enough free disk space
isn't a bad idea, it's not foolproof though


If such a "warning" feature is added, I wouldn't expect it to be "foolproof",
and I wouldn't expect it to be able to "predict" usage caused by
events happening after it passes a basic initial check. I am just 
thinking a basic up-front check that indicates "it just ain't happening" 
might be useful.


So.. how does one handle needing all this "free space" between major 
index updates when one gets charged by the GB for allocated space 
without regard to actual storage usage?






Re: Failure when trying to full sync, out of space ? Doesn't delete old segments before full sync?

2016-11-28 Thread Shawn Heisey
On 11/28/2016 9:39 AM, Michael Joyner wrote:
> I'm running out of spacing when trying to restart nodes to get a
> cluster back up fully operational where a node ran out of space during
> an optimize.
>
> It appears to be trying to do a full sync from another node, but
> doesn't take care to check available space before starting downloads
> and doesn't delete the out of date segment files before attempting to
> do the full sync.

If you've run out of space during an optimize, then your Solr install
doesn't have enough disk space for proper operation.  The recommendation
is to have enough disk space to store all your index data three times --
free space should be double the size of all your index data.  Typically
a merge or optimize will only require double the space, but there are
certain worst-case scenarios where it can require triple.  I do not know
what causes the worst-case situation.  This is a Lucene requirement, and
Solr is based on Lucene.

The replication feature, which is how SolrCloud accomplishes index
recovery, assumes that the existing index must remain online until the
new index is fully transferred and available, at which time it will
become the live index, and the previous one can be deleted.  This
feature existed long before SolrCloud did.  Standalone mode will not be
disappearing anytime soon, so this assumption must remain.  Writing code
to decide when the existing index doesn't need to be kept would be
somewhat difficult and potentially very fragile.  This doesn't mean we
won't do it, but I think that's why it hasn't already been done.

Also, we still have that general disk space recommendation already
mentioned.  If that recommendation is followed, you're not going to run
out of disk space due to index recovery.

> It seems to know what size the segments are before they are
> transferred, is there a reason a basic disk space check isn't done for
> the target partition with an immediate abort done if the destination's
> space looks like it would go negative before attempting sync? Is this
> something that can be enabled in the master solrconfig.xml file? This
> would be a lot more useful (IMHO) than waiting for a full sync to
> complete only to run out of space after several hundred gigs of data
> is transferred with automatic cluster recovery failing as a result.

Remembering that the replication feature is NOT limited to use by
SolrCloud ... this is not a bad idea.  Because the replication handler
knows what files must be transferred before an index fetch takes place,
it can calculate how much disk space is required, and could return an
error response and ignore the command.  The way that SolrCloud uses
replication may not work with this, though.  SolrCloud replication may
work differently than the automated replication that can be set up in
standalone mode.  I am not sure whether it handles individual files, or
simply requests an index fetch.

But, at the risk of repeating myself ... running with so little free
disk space is not recommended.  The entire problem is avoided by
following recommendations.

Thanks,
Shawn



Re: Failure when trying to full sync, out of space ? Doesn't delete old segments before full sync?

2016-11-28 Thread Michael Joyner
We've been trying to run at 40% estimated usage when optimized, but are
doing a large amount of index updates ... 40% usage in this scenario
seems to be too high...



On 11/28/2016 12:26 PM, Erick Erickson wrote:

Well, such checks could be put in, but they don't get past the basic problem.

bq: If the segments are out of date and we are pulling from another
node before coming "online" why aren't the old segments deleted?

because you run the risk of losing _all_ your data and having nothing
at all. The process is
1> pull all the segments down
2> rewrite the segments file

Until <2>, you can still use your old index. Also consider a full
synch in master/slave mode. I optimize on the master and Solr then
detects that it'll be a full sync and deletes the entire active
index.

bq: Is this something that can be enabled in the master solrconfig.xml file?
no

bq: ...is there a reason a basic disk space check isn't done 
That would not be very robust. Consider the index is 1G and I have
1.5G of free space. Now replication makes the check and starts.
However, during that time segments are merged consuming .75G. Boom,
disk full again.

Additionally, any checks would be per core. What if 10 cores start
replication as above at once? Which would absolutely happen if you
have 10 replicas for the same shard in one JVM...

And all this masks your real problem; you didn't have enough disk
space to optimize in the first place. Even during regular indexing w/o
optimizing, Lucene segment merging can always theoretically merge all
your segments at once. Therefore you always need at _least_ as much
free space on your disks as all your indexes occupy to be sure you
won't hit a disk-full problem. The rest would be band-aids. Although I
suppose refusing to even start if there wasn't enough free disk space
isn't a bad idea, it's not foolproof though

Best,
Erick


On Mon, Nov 28, 2016 at 8:39 AM, Michael Joyner  wrote:

Hello all,

I'm running out of spacing when trying to restart nodes to get a cluster
back up fully operational where a node ran out of space during an optimize.

It appears to be trying to do a full sync from another node, but doesn't
take care to check available space before starting downloads and doesn't
delete the out of date segment files before attempting to do the full sync.

If the segments are out of date and we are pulling from another node before
coming "online" why aren't the old segments deleted? Is this something that
can be enabled in the master solrconfig.xml file?

It seems to know what size the segments are before they are transferred, is
there a reason a basic disk space check isn't done for the target partition
with an immediate abort done if the destination's space looks like it would
go negative before attempting sync? Is this something that can be enabled in
the master solrconfig.xml file? This would be a lot more useful (IMHO) than
waiting for a full sync to complete only to run out of space after several
hundred gigs of data is transferred with automatic cluster recovery failing
as a result.

This happens when doing a 'sudo service solr restart'

(Workaround, shutdown offending node, manually delete segment index folders
and tlog files, start node)

Exception:

WARN  - 2016-11-28 16:15:16.291;
org.apache.solr.handler.IndexFetcher$FileFetcher; Error in fetching file:
_2f6i.cfs (downloaded 2317352960 of 5257809205 bytes)
java.io.IOException: No space left on device
 at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
 at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
 at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
 at sun.nio.ch.IOUtil.write(IOUtil.java:65)
 at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
 at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
 at java.nio.channels.Channels.writeFully(Channels.java:101)
 at java.nio.channels.Channels.access$000(Channels.java:61)
 at java.nio.channels.Channels$1.write(Channels.java:174)
 at
org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:419)
 at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73)
 at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
 at
org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
 at
org.apache.solr.handler.IndexFetcher$DirectoryFile.write(IndexFetcher.java:1634)
 at
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1491)
 at
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1429)
 at
org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:855)
 at
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:434)
 at
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:251)
 at

Re: Failure when trying to full sync, out of space ? Doesn't delete old segments before full sync?

2016-11-28 Thread Erick Erickson
Well, such checks could be put in, but they don't get past the basic problem.

bq: If the segments are out of date and we are pulling from another
node before coming "online" why aren't the old segments deleted?

because you run the risk of losing _all_ your data and having nothing
at all. The process is
1> pull all the segments down
2> rewrite the segments file

Until <2>, you can still use your old index. Also consider a full
synch in master/slave mode. I optimize on the master and Solr then
detects that it'll be a full sync and deletes the entire active
index.

bq: Is this something that can be enabled in the master solrconfig.xml file?
no

bq: ...is there a reason a basic disk space check isn't done 
That would not be very robust. Consider the index is 1G and I have
1.5G of free space. Now replication makes the check and starts.
However, during that time segments are merged consuming .75G. Boom,
disk full again.

Additionally, any checks would be per core. What if 10 cores start
replication as above at once? Which would absolutely happen if you
have 10 replicas for the same shard in one JVM...

And all this masks your real problem; you didn't have enough disk
space to optimize in the first place. Even during regular indexing w/o
optimizing, Lucene segment merging can always theoretically merge all
your segments at once. Therefore you always need at _least_ as much
free space on your disks as all your indexes occupy to be sure you
won't hit a disk-full problem. The rest would be band-aids. Although I
suppose refusing to even start if there wasn't enough free disk space
isn't a bad idea, it's not foolproof though

Best,
Erick


On Mon, Nov 28, 2016 at 8:39 AM, Michael Joyner  wrote:
> Hello all,
>
> I'm running out of spacing when trying to restart nodes to get a cluster
> back up fully operational where a node ran out of space during an optimize.
>
> It appears to be trying to do a full sync from another node, but doesn't
> take care to check available space before starting downloads and doesn't
> delete the out of date segment files before attempting to do the full sync.
>
> If the segments are out of date and we are pulling from another node before
> coming "online" why aren't the old segments deleted? Is this something that
> can be enabled in the master solrconfig.xml file?
>
> It seems to know what size the segments are before they are transferred, is
> there a reason a basic disk space check isn't done for the target partition
> with an immediate abort done if the destination's space looks like it would
> go negative before attempting sync? Is this something that can be enabled in
> the master solrconfig.xml file? This would be a lot more useful (IMHO) than
> waiting for a full sync to complete only to run out of space after several
> hundred gigs of data is transferred with automatic cluster recovery failing
> as a result.
>
> This happens when doing a 'sudo service solr restart'
>
> (Workaround, shutdown offending node, manually delete segment index folders
> and tlog files, start node)
>
> Exception:
>
> WARN  - 2016-11-28 16:15:16.291;
> org.apache.solr.handler.IndexFetcher$FileFetcher; Error in fetching file:
> _2f6i.cfs (downloaded 2317352960 of 5257809205 bytes)
> java.io.IOException: No space left on device
> at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
> at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
> at sun.nio.ch.IOUtil.write(IOUtil.java:65)
> at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
> at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
> at java.nio.channels.Channels.writeFully(Channels.java:101)
> at java.nio.channels.Channels.access$000(Channels.java:61)
> at java.nio.channels.Channels$1.write(Channels.java:174)
> at
> org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:419)
> at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73)
> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
> at
> org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
> at
> org.apache.solr.handler.IndexFetcher$DirectoryFile.write(IndexFetcher.java:1634)
> at
> org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1491)
> at
> org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1429)
> at
> org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:855)
> at
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:434)
> at
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:251)
> at
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
> at
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:156)
> at
> 

Re: Solr 6 Performance Suggestions

2016-11-28 Thread Walter Underwood
We had some serious slowness at startup before I set Xms to be the same as Xmx.

We run with an 8G heap. We have multiple collections but don’t use faceting.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Nov 28, 2016, at 8:40 AM, Max Bridgewater  
> wrote:
> 
> Thanks again Folks. I tried each suggestion and none made any difference. I
> am setting up a lab for performance monitoring using App Dynamics.
> Hopefully I am able to figure out something.
> 
> On Mon, Nov 28, 2016 at 11:20 AM, Erick Erickson 
> wrote:
> 
>> bq: If you know the maximum size you ever will need, setting Xmx is good.
>> 
>> Not quite sure what you're getting at here. I pretty much guarantee that a
>> production system will eat up the default heap size, so not setting Xmx
>> will
>> cause OOM errors pretty soon. Or did you mean Xms?
>> 
>> As far as setting Xms, there are differing opinions, mostly though since
>> Solr
>> likes memory so much there's a lot of tuning to try to determine Xmx and
>> it's pretty much guaranteed that Java will need close to that amount of
>> memory.
>> So setting Xms=Xmx is a minor optimization if that assumption is true.
>> It's arguable
>> how much practical difference it makes though.
>> 
>> Best,
>> Erick
>> 
>> On Mon, Nov 28, 2016 at 2:14 AM, Florian Gleixner  wrote:
>>> Am 28.11.2016 um 00:00 schrieb Shawn Heisey:
 
 On 11/27/2016 12:51 PM, Florian Gleixner wrote:
> 
> On 22.11.2016 14:54, Max Bridgewater wrote:
>> 
>> test cases were exactly the same, the machines were exactly the same
>> and heap settings exactly the same (Xms24g, Xmx24g). Requests were
>> sent with
> 
> Setting heap too large is a common error. Recent Solr uses the
> filesystem cache, so you don't have to set heap to the size of the
> index. The available RAM has to be able to run the OS, run the jvm and
> hold most of the index data in filesystem cache. If you have 32GB RAM
> and a 20GB Index, then set -Xms never higher than 10GB. I personally
> would set -Xms to 4GB and omit -Xmx
 
 
 In my mind, the Xmx setting is much more important than Xms.  Setting
 both to the same number avoids any need for Java to detect memory
 pressure before increasing the heap size, which can be helpful.
 
>>> 
>>> From https://cwiki.apache.org/confluence/display/solr/JVM+Settings
>>> 
>>> "The maximum heap size, set with -Xmx, is more critical. If the memory
>> heap
>>> grows to this size, object creation may begin to fail and throw
>>> OutOfMemoryException. Setting this limit too low can cause spurious
>> errors
>>> in your application, but setting it too high can be detrimental as well."
>>> 
>>> you are right, Xmx is more important. But setting Xms to Xmx will waste
>> RAM,
>>> that the OS can use to cache your index data. Setting Xmx can avoid
>> problems
>>> in some situations where solr can eat up your filesystem cache until the
>>> next GC has been finished.
>>> 
 Without Xmx, Java is in control of the max heap size, and it may not
 make the correct choice.  It's important to know what your max heap is,
 because chances are excellent that the max heap *will* be reached.  Solr
 allocates a lot of memory to do its job.
 
>>> 
>>> If you know the maximum size you ever will need, setting Xmx is good.
>>> 
>>> 
>>> 
>>> 
>> 



Re: initiate solr could collection

2016-11-28 Thread Novin Novin
Apologies for that, I didn't describe it properly. Thanks for the help. I will
look into this.

On Mon, 28 Nov 2016 at 16:33 Erick Erickson  wrote:

> Please state the full problem rather than make us pull things out in
> dribs and drabs.
>
> Have you looked at the bin/solr script options? Particularly the
> create_collection option?
>
> On Mon, Nov 28, 2016 at 8:24 AM, Novin Novin  wrote:
> > Thanks for this Erick, -e brings me to prompt. I can't use it because I
> am
> > using script to setup solr cloud. I required something where I can define
> > shard and replica also.
> > Best,
> > Novin
> >
> > On Mon, 28 Nov 2016 at 16:14 Erick Erickson 
> wrote:
> >
> >> try
> >>
> >> bin/solr start -e cloud -z ZK_NODE
> >>
> >> That'll guide you through creating a collection, assuming you can get
> >> by with one of the stock configuration sets.
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Nov 28, 2016 at 8:11 AM, Novin Novin 
> wrote:
> >> > Hi Guys,
> >> >
> >> > Does solr has any way to create collection when solr cloud is getting
> >> > started first time?
> >> >
> >> > Best,
> >> > Novin
> >>
>


Re: Solr 6 Performance Suggestions

2016-11-28 Thread Max Bridgewater
Thanks again Folks. I tried each suggestion and none made any difference. I
am setting up a lab for performance monitoring using App Dynamics.
Hopefully I am able to figure out something.

On Mon, Nov 28, 2016 at 11:20 AM, Erick Erickson 
wrote:

> bq: If you know the maximum size you ever will need, setting Xmx is good.
>
> Not quite sure what you're getting at here. I pretty much guarantee that a
> production system will eat up the default heap size, so not setting Xmx
> will
> cause OOM errors pretty soon. Or did you mean Xms?
>
> As far as setting Xms, there are differing opinions, mostly though since
> Solr
> likes memory so much there's a lot of tuning to try to determine Xmx and
> it's pretty much guaranteed that Java will need close to that amount of
> memory.
> So setting Xms=Xmx is a minor optimization if that assumption is true.
> It's arguable
> how much practical difference it makes though.
>
> Best,
> Erick
>
> On Mon, Nov 28, 2016 at 2:14 AM, Florian Gleixner  wrote:
> > Am 28.11.2016 um 00:00 schrieb Shawn Heisey:
> >>
> >> On 11/27/2016 12:51 PM, Florian Gleixner wrote:
> >>>
> >>> On 22.11.2016 14:54, Max Bridgewater wrote:
> 
>  test cases were exactly the same, the machines were exactly the same
>  and heap settings exactly the same (Xms24g, Xmx24g). Requests were
>  sent with
> >>>
> >>> Setting heap too large is a common error. Recent Solr uses the
> >>> filesystem cache, so you don't have to set heap to the size of the
> >>> index. The available RAM has to be able to run the OS, run the jvm and
> >>> hold most of the index data in filesystem cache. If you have 32GB RAM
> >>> and a 20GB Index, then set -Xms never higher than 10GB. I personally
> >>> would set -Xms to 4GB and omit -Xmx
> >>
> >>
> >> In my mind, the Xmx setting is much more important than Xms.  Setting
> >> both to the same number avoids any need for Java to detect memory
> >> pressure before increasing the heap size, which can be helpful.
> >>
> >
> > From https://cwiki.apache.org/confluence/display/solr/JVM+Settings
> >
> > "The maximum heap size, set with -Xmx, is more critical. If the memory
> heap
> > grows to this size, object creation may begin to fail and throw
> > OutOfMemoryException. Setting this limit too low can cause spurious
> errors
> > in your application, but setting it too high can be detrimental as well."
> >
> > you are right, Xmx is more important. But setting Xms to Xmx will waste
> RAM,
> > that the OS can use to cache your index data. Setting Xmx can avoid
> problems
> > in some situations where solr can eat up your filesystem cache until the
> > next GC has been finished.
> >
> >> Without Xmx, Java is in control of the max heap size, and it may not
> >> make the correct choice.  It's important to know what your max heap is,
> >> because chances are excellent that the max heap *will* be reached.  Solr
> >> allocates a lot of memory to do its job.
> >>
> >
> > If you know the maximum size you ever will need, setting Xmx is good.
> >
> >
> >
> >
>


Failure when trying to full sync, out of space ? Doesn't delete old segments before full sync?

2016-11-28 Thread Michael Joyner

Hello all,

I'm running out of space when trying to restart nodes to get a cluster 
back up fully operational where a node ran out of space during an optimize.


It appears to be trying to do a full sync from another node, but doesn't 
take care to check available space before starting downloads and doesn't 
delete the out of date segment files before attempting to do the full sync.


If the segments are out of date and we are pulling from another node 
before coming "online" why aren't the old segments deleted? Is this 
something that can be enabled in the master solrconfig.xml file?


It seems to know what size the segments are before they are transferred, 
is there a reason a basic disk space check isn't done for the target 
partition with an immediate abort done if the destination's space looks 
like it would go negative before attempting sync? Is this something that 
can be enabled in the master solrconfig.xml file? This would be a lot 
more useful (IMHO) than waiting for a full sync to complete only to run 
out of space after several hundred gigs of data is transferred with 
automatic cluster recovery failing as a result.


This happens when doing a 'sudo service solr restart'

(Workaround, shutdown offending node, manually delete segment index 
folders and tlog files, start node)


Exception:

WARN  - 2016-11-28 16:15:16.291; 
org.apache.solr.handler.IndexFetcher$FileFetcher; Error in fetching 
file: _2f6i.cfs (downloaded 2317352960 of 5257809205 bytes)

java.io.IOException: No space left on device
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
at java.nio.channels.Channels.writeFully(Channels.java:101)
at java.nio.channels.Channels.access$000(Channels.java:61)
at java.nio.channels.Channels$1.write(Channels.java:174)
at 
org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:419)

at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at 
org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
at 
org.apache.solr.handler.IndexFetcher$DirectoryFile.write(IndexFetcher.java:1634)
at 
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1491)
at 
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1429)
at 
org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:855)
at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:434)
at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:251)
at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
at 
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:156)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:408)
at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:221)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

-Mike



Re: initiate solr could collection

2016-11-28 Thread Erick Erickson
Please state the full problem rather than make us pull things out in
dribs and drabs.

Have you looked at the bin/solr script options? Particularly the
create_collection option?
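
If the setup script would rather call the Collections API directly instead of
shelling out to bin/solr, a rough SolrJ sketch (assuming SolrJ 6.x; the collection
name, configset name, shard/replica counts and ZooKeeper address below are all
placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

// Sketch: create a collection with 2 shards and 2 replicas from a configset
// that has already been uploaded to ZooKeeper as "myconfig".
public class CreateCollectionExample {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient("zk1:2181")) {
            CollectionAdminRequest.Create create =
                CollectionAdminRequest.createCollection("mycollection", "myconfig", 2, 2);
            create.process(client);
        }
    }
}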

On Mon, Nov 28, 2016 at 8:24 AM, Novin Novin  wrote:
> Thanks for this Erick, -e brings me to prompt. I can't use it because I am
> using script to setup solr cloud. I required something where I can define
> shard and replica also.
> Best,
> Novin
>
> On Mon, 28 Nov 2016 at 16:14 Erick Erickson  wrote:
>
>> try
>>
>> bin/solr start -e cloud -z ZK_NODE
>>
>> That'll guide you through creating a collection, assuming you can get
>> by with one of the stock configuration sets.
>>
>> Best,
>> Erick
>>
>> On Mon, Nov 28, 2016 at 8:11 AM, Novin Novin  wrote:
>> > Hi Guys,
>> >
>> > Does solr has any way to create collection when solr cloud is getting
>> > started first time?
>> >
>> > Best,
>> > Novin
>>


Re: initiate solr could collection

2016-11-28 Thread Novin Novin
Thanks for this Erick, -e brings me to a prompt. I can't use it because I am
using a script to set up SolrCloud. I require something where I can also define
shards and replicas.
Best,
Novin

On Mon, 28 Nov 2016 at 16:14 Erick Erickson  wrote:

> try
>
> bin/solr start -e cloud -z ZK_NODE
>
> That'll guide you through creating a collection, assuming you can get
> by with one of the stock configuration sets.
>
> Best,
> Erick
>
> On Mon, Nov 28, 2016 at 8:11 AM, Novin Novin  wrote:
> > Hi Guys,
> >
> > Does solr has any way to create collection when solr cloud is getting
> > started first time?
> >
> > Best,
> > Novin
>


Re: Solr 6 Performance Suggestions

2016-11-28 Thread Erick Erickson
bq: If you know the maximum size you ever will need, setting Xmx is good.

Not quite sure what you're getting at here. I pretty much guarantee that a
production system will eat up the default heap size, so not setting Xmx will
cause OOM errors pretty soon. Or did you mean Xms?

As far as setting Xms, there are differing opinions, mostly though since Solr
likes memory so much there's a lot of tuning to try to determine Xmx and
it's pretty much guaranteed that Java will need close to that amount of memory.
So setting Xms=Xmx is a minor optimization if that assumption is true.
It's arguable
how much practical difference it makes though.

Best,
Erick

On Mon, Nov 28, 2016 at 2:14 AM, Florian Gleixner  wrote:
> Am 28.11.2016 um 00:00 schrieb Shawn Heisey:
>>
>> On 11/27/2016 12:51 PM, Florian Gleixner wrote:
>>>
>>> On 22.11.2016 14:54, Max Bridgewater wrote:

 test cases were exactly the same, the machines were exactly the same
 and heap settings exactly the same (Xms24g, Xmx24g). Requests were
 sent with
>>>
>>> Setting heap too large is a common error. Recent Solr uses the
>>> filesystem cache, so you don't have to set heap to the size of the
>>> index. The available RAM has to be able to run the OS, run the jvm and
>>> hold most of the index data in filesystem cache. If you have 32GB RAM
>>> and a 20GB Index, then set -Xms never higher than 10GB. I personally
>>> would set -Xms to 4GB and omit -Xmx
>>
>>
>> In my mind, the Xmx setting is much more important than Xms.  Setting
>> both to the same number avoids any need for Java to detect memory
>> pressure before increasing the heap size, which can be helpful.
>>
>
> From https://cwiki.apache.org/confluence/display/solr/JVM+Settings
>
> "The maximum heap size, set with -Xmx, is more critical. If the memory heap
> grows to this size, object creation may begin to fail and throw
> OutOfMemoryException. Setting this limit too low can cause spurious errors
> in your application, but setting it too high can be detrimental as well."
>
> you are right, Xmx is more important. But setting Xms to Xmx will waste RAM,
> that the OS can use to cache your index data. Setting Xmx can avoid problems
> in some situations where solr can eat up your filesystem cache until the
> next GC has been finished.
>
>> Without Xmx, Java is in control of the max heap size, and it may not
>> make the correct choice.  It's important to know what your max heap is,
>> because chances are excellent that the max heap *will* be reached.  Solr
>> allocates a lot of memory to do its job.
>>
>
> If you know the maximum size you ever will need, setting Xmx is good.
>
>
>
>


Re: initiate solr could collection

2016-11-28 Thread Erick Erickson
try

bin/solr start -e cloud -z ZK_NODE

That'll guide you through creating a collection, assuming you can get
by with one of the stock configuration sets.

Best,
Erick

On Mon, Nov 28, 2016 at 8:11 AM, Novin Novin  wrote:
> Hi Guys,
>
> Does solr has any way to create collection when solr cloud is getting
> started first time?
>
> Best,
> Novin


initiate solr could collection

2016-11-28 Thread Novin Novin
Hi Guys,

Does Solr have any way to create a collection when SolrCloud is getting
started for the first time?

Best,
Novin


Using atomic update in Solr get an error

2016-11-28 Thread giladv
I'm getting the following error in 5.2.1: RunUpdateProcessor has received an
AddUpdateCommand containing a document that appears to still contain Atomic
document update operations, most likely because
DistributedUpdateProcessorFactory was explicitly disabled from this
updateRequestProcessorChain

I tried working in cloud mode and in single-node mode. I guess it must be something
in my solrconfig.xml - can someone please post an example of a file that works?

BTW - got the same error when trying solrj and also curl command (with xml
in a file)
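
For reference, the client side of an atomic update in SolrJ looks roughly like the
sketch below (the URL, id and field name are placeholders); if the request is already
shaped like this, the error usually points at the updateRequestProcessorChain in
solrconfig.xml, as the message itself suggests, rather than at the client:

import java.util.Collections;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

// Minimal SolrJ atomic update: "set" a single field on an existing document.
public class AtomicUpdateExample {
    public static void main(String[] args) throws Exception {
        SolrClient client =
            new HttpSolrClient("http://localhost:8983/solr/mycollection");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");  // id of the existing document
        doc.addField("title_s", Collections.singletonMap("set", "new title"));
        client.add(doc);
        client.commit();
        client.close();
    }
}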

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-atomic-update-in-Solr-get-an-error-tp4307661.html
Sent from the Solr - User mailing list archive at Nabble.com.


3rd party integrations (was: The state of Solr 5. Is it in maintenance mode only?)

2016-11-28 Thread Alexandre Rafalovitch
On 29 November 2016 at 00:24, Shawn Heisey  wrote:
> Third-party integrations (Solr support in other software) tend to be
> VERY slow to upgrade.  Some of them are still shipping configs designed
> for Solr 3.x, which won't work in 5.x and later.  Some are still
> shipping configs designed for 4.x, which frequently don't work in 6.x,
> and generate warnings in 5.x.

I wonder what the root cause of it is and how much effort would it be
to change the status quo. Is that an issue of somebody very familiar
with Solr trying those products together and reporting what needs to
be fixed and why?

Or are the integrations done super-sporadically and are basically
abandoned unless somebody steps in?

I'd love a range of opinions from the downstream users/implementors.

Regards,
Alex.
P.s. I am _presuming_ there is a value in the integrations using
latest Solr and therefore the version lag is a bad thing. Perhaps I am
wrong on that too.


http://www.solr-start.com/ - Resources for Solr users, new and experienced


Re: Break up a supplier's documents (products) from dominating search result.

2016-11-28 Thread Alexandre Rafalovitch
Is it technically possible to expose it in Solr? Because there was
also 
http://stackoverflow.com/questions/40831474/randomize-result-set-between-the-brands-in-solr/40835382#40835382
. Seems a popular request (or I misread different things in the same
way).

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 29 November 2016 at 00:48, Shalin Shekhar Mangar
 wrote:
> There is a related work done in Lucene land which hasn't been exposed
> in Solr yet. It is called DiversifiedTopDocsCollector. See
> https://issues.apache.org/jira/browse/LUCENE-6066
>
> On Mon, Nov 28, 2016 at 2:39 PM, Derek Poh  wrote:
>> Hi
>>
>> We have a business requirement to break up a supplier's products from
>> dominating the search result so as to allow other suppliers' products in the
>> search result to have exposure.
>> Business users are open to implementing this for the first page of the
>> search result if it is not possible to apply to the entire search result.
>>
>> From the sample keywords users have provided, I also discovered that most of
>> the time a supplier's products that are listed consecutively in the result
>> all have the same score.
>>
>> Any advice/suggestions on how I can do it?
>>
>> Please let me know if more information is required. Thank you.
>>
>> Derek
>>
>> --
>> CONFIDENTIALITY NOTICE
>> This e-mail (including any attachments) may contain confidential and/or
>> privileged information. If you are not the intended recipient or have
>> received this e-mail in error, please inform the sender immediately and
>> delete this e-mail (including any attachments) from your computer, and you
>> must not use, disclose to anyone else or copy this e-mail (including any
>> attachments), whether in whole or in part.
>> This e-mail and any reply to it may be monitored for security, legal,
>> regulatory compliance and/or other appropriate reasons.
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.


Re: Break up a supplier's documents (products) from dominating search result.

2016-11-28 Thread Shalin Shekhar Mangar
There is a related work done in Lucene land which hasn't been exposed
in Solr yet. It is called DiversifiedTopDocsCollector. See
https://issues.apache.org/jira/browse/LUCENE-6066
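
For anyone curious, a rough sketch of what using that collector looks like at the
raw Lucene level (it lives in the lucene-misc module; the "supplier_id" docValues
field and the sizes below are made-up assumptions, and as noted this is not
something Solr exposes yet):

import java.io.IOException;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.DiversifiedTopDocsCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

// Sketch: collect the top 10 hits while keeping at most 2 hits per supplier,
// keyed off a numeric docValues field.
public class SupplierDiversitySketch {
    static TopDocs diversifiedSearch(IndexSearcher searcher, Query query) throws IOException {
        DiversifiedTopDocsCollector collector = new DiversifiedTopDocsCollector(10, 2) {
            @Override
            protected NumericDocValues getKeys(LeafReaderContext context) {
                try {
                    return DocValues.getNumeric(context.reader(), "supplier_id");
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }
        };
        searcher.search(query, collector);
        return collector.topDocs();
    }
}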

On Mon, Nov 28, 2016 at 2:39 PM, Derek Poh  wrote:
> Hi
>
> We have a business requirement to break up a supplier's products from
> dominating the search result so as to allow other suppliers' products in the
> search result to have exposure.
> Business users are open to implementing this for the first page of the
> search result if it is not possible to apply to the entire search result.
>
> From the sample keywords users have provided, I also discovered that most of
> the time a supplier's products that are listed consecutively in the result
> all have the same score.
>
> Any advice/suggestions on how I can do it?
>
> Please let me know if more information is required. Thank you.
>
> Derek
>
> --
> CONFIDENTIALITY NOTICE
> This e-mail (including any attachments) may contain confidential and/or
> privileged information. If you are not the intended recipient or have
> received this e-mail in error, please inform the sender immediately and
> delete this e-mail (including any attachments) from your computer, and you
> must not use, disclose to anyone else or copy this e-mail (including any
> attachments), whether in whole or in part.
> This e-mail and any reply to it may be monitored for security, legal,
> regulatory compliance and/or other appropriate reasons.



-- 
Regards,
Shalin Shekhar Mangar.


Re: The state of Solr 5. Is it in maintenance mode only?

2016-11-28 Thread Shawn Heisey
On 11/28/2016 6:11 AM, Jaroslaw Rozanski wrote:
> As for adoption levels, it was my subjective feel reading this list. Do
> we have community survey on that subject? That would be really
> interesting to see.

That's really hard for me to say.  Users tend to not mention what version they
are running unless they're having issues.

Third-party integrations (Solr support in other software) tend to be
VERY slow to upgrade.  Some of them are still shipping configs designed
for Solr 3.x, which won't work in 5.x and later.  Some are still
shipping configs designed for 4.x, which frequently don't work in 6.x,
and generate warnings in 5.x.

Thanks,
Shawn



Re: The state of Solr 5. Is it in maintenance mode only?

2016-11-28 Thread Jaroslaw Rozanski
Hi,

Thanks for the elaborate response. I missed the link to the duplicate JIRA.
Makes sense.

On the 5.x front I wasn't expecting a 5.6 release now that we have 6.x, but
I was simply surprised to see a fix for 4.x and not for 5.x.

As for adoption levels, it was my subjective feel reading this list. Do
we have community survey on that subject? That would be really
interesting to see.


Thanks,
Jaroslaw


On 28/11/16 12:59, Shawn Heisey wrote:
> On 11/28/2016 4:29 AM, Jaroslaw Rozanski wrote:
>> Recently I have noticed that couple of Solr issues have been
>> resolved/added only for Solr 4.x and Solr 6.x branch. For example
>> https://issues.apache.org/jira/browse/SOLR-2242. Has Solr 5.x branch
>> been moved to maintenance mode only? The 5 wasn't around for long
>> before 6 came about so I appreciate its adoption might not be vast.
> 
> The 5.0 version was announced in March 2015.  The 6.0 version was
> announced in April 2016.  Looks like 4.x was current for a little less
> than three years (July 2012 for 4.0).  5.x had one year, which I
> wouldn't really call a short time.
> 
> Since the release of 6.0, 4.x is dead and 5.x is in maintenance mode. 
> Maintenance mode means that only particularly nasty bugs are fixed and
> only extremely trivial features are added.  The latter is usually only
> done if the lack of the feature can be considered a bug.  There is never
> any guarantee that a new 5.x release will be made, but if that happens,
> it will be a 5.5.x release.  The likelihood of seeing a 5.6 release is
> VERY low.
> 
> SOLR-2242 is a duplicate of SOLR-6348.  It probably had 4.9 in the fixed
> version field because that's what was already in it when it was resolved
> as a duplicate.  It's a very old issue that's been around since the 3.x
> days.  No changes were committed for SOLR-2242.  The changes for
> SOLR-6348 were committed to 5.2 and 6.0.  I have updated the fix
> versions in the older issue to match.  The versions should probably all
> be removed, but I am not sure what our general rule is for duplicates.
> 
> Thanks,
> Shawn
> 

-- 
Jaroslaw Rozanski | e: m...@jarekrozanski.com





Re: SOl6.3 Alchemy Annotator Not Working

2016-11-28 Thread Shawn Heisey
On 11/28/2016 12:50 AM, soumitra80 wrote:
> This issue has been resolved. Please close this 

Unless you opened an issue, there wasn't ever one open.  I did not see
an issue number, so if there's something to close, I'm not aware of it.

The class you mentioned in your original post, XmlUpdateRequestHandler,
was deprecated in the 4.x timeframe, which means it was entirely removed
in 5.0.  It definitely wouldn't be there in 6.3.  What documentation are
you looking at which still mentions it?  If it's documentation under the
control of the project, we need to get it updated.

Thanks,
Shawn



Re: The state of Solr 5. Is it in maintenance mode only?

2016-11-28 Thread Shawn Heisey
On 11/28/2016 4:29 AM, Jaroslaw Rozanski wrote:
> Recently I have noticed that couple of Solr issues have been
> resolved/added only for Solr 4.x and Solr 6.x branch. For example
> https://issues.apache.org/jira/browse/SOLR-2242. Has Solr 5.x branch
> been moved to maintenance mode only? The 5 wasn't around for long
> before 6 came about so I appreciate its adoption might not be vast.

The 5.0 version was announced in March 2015.  The 6.0 version was
announced in April 2016.  Looks like 4.x was current for a little less
than three years (July 2012 for 4.0).  5.x had one year, which I
wouldn't really call a short time.

Since the release of 6.0, 4.x is dead and 5.x is in maintenance mode. 
Maintenance mode means that only particularly nasty bugs are fixed and
only extremely trivial features are added.  The latter is usually only
done if the lack of the feature can be considered a bug.  There is never
any guarantee that a new 5.x release will be made, but if that happens,
it will be a 5.5.x release.  The likelihood of seeing a 5.6 release is
VERY low.

SOLR-2242 is a duplicate of SOLR-6348.  It probably had 4.9 in the fixed
version field because that's what was already in it when it was resolved
as a duplicate.  It's a very old issue that's been around since the 3.x
days.  No changes were committed for SOLR-2242.  The changes for
SOLR-6348 were committed to 5.2 and 6.0.  I have updated the fix
versions in the older issue to match.  The versions should probably all
be removed, but I am not sure what our general rule is for duplicates.

Thanks,
Shawn



The state of Solr 5. Is it in maintenance mode only?

2016-11-28 Thread Jaroslaw Rozanski
Hi,

Recently I have noticed that couple of Solr issues have been
resolved/added only for Solr 4.x and Solr 6.x branch. For example
https://issues.apache.org/jira/browse/SOLR-2242.

Has Solr 5.x branch been moved to maintenance mode only? The 5 wasn't
around for long before 6 came about so I appreciate its adoption might
not be vast.



-- 
Jaroslaw Rozanski | e: m...@jarekrozanski.com






Re: Solr 6 Performance Suggestions

2016-11-28 Thread Florian Gleixner

Am 28.11.2016 um 00:00 schrieb Shawn Heisey:

On 11/27/2016 12:51 PM, Florian Gleixner wrote:

On 22.11.2016 14:54, Max Bridgewater wrote:

test cases were exactly the same, the machines were exactly the same
and heap settings exactly the same (Xms24g, Xmx24g). Requests were
sent with

Setting heap too large is a common error. Recent Solr uses the
filesystem cache, so you don't have to set heap to the size of the
index. The available RAM has to be able to run the OS, run the jvm and
hold most of the index data in filesystem cache. If you have 32GB RAM
and a 20GB Index, then set -Xms never higher than 10GB. I personally
would set -Xms to 4GB and omit -Xmx


In my mind, the Xmx setting is much more important than Xms.  Setting
both to the same number avoids any need for Java to detect memory
pressure before increasing the heap size, which can be helpful.



From https://cwiki.apache.org/confluence/display/solr/JVM+Settings

"The maximum heap size, set with -Xmx, is more critical. If the memory 
heap grows to this size, object creation may begin to fail and throw 
OutOfMemoryException. Setting this limit too low can cause spurious 
errors in your application, but setting it too high can be detrimental 
as well."


you are right, Xmx is more important. But setting Xms to Xmx will waste
RAM that the OS could otherwise use to cache your index data. Setting Xmx can avoid
problems in some situations where Solr can eat up your filesystem cache
until the next GC has finished.



Without Xmx, Java is in control of the max heap size, and it may not
make the correct choice.  It's important to know what your max heap is,
because chances are excellent that the max heap *will* be reached.  Solr
allocates a lot of memory to do its job.



If you know the maximum size you ever will need, setting Xmx is good.








Re: Break up a supplier's documents (products) from dominating search result.

2016-11-28 Thread Alexandre Rafalovitch
You have described your _negative_ business requirements, but not the
_positive_ ones. So, it is hard to see what they want to happen. It is
easy enough to promote or demote the matches of a particular filter. But you
want to partially limit them. On a first page? What about on the
second?

I suspect you would have to have a slightly different interface to do
this effectively. And, most likely, using Collapse and Expand:
https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
.
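
A rough SolrJ sketch of what the collapsing approach could look like (the
"supplier_id_s" field, query text and collection URL are assumptions):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

// Sketch: collapse to at most one product per supplier, and expand each
// collapsed group so a couple of extra products per supplier are available.
public class CollapseBySupplierSketch {
    public static void main(String[] args) throws Exception {
        SolrClient client =
            new HttpSolrClient("http://localhost:8983/solr/products");
        SolrQuery q = new SolrQuery("laptop bag");
        q.addFilterQuery("{!collapse field=supplier_id_s}"); // one hit per supplier
        q.set("expand", "true");   // also return the collapsed group members
        q.set("expand.rows", 2);   // up to 2 extra products per supplier
        QueryResponse rsp = client.query(q);
        System.out.println("Suppliers on this page: " + rsp.getResults().size());
        client.close();
    }
}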

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 28 November 2016 at 20:09, Derek Poh  wrote:
> Hi
>
> We have a business requirement to break up a supplier's products from
> dominating the search result so as to allow other suppliers' products in the
> search result to have exposure.
> Business users are open to implementing this for the first page of the
> search result if it is not possible to apply to the entire search result.
>
> From the sample keywords users have provided, I also discovered that most of
> the time a supplier's products that are listed consecutively in the result
> all have the same score.
>
> Any advice/suggestions on how I can do it?
>
> Please let me know if more information is required. Thank you.
>
> Derek
>
> --
> CONFIDENTIALITY NOTICE
> This e-mail (including any attachments) may contain confidential and/or
> privileged information. If you are not the intended recipient or have
> received this e-mail in error, please inform the sender immediately and
> delete this e-mail (including any attachments) from your computer, and you
> must not use, disclose to anyone else or copy this e-mail (including any
> attachments), whether in whole or in part.
> This e-mail and any reply to it may be monitored for security, legal,
> regulatory compliance and/or other appropriate reasons.


Break up a supplier's documents (products) from dominating search result.

2016-11-28 Thread Derek Poh

Hi

We have a business requirement to break up a supplier's products from
dominating the search result so as to allow other suppliers' products in
the search result to have exposure.
Business users are open to implementing this for the first page of the
search result if it is not possible to apply to the entire search result.


From the sample keywords users have provided, I also discovered that most
of the time a supplier's products that are listed consecutively in the
result all have the same score.


Any advice/suggestions on how I can do it?

Please let me know if more information is required. Thank you.

Derek

--
CONFIDENTIALITY NOTICE 

This e-mail (including any attachments) may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail (including any attachments) from your computer, and you must not use, disclose to anyone else or copy this e-mail (including any attachments), whether in whole or in part. 


This e-mail and any reply to it may be monitored for security, legal, 
regulatory compliance and/or other appropriate reasons.

Re: AW: AW: Resync after restart

2016-11-28 Thread Arkadi Colson
We do. Indexing is always running. The fix version is 6.3, so can I assume
that the issue is fixed in 6.3? We are running 6.3 right now, so either the
fix is not in 6.3 or another issue is causing the full resync.


BR
Arkadi


On 25-11-16 18:23, Pushkar Raste wrote:


Did you index any documents while the node was being restarted? There was
an issue introduced due to IndexFingerprint comparison. Check
SOLR-9310. I am not sure if the fix made it to Solr 6.2.



On Nov 25, 2016 3:51 AM, "Arkadi Colson" > wrote:


I am using SolrCloud on version 6.2.1. I will upgrade to 6.3.0
next week.

This is the current config for numVersionBuckets:


  ${solr.ulog.dir:}
  ${solr.ulog.numVersionBuckets:65536}


Are you saying that I should not use the config below on SolrCloud?

  

  18.75
  05:00:00
  15
  30

  

Br,
Arkadi


On 24-11-16 17:46, Erick Erickson wrote:

Hold on. Are you using SolrCloud or not? There is a lot of
talk here
about masters and slaves, then you say "I always add slaves
with the
collection API", collections are a SolrCloud construct.

It sounds like you're mixing the two. You should _not_ configure
master/slave replication parameters with SolrCloud. Take a
look at the
sample configs

And you haven't told us what version of Solr you're using, we can
infer a relatively recent one because of the high number you
have for
numVersionBuckets, but that's guessing.

If you are _not_ in SolrCloud, then maybe:
https://issues.apache.org/jira/browse/SOLR-9036
 is relevant.

Best,
Erick

On Thu, Nov 24, 2016 at 3:10 AM, Arkadi Colson
> wrote:

This is the code from the master node. Al configs are the
same on all nodes.
I always add slaves with the collection API. Is there an
other place to look
for this part of the config?



On 24-11-16 12:02, Michael Aleythe, Sternwald wrote:

You need to change this on the master node. The part
of the config you
pasted here, looks like it is from the slave node.

-Ursprüngliche Nachricht-
Von: Arkadi Colson [mailto:ark...@smartbit.be
]
Gesendet: Donnerstag, 24. November 2016 11:56
An: solr-user@lucene.apache.org

Betreff: Re: AW: Resync after restart

Hi Michael

Thanks for the quick response! The line does not exist
in my config. So
can I assume that the default configuration is to not
replicate at startup?

 
   
 18.75
 05:00:00
 15
 30
   
 

Any other idea's?


On 24-11-16 11:49, Michael Aleythe, Sternwald wrote:

Hi Arkadi,

you need to remove the line "startup"
from your ReplicationHandler-config in
solrconfig.xml ->
https://wiki.apache.org/solr/SolrReplication
.

Greetings
Michael

-Ursprüngliche Nachricht-
Von: Arkadi Colson [mailto:ark...@smartbit.be
]
Gesendet: Donnerstag, 24. November 2016 09:26
An: solr-user >
Betreff: Resync after restart

Hi

Almost every time when restarting a solr instance
the index is replicated
completely. Is there a way to avoid this somehow?
The index currently has a
size of about 17GB.
Some advice here would be great.

99% of the config is defaul:

 ${solr.ulog.dir:} ${solr.ulog.numVersionBuckets:65536}
 
   ${solr.autoCommit.maxTime:15000}
   false
 

If you need more info, just let me know...

Thx!
Arkadi






Custom EntityProcessor for DataImportHandler related Issue.

2016-11-28 Thread anupambumba
Hi All,
I am facing an issue with a custom EntityProcessor for the DataImportHandler.

*My Requirement:*

My requirement is to process a main file along with its associated chunk
files (child files) placed in a folder. The file-related information is
part of a JSON file placed in the same location. The JSON file holds the main
file's metadata and also the child-related metadata. There may be multiple JSON
files, each with a corresponding main file and associated chunk files. All I need
to do is process the JSON, extract the metadata from it, and then index it
in Solr. Main files and chunk files should be indexed as separate documents.

*The Approach I have Followed:*

I have used FileListEntityProcessor along with Custom Entity Processor to
Parse the JSON and indexing data in JSON. The details of the configuration
and code that I have used is as mentioned below.

*The Code and Configuration Details:*

solr-config.xml:

 

  C:\SOLR\solr-6.3.0\server\solr\dataimporttest\conf\solr-data-config.xml

  

solr-data-config.xml:

  
  
  


 

   
 
 


  

   



 

  


LNKEntityProcessor.java:

package com.ibm.lnk.processor;

import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.GenericArrayType;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.DataSource;
import org.apache.solr.handler.dataimport.EntityProcessorBase;
import org.apache.tika.exception.TikaException;
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.xml.sax.SAXException;

import com.ibm.lnk.utility.LNKProcessorUtility;

public class LNKEntityProcessor extends EntityProcessorBase {

private static final int String = 0;
private final Logger slf4jLogger =
LoggerFactory.getLogger(LNKEntityProcessor.class);
private boolean isMainFileProcessingComplete = false;
private static int i = 0;
private static JSONObject parsedJSONMap = null;
private static int childIndexValue = 0;
private static int numberOfChildElements = 0;

@Override
protected void firstInit(Context context) {
slf4jLogger.info("firstInit() is getting called  ");

super.firstInit(context);
}

@Override
public void init(Context context) {

slf4jLogger.info("init is getting called ::");
isMainFileProcessingComplete = false;
parsedJSONMap = null;
childIndexValue = 0;
numberOfChildElements = 0;
super.init(context);
}
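
	// nextRow() is called repeatedly by the DataImportHandler for this entity:
	// each non-null Map it returns becomes one row/document, and returning null
	// signals that this entity has no more rows to emit.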

public Map<String, Object> nextRow() {

		slf4jLogger.info("Entering the nextRow() with isFileProcessingComplete value  " + isMainFileProcessingComplete);
		Map<String, Object> dataMap = null;

try {
if (isMainFileProcessingComplete) {

/*if (numberOfChildElements == childIndexValue) 
{
dataMap  = null;
} else {
dataMap = parseRow(false, 
childIndexValue);

childIndexValue++;
}*/
dataMap  = null;

} else {

DataSource dataSource = 
this.context.getDataSource();
InputStream inputStreamOfFile = (InputStream) 
dataSource

.getData(this.context.getResolvedEntityAttribute("url"));

String fileAbsolutePath =
this.context.getResolvedEntityAttribute("url");
slf4jLogger.info("  The Url for the file is 
  " + fileAbsolutePath);
String currentJsonString =
LNKProcessorUtility.getTextContent(inputStreamOfFile);
slf4jLogger.info("The JSON String to be parsed 
is   " +
currentJsonString);
				dataMap = new HashMap<String, Object>();
parsedJSONMap = 
LNKProcessorUtility.parseJSONMap(currentJsonString);
slf4jLogger.info("Parsed Map is   " + 
parsedJSONMap);
dataMap =