Re: Clustering error in Solr 8.2.0

2019-08-12 Thread Jörn Franke
It depends on whether they make breaking changes in commons-lang or not.

By using an old version of a library such as commons-lang you may introduce 
security issues into your setup.

> Am 13.08.2019 um 06:12 schrieb Zheng Lin Edwin Yeo :
> 
> I have found that the  Lingo3GClusteringAlgorithm  will work if I copied
> the commons-lang-2.6.jar from the previous version to
> solr-8.2.0\server\solr-webapp\webapp\WEB-INF\lib.
> 
> Will this work in the long run? Because our lingo3g licence is not eligible
> to download the latest version of 1.16, so we are currently stuck with the
> older version 1.15.1, which still uses commons-lang dependency.
> 
> Regards,
> Edwin
> 
> On Tue, 13 Aug 2019 at 00:14, Zheng Lin Edwin Yeo 
> wrote:
> 
>> Hi Kevin,
>> 
>> Thanks for the info.
>> 
>> I think should be lingo3g problem.  The problem occurs when I use
>> Lingo3GClusteringAlgorithm.
>> <str name="carrot.algorithm">com.carrotsearch.lingo3g.Lingo3GClusteringAlgorithm</str>
>> 
>> If I change back to LingoClusteringAlgorithm, it will work.
>> <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
>> 
>> Regards,
>> Edwin
>> 
>>> On Fri, 9 Aug 2019 at 10:59, Kevin Risden  wrote:
>>> 
>>> According to the stack trace:
>>> 
>>> java.lang.NoClassDefFoundError: org/apache/commons/lang/ObjectUtils
>>>at lingo3g.s.hashCode(Unknown Source)
>>> 
>>> It looks like a lingo3g issue - lingo3g isn't on Maven Central, and it
>>> looks like it requires a license to download. You would have to contact
>>> them to see if it still uses commons-lang. You could also copy in the
>>> commons-lang dependency.
>>> 
>>> Kevin Risden
>>> 
>>> 
>>> On Thu, Aug 8, 2019 at 10:23 PM Zheng Lin Edwin Yeo >>> 
>>> wrote:
>>> 
 Hi Erick,
 
 Thanks for your reply.
 
 My clustering code is taken as it is from the Solr package, only the
>>> codes
 related to lingo3g is taken from previous version.
 
 Below are the 3 files that I have taken from previous version:
 - lingo3g-1.15.0
 - morfologik-fsa-2.1.1
 - morfologik-stemming-2.1.1
 
 Could any one of these have caused the error?
 
 Regards,
 Edwin
 
 On Thu, 8 Aug 2019 at 19:56, Erick Erickson 
 wrote:
 
> This dependency was removed as part of
> https://issues.apache.org/jira/browse/SOLR-9079, so my guess is
>>> you’re
> pointing to an old version of the clustering code.
> 
> Best,
> Erick
> 
>> On Aug 8, 2019, at 4:22 AM, Zheng Lin Edwin Yeo <
>>> edwinye...@gmail.com>
> wrote:
>> 
>> ObjectUtils
> 
> 
 
>>> 
>> 


Re: Clustering error in Solr 8.2.0

2019-08-12 Thread Zheng Lin Edwin Yeo
I have found that the Lingo3GClusteringAlgorithm will work if I copy
the commons-lang-2.6.jar from the previous version to
solr-8.2.0\server\solr-webapp\webapp\WEB-INF\lib.

Will this work in the long run? Our lingo3g licence is not eligible for
downloading the latest version, 1.16, so we are currently stuck with the
older version, 1.15.1, which still uses the commons-lang dependency.
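For what it's worth, the manual jar copy described above can be scripted so it is easy to repeat after a re-install; a minimal sketch (the helper name is made up, and both paths are supplied by the caller rather than hard-coded):

```python
import shutil
from pathlib import Path

def copy_jar(jar: Path, webapp_lib: Path) -> Path:
    """Copy a dependency jar into a webapp lib directory.

    Mirrors the manual workaround from this thread (dropping
    commons-lang-2.6.jar into solr-8.2.0\\server\\solr-webapp\\webapp\\WEB-INF\\lib);
    nothing here is Solr-specific.
    """
    webapp_lib.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(jar, webapp_lib / jar.name))
```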

Regards,
Edwin

On Tue, 13 Aug 2019 at 00:14, Zheng Lin Edwin Yeo 
wrote:

> Hi Kevin,
>
> Thanks for the info.
>
> I think should be lingo3g problem.  The problem occurs when I use
> Lingo3GClusteringAlgorithm.
> <str name="carrot.algorithm">com.carrotsearch.lingo3g.Lingo3GClusteringAlgorithm</str>
>
> If I change back to LingoClusteringAlgorithm, it will work.
> <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
>
> Regards,
> Edwin
>
> On Fri, 9 Aug 2019 at 10:59, Kevin Risden  wrote:
>
>> According to the stack trace:
>>
>> java.lang.NoClassDefFoundError: org/apache/commons/lang/ObjectUtils
>> at lingo3g.s.hashCode(Unknown Source)
>>
>> It looks like lingo3g - lingo3g isn't on Maven central and looks like it
>> requires a license to download. You would have to contact them to see if
>> it
>> still uses commons-lang. You could also copy in commons-lang dependency.
>>
>> Kevin Risden
>>
>>
>> On Thu, Aug 8, 2019 at 10:23 PM Zheng Lin Edwin Yeo > >
>> wrote:
>>
>> > Hi Erick,
>> >
>> > Thanks for your reply.
>> >
>> > My clustering code is taken as it is from the Solr package, only the
>> codes
>> > related to lingo3g is taken from previous version.
>> >
>> > Below are the 3 files that I have taken from previous version:
>> > - lingo3g-1.15.0
>> > - morfologik-fsa-2.1.1
>> > - morfologik-stemming-2.1.1
>> >
>> > Could any one of these have caused the error?
>> >
>> > Regards,
>> > Edwin
>> >
>> > On Thu, 8 Aug 2019 at 19:56, Erick Erickson 
>> > wrote:
>> >
>> > > This dependency was removed as part of
>> > > https://issues.apache.org/jira/browse/SOLR-9079, so my guess is
>> you’re
>> > > pointing to an old version of the clustering code.
>> > >
>> > > Best,
>> > > Erick
>> > >
>> > > > On Aug 8, 2019, at 4:22 AM, Zheng Lin Edwin Yeo <
>> edwinye...@gmail.com>
>> > > wrote:
>> > > >
>> > > > ObjectUtils
>> > >
>> > >
>> >
>>
>


Re: Best way to retrieve parent documents with children using getBeans method?

2019-08-12 Thread Erick Erickson
Follow the instructions here: 
http://lucene.apache.org/solr/community.html#mailing-lists-irc. You must use 
the _exact_ same e-mail as you used to subscribe.

If the initial try doesn't work and following the suggestions at the "problems" 
link doesn't work for you, let us know. But note you need to show us the 
_entire_ return header to allow anyone to diagnose the problem.

Best,
Erick


> On Aug 12, 2019, at 7:17 PM, Dave Durbin  wrote:
> 
> Unsubscribe
> 



Re: Best way to retrieve parent documents with children using getBeans method?

2019-08-12 Thread Dave Durbin
Unsubscribe



Best way to retrieve parent documents with children using getBeans method?

2019-08-12 Thread Pratik Patel
Hello Everyone,

We use SolrJ with POJOs to index documents into Solr. If a POJO has a field
annotated with @child, then SolrJ automatically adds those objects as
children of the POJO. This works fine and indexing is done properly.

However, when I retrieve the same document through the same POJO using the
"getBeans" method of the DocumentObjectBinder class, the field annotated
with the @child annotation is always null, i.e. the children are not
populated in the POJO.

What is the best way to get children in the same POJO along with the other
fields? I read about child transformers, but I am not sure if that is the
prescribed and recommended way to get children with the parent. What is the
best practice to achieve this?
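One option is the [child] document transformer, which nests child documents under each parent in the response. A sketch that only builds the request URL — the host, collection, and parentFilter field are placeholders, and whether getBeans then maps the nested children into the @child field is something to verify against your SolrJ version:

```python
from urllib.parse import urlencode

def child_query_url(base_url: str, parent_filter: str, rows: int = 10) -> str:
    """Build a /select URL that asks Solr to nest child documents under
    each matching parent via the [child] doc transformer in fl."""
    params = {
        "q": parent_filter,  # match parent docs only
        "fl": f"*,[child parentFilter={parent_filter}]",
        "rows": rows,
    }
    return f"{base_url}/select?{urlencode(params)}"

url = child_query_url("http://localhost:8983/solr/mycoll", "doc_type:parent")
```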

Thanks!
Pratik


Re: Solr restricting time-consuming/heavy processing queries

2019-08-12 Thread Mark Robinson
Hi Jan,

Thanks for the reply.
Our normal search times are within 650 ms.
We were analyzing some queries and found that a few of them took 14675
ms, 13767 ms, etc.
So I was curious to see whether there is some way in Solr to restrict a
query from running beyond, say, 5 s (or some other ideal limit), even if
it then returns only partial results.

That is how I came across "timeAllowed" and wanted to check on it.
I was also curious to know whether "shardHandler" could be used along
those lines, or whether it is meant for totally different functionality.
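The timeAllowed cap is a per-request parameter in milliseconds; a minimal sketch that only assembles such a request (host and collection are placeholders), using the 5 s figure from above:

```python
from urllib.parse import urlencode

def capped_query_url(base_url: str, q: str, time_allowed_ms: int) -> str:
    """Build a /select URL with a timeAllowed cap; when the cap is hit,
    Solr can return partial results (flagged with partialResults=true)."""
    return f"{base_url}/select?{urlencode({'q': q, 'timeAllowed': time_allowed_ms})}"

url = capped_query_url("http://localhost:8983/solr/products", "*:*", 5000)
```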

Thanks!
Best,
Mark


On Sun, Aug 11, 2019 at 8:17 AM Jan Høydahl  wrote:

> What is the root use case you are trying to solve? What kind of solr
> install is this and do you not have control over the clients or what is the
> reason that users overload your servers?
>
> Normally you would scale the cluster to handle normal expected load
> instead of trying to give users timeout exceptions. What kind of query
> times do you experience that are above 1s and are these not important
> enough to invest extra HW? Trying to understand the real reason behind your
> questions.
>
> Jan Høydahl
>
> > 11. aug. 2019 kl. 11:43 skrev Mark Robinson :
> >
> > Hello,
> > Could someone share their thoughts please or point to some link that
> helps
> > understand my above queries?
> > In the Solr documentation I came across a few lines on timeAllowed and
> > shardHandler, but if there was an example scenario for both it would help
> > understand them more thoroughly.
> > Also curious to know the different ways, if any, in Solr to restrict or
> > limit a time-consuming query from processing for a long time.
> >
> > Thanks!
> > Mark
> >
> > On Fri, Aug 9, 2019 at 2:15 PM Mark Robinson 
> > wrote:
> >
> >>
> >> Hello,
> >> I have the following questions please:-
> >>
> >> In solrconfig.xml I created a new "/selecttimeout" handler copying
> >> "/select" handler and added the following to my new "/selecttimeout":-
> >> <shardHandler class="HttpShardHandlerFactory">
> >>   <int name="socketTimeout">10</int>
> >>   <int name="connTimeout">20</int>
> >> </shardHandler>
> >>
> >> 1.
> >> Does the above mean that if I don't get a request within 10 ms on the
> >> socket handling the /selecttimeout handler, that socket will be closed?
> >>
> >> 2.
> >> Same with connTimeout? i.e. the connection object remains live only if
> >> at least a connection request comes once every 20 ms; if not, the
> >> object gets closed?
> >>
> >> Suppose a time-consuming query (say with lots of facets etc.) is fired
> >> against Solr. How can I prevent Solr from processing it for more than 1 s?
> >>
> >> 3.
> >> Is this achieved by setting timeAllowed=1000?  Or are there any other
> ways
> >> to do this in Solr?
> >>
> >> 4
> >> For the same purpose of preventing heavy queries from overloading Solr,
> >> does the <shardHandler> config above help in any way, or does
> >> shardHandler have nothing to restrict a query once it is fired against
> >> Solr?
> >>
> >>
> >> Could someone pls share your views?
> >>
> >> Thanks!
> >> Mark
> >>
>


Re: Moving to solrcloud from single instance

2019-08-12 Thread Shawn Heisey

On 8/12/2019 1:42 PM, Erie Data Systems wrote:

I am starting the planning stages of moving from a single instance of solr
8 to a solrcloud implementation.

Currently I have a 148GB index on a single dedicated server w 96gb ram @ 16
cores /2.4ghz ea. + SSD disk. The search is fast but obviously the index
size is greater than the physical memory, which to my understanding is not
a good thing.


An *IDEAL* setup would have enough memory available (not assigned to 
programs) to be able to fit the entire index in the disk cache.


Lots of people run systems that aren't ideal and have perfectly 
acceptable performance.  I did that for several years.  I would have 
loved to have more memory, but the budget wasn't there, and the machines 
I was using were already maxed out at 64GB.


If performance is acceptable already, I think that not being able to fit 
the entire index into available memory is not enough of a reason to make 
significant changes that might require significant development time for 
your systems that keep Solr operational.  Switching to SolrCloud could 
require changes to your other software.



My issue is that im not sure where to go to learn how to set this up, how
many shards, how many replicas, etc and would rather hire somebody or
something (detailed video or document)  to guide me through the process,
and make decisions along the way...For example I think a shard is a piece
of the index... but I dont even know how to decide how many replicas or
what they are .


There are no standardized rules for making these decisions.  Typically 
you have to make an educated guess and try it to see whether it works.


https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

If it's done in the typical way, telling a SolrCloud setup to create a 
collection with 3 shards and 2 replicas will create six individual 
indexes that make up the whole collection.  The index will be split into 
three pieces (shards), and each of those pieces will have two copies 
(replicas).  For each shard, an election will be done that will elect 
one of the replicas as leader.
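The 3-shard / 2-replica example above maps onto a Collections API CREATE call; a sketch that only builds the URL (the host and collection name are assumptions):

```python
from urllib.parse import urlencode

def create_collection_url(base_url: str, name: str,
                          num_shards: int, replication_factor: int) -> str:
    """Collections API CREATE; the cluster ends up hosting
    num_shards * replication_factor cores for the collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"

url = create_collection_url("http://localhost:8983/solr", "mycoll", 3, 2)
# 3 shards x 2 replicas -> 6 cores spread across the cluster
```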


Sharding adds overhead.  In some cases with extremely large indexes, the 
overhead is less than the performance gained by splitting the index onto 
separate machines and letting those machines work in parallel.  In other 
cases, the overhead may result in things actually getting slower.


Thanks,
Shawn


Re: Moving to solrcloud from single instance

2019-08-12 Thread Erick Erickson
Unless you expect your index to grow, as long as performance is satisfactory 
there’s no reason to shard. _Replicate_, perhaps, if you need to sustain a higher 
QPS.

Here’s a sizing blog I wrote a long time ago, but it still pertains. The short 
form is for you to load test one of your machines and find out how many docs 
you can put on it before it falls over. _Then_ decide whether you need to shard.

And by “performance is satisfactory”, I mean the time it takes to serve up a 
query. If you need to serve more queries, simply add more replicas (i.e. have a 
single-shard collection). Each replica has the entire index in that case, so if 
1 machine can serve 30 QPS, replicating twice will let you serve 90 QPS .

If you do decide to shard, two things will happen. First, some operations 
aren’t well supported when you shard, group.func to name one.

Second, you’ll introduce a certain amount of overhead (balanced against each 
shard doing less work to be sure).

SolrCloud (in the one-shard, replicated case) will give you some good stuff, 
HA/DR, failover, expandability, etc. so I’m not discouraging moving to that. 
Just don’t shard etc. until you know you need to ;)

Best,
Erick

> On Aug 12, 2019, at 3:44 PM, David Hastings  
> wrote:
> 
> I actually never had a problem with the index being larger than the memory
> for a standalone instance, but the entire index is on an SSD at least one
> my end
> 
> On Mon, Aug 12, 2019 at 3:43 PM Erie Data Systems 
> wrote:
> 
>> I am starting the planning stages of moving from a single instance of solr
>> 8 to a solrcloud implementation.
>> 
>> Currently I have a 148GB index on a single dedicated server w 96gb ram @ 16
>> cores /2.4ghz ea. + SSD disk. The search is fast but obviously the index
>> size is greater than the physical memory, which to my understanding is not
>> a good thing.
>> 
>> I have a lot of experience with single instance but none with solrcloud. I
>> have 3 machines (other than my main 1) with the exact same hardware 96gb *
>> 3 essentially which should be plenty.
>> 
>> My issue is that im not sure where to go to learn how to set this up, how
>> many shards, how many replicas, etc and would rather hire somebody or
>> something (detailed video or document)  to guide me through the process,
>> and make decisions along the way...For example I think a shard is a piece
>> of the index... but I dont even know how to decide how many replicas or
>> what they are .
>> 
>> Thanks everyone.
>> -Craig
>> 



ltr (reranking) in combination with cursorMarks

2019-08-12 Thread Martin Ruderer
Hello,

I am trying to use ltr together with cursorMarks on solr 7.7 and I am
getting the exception java.lang.ClassCastException:
org.apache.lucene.search.TopDocs cannot be cast to
org.apache.lucene.search.TopFieldDocs (full stacktrace below).

Browsing Jira I have found an issue that suggests I should always
include sort by score when combining reranking with cursorMarks.
However, that hasn't changed anything.

Is there anything I am missing?
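For reference, the Jira advice amounts to request parameters like the ones built here; a hedged sketch (the model name, reRankDocs value, and uniqueKey field are placeholders, and this does not guarantee the cast error goes away on 7.7):

```python
from urllib.parse import urlencode

def ltr_cursor_params(model: str, unique_key: str = "id") -> str:
    """Query params combining LTR reranking with cursor paging: a cursor
    requires a deterministic sort ending on the uniqueKey, and the Jira
    suggestion is to keep score in that sort when reranking."""
    return urlencode({
        "q": "*:*",
        "rq": f"{{!ltr model={model} reRankDocs=100}}",
        "sort": f"score desc, {unique_key} asc",
        "cursorMark": "*",
    })
```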

o.a.s.s.HttpSolrCall null:java.lang.ClassCastException:
org.apache.lucene.search.TopDocs cannot be cast to
org.apache.lucene.search.TopFieldDocs
at 
org.apache.solr.search.SolrIndexSearcher.populateNextCursorMarkFromTopDocs(SolrIndexSearcher.java:1458)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1686)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1395)
at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:566)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:545)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:296)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:724)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:530)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)

Best regards,
Martin Ruderer
-- 
Martin Ruderer
Dr. rer. nat. Dipl.-Math.

freiheit.com technologies gmbh
Budapester Straße 45
20359 Hamburg / Germany
fon: +49 40 / 890584-0
Hamburg HRB 70814

+++ Hamburg/ Germany + Lisbon/ Portugal +++

https://www.freiheit.com
https://www.facebook.com/freiheitcom

B444 034F 9C95 A569 C5DA  087C E6B9 CCF9 5572 A904
Geschäftsführer: Claudia Dietze, Stefan Richter


Re: Moving to solrcloud from single instance

2019-08-12 Thread David Hastings
I actually never had a problem with the index being larger than the memory
for a standalone instance, but the entire index is on an SSD, at least on
my end.

On Mon, Aug 12, 2019 at 3:43 PM Erie Data Systems 
wrote:

> I am starting the planning stages of moving from a single instance of solr
> 8 to a solrcloud implementation.
>
> Currently I have a 148GB index on a single dedicated server w 96gb ram @ 16
> cores /2.4ghz ea. + SSD disk. The search is fast but obviously the index
> size is greater than the physical memory, which to my understanding is not
> a good thing.
>
> I have a lot of experience with single instance but none with solrcloud. I
> have 3 machines (other than my main 1) with the exact same hardware 96gb *
> 3 essentially which should be plenty.
>
> My issue is that im not sure where to go to learn how to set this up, how
> many shards, how many replicas, etc and would rather hire somebody or
> something (detailed video or document)  to guide me through the process,
> and make decisions along the way...For example I think a shard is a piece
> of the index... but I dont even know how to decide how many replicas or
> what they are .
>
> Thanks everyone.
> -Craig
>


Moving to solrcloud from single instance

2019-08-12 Thread Erie Data Systems
I am starting the planning stages of moving from a single instance of solr
8 to a solrcloud implementation.

Currently I have a 148GB index on a single dedicated server w 96gb ram @ 16
cores /2.4ghz ea. + SSD disk. The search is fast but obviously the index
size is greater than the physical memory, which to my understanding is not
a good thing.

I have a lot of experience with a single instance but none with SolrCloud. I
have 3 machines (other than my main one) with the exact same hardware, 96gb
each, which should be plenty.

My issue is that I'm not sure where to go to learn how to set this up (how
many shards, how many replicas, etc.) and would rather hire somebody or
something (a detailed video or document) to guide me through the process
and make decisions along the way. For example, I think a shard is a piece
of the index, but I don't even know how to decide how many replicas to use
or what they are.

Thanks everyone.
-Craig


Re: more like this query parser with faceting

2019-08-12 Thread Szűcs Roland
Thanks, David.
This is the page I was looking for.

Roland

David Hastings  ezt írta (időpont: 2019. aug.
12., H, 20:52):

> should be fine,
> https://cwiki.apache.org/confluence/display/solr/MoreLikeThisHandler
>
> for more info
>
> On Mon, Aug 12, 2019 at 2:49 PM Szűcs Roland 
> wrote:
>
> > Hi David,
> > Thanks the fast reply. Am I right that I can combine fq with mlt only if
> I
> > use more like this as a query parser?
> >
> > Is there a way to achieve the same with mlt as a request handler?
> > Roland
> >
> > David Hastings  ezt írta (időpont: 2019.
> > aug.
> > 12., H, 20:44):
> >
> > > The easiest way will be to pass in a filter query (fq)
> > >
> > > On Mon, Aug 12, 2019 at 2:40 PM Szűcs Roland <
> > szucs.rol...@bookandwalk.hu>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > Is there any tutorial or example how to use more like this
> > functionality
> > > > when we have some other constraints set by the user through faceting
> > > > parameters like price range, or product category for example?
> > > >
> > > > Cheers,
> > > > Roland
> > > >
> > >
> >
>


Re: more like this query parser with faceting

2019-08-12 Thread David Hastings
should be fine,
https://cwiki.apache.org/confluence/display/solr/MoreLikeThisHandler

for more info

On Mon, Aug 12, 2019 at 2:49 PM Szűcs Roland 
wrote:

> Hi David,
> Thanks the fast reply. Am I right that I can combine fq with mlt only if I
> use more like this as a query parser?
>
> Is there a way to achieve the same with mlt as a request handler?
> Roland
>
> David Hastings  ezt írta (időpont: 2019.
> aug.
> 12., H, 20:44):
>
> > The easiest way will be to pass in a filter query (fq)
> >
> > On Mon, Aug 12, 2019 at 2:40 PM Szűcs Roland <
> szucs.rol...@bookandwalk.hu>
> > wrote:
> >
> > > Hi All,
> > >
> > > Is there any tutorial or example how to use more like this
> functionality
> > > when we have some other constraints set by the user through faceting
> > > parameters like price range, or product category for example?
> > >
> > > Cheers,
> > > Roland
> > >
> >
>


Re: more like this query parser with faceting

2019-08-12 Thread Szűcs Roland
Hi David,
Thanks for the fast reply. Am I right that I can combine fq with mlt only if I
use more like this as a query parser?

Is there a way to achieve the same with mlt as a request handler?
Roland

David Hastings  ezt írta (időpont: 2019. aug.
12., H, 20:44):

> The easiest way will be to pass in a filter query (fq)
>
> On Mon, Aug 12, 2019 at 2:40 PM Szűcs Roland 
> wrote:
>
> > Hi All,
> >
> > Is there any tutorial or example how to use more like this functionality
> > when we have some other constraints set by the user through faceting
> > parameters like price range, or product category for example?
> >
> > Cheers,
> > Roland
> >
>


Re: more like this query parser with faceting

2019-08-12 Thread David Hastings
The easiest way will be to pass in a filter query (fq)
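Concretely, that combination looks like the request parameters built below; a sketch only, with the field names, document id, and facet field as placeholders:

```python
from urllib.parse import urlencode

def mlt_with_filters(doc_id: str, qf: str, fq: str) -> str:
    """MoreLikeThis as a query parser plus a filter query, so facet
    selections (price range, category, ...) constrain the MLT results."""
    return urlencode({
        "q": f"{{!mlt qf={qf}}}{doc_id}",
        "fq": fq,
        "facet": "true",
        "facet.field": "category",
    })
```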

On Mon, Aug 12, 2019 at 2:40 PM Szűcs Roland 
wrote:

> Hi All,
>
> Is there any tutorial or example how to use more like this functionality
> when we have some other constraints set by the user through faceting
> parameters like price range, or product category for example?
>
> Cheers,
> Roland
>


more like this query parser with faceting

2019-08-12 Thread Szűcs Roland
Hi All,

Is there any tutorial or example how to use more like this functionality
when we have some other constraints set by the user through faceting
parameters like price range, or product category for example?

Cheers,
Roland


Re: Solr cloud questions

2019-08-12 Thread Shawn Heisey

On 8/12/2019 5:47 AM, Kojo wrote:

I am using Solr cloud on this configuration:

2 boxes (one Solr in each box)
4 instances per box


Why are you running multiple instances on one server?  For most setups, 
this has too much overhead.  A single instance can handle many indexes. 
The only good reason I can think of to run multiple instances is when 
the amount of heap memory needed exceeds 31GB.  And even then, four 
instances seems excessive.  If you only have 300,000 documents, there 
should be no reason for a super large heap.



At this moment I have an active collection with about 300,000 docs. The
other collections are not being queried. The active collection is
configured:
- shards: 16
- replication factor: 2

These two Solrs (Solr1 and Solr2) use ZooKeeper (one box, one instance; no
ZooKeeper cluster).

My application points to Solr1, and everything works fine until suddenly one
instance of this Solr1 dies. This instance is on port 8983, the "main"
instance. I thought it could be related to memory usage, but we increased
RAM and JVM memory and it still dies.
Solr1, the one which dies, is the destination where I point my web
application.


You will have to check the logs.  If Solr is not running on Windows, 
then any OutOfMemoryError exception, which can be caused by things other 
than a memory shortage, will result in Solr terminating itself.  On 
Windows, that functionality does not yet exist, so it would have to be 
Java or the OS that kills it.



Here I have two questions that I hope you can help me:

1. Which log can I look for debug this issue?


Assuming you're NOT on Windows, check to see if there is a logfile named 
solr_oom_killer-8983.log in the logs directory where solr.log lives.  If 
there is, then that means the oom killer script was executed, and that 
happens when there is an OutOfMemoryError thrown.  The solr.log file 
MIGHT contain the OOME exception which will tell you what system 
resource was depleted.  If it was not heap memory that was depleted, 
then increasing memory probably won't help.
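The check described above can be scripted; a small sketch, assuming the default layout where these files sit next to solr.log (the directory is supplied by the caller):

```python
from pathlib import Path

def find_oom_logs(logs_dir: str) -> list:
    """Return any solr_oom_killer-<port>.log files in the logs directory;
    these are written when the OOM-killer script fires after an
    OutOfMemoryError."""
    return sorted(Path(logs_dir).glob("solr_oom_killer-*.log"))
```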


If you share the gc log that Solr writes, we can analyze this to see if 
it was heap memory that was depleted.



2. After this instance dies, the Solr cloud does not answer my web
application. Is this correct? I thought that the replicas should answer if
one shard, instance, or box goes down.


If a Solr instance dies, you can't make connections directly to it. 
Connections would need to go to another instance.  You need a load 
balancer to handle that automatically, or a cloud-aware client.  The 
only cloud-aware client that I am sure about is the one for Java -- it 
is named SolrJ, created by the Solr project and distributed with Solr. 
I think that a third party MIGHT have written a cloud-aware client for 
Python, but I am not sure about this.


If you set up a load balancer, you will need to handle redundancy for that.

Side note:  A fully redundant zookeeper install needs three servers.  Do 
not put a load balancer in front of zookeeper.  The ZK protocol handles 
redundancy itself and a load balancer will break that.


Thanks.
Shawn


Re: Solr join

2019-08-12 Thread Zheng Lin Edwin Yeo
Hi Iniyan,

Will you be able to provide the query that you use, as well as the error
message that you received?

Regards,
Edwin

On Mon, 12 Aug 2019 at 01:57, Iniyan  wrote:

> Hi,
>
> I was trying to do a join between 2 collections. For that I followed the
> tutorial on how to create a colocated collection.
>
> I created one collection with one shard and a replication factor of 2.
>
> I created another collection and added the query parameter
> withCollection = first collection name.
>
> It seems the second collection is never created on the first collection's
> cores.
>
> Because of that, the join throws an error: no active replicas found in 
>
> I am using Solr 5.x
>
> Could anyone please help me?
>
> Thanks
> Iniyan P
> --
> Regards,
> Iniyan P
>


Re: Clustering error in Solr 8.2.0

2019-08-12 Thread Zheng Lin Edwin Yeo
Hi Kevin,

Thanks for the info.

I think it should be a lingo3g problem. The problem occurs when I use
Lingo3GClusteringAlgorithm:
<str name="carrot.algorithm">com.carrotsearch.lingo3g.Lingo3GClusteringAlgorithm</str>

If I change back to LingoClusteringAlgorithm, it will work.
<str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>

Regards,
Edwin

On Fri, 9 Aug 2019 at 10:59, Kevin Risden  wrote:

> According to the stack trace:
>
> java.lang.NoClassDefFoundError: org/apache/commons/lang/ObjectUtils
> at lingo3g.s.hashCode(Unknown Source)
>
> It looks like a lingo3g issue - lingo3g isn't on Maven Central, and it
> looks like it requires a license to download. You would have to contact
> them to see if it still uses commons-lang. You could also copy in the
> commons-lang dependency.
>
> Kevin Risden
>
>
> On Thu, Aug 8, 2019 at 10:23 PM Zheng Lin Edwin Yeo 
> wrote:
>
> > Hi Erick,
> >
> > Thanks for your reply.
> >
> > My clustering code is taken as it is from the Solr package, only the
> codes
> > related to lingo3g is taken from previous version.
> >
> > Below are the 3 files that I have taken from previous version:
> > - lingo3g-1.15.0
> > - morfologik-fsa-2.1.1
> > - morfologik-stemming-2.1.1
> >
> > Could any one of these have caused the error?
> >
> > Regards,
> > Edwin
> >
> > On Thu, 8 Aug 2019 at 19:56, Erick Erickson 
> > wrote:
> >
> > > This dependency was removed as part of
> > > https://issues.apache.org/jira/browse/SOLR-9079, so my guess is you’re
> > > pointing to an old version of the clustering code.
> > >
> > > Best,
> > > Erick
> > >
> > > > On Aug 8, 2019, at 4:22 AM, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> > > wrote:
> > > >
> > > > ObjectUtils
> > >
> > >
> >
>


Re: Solr cloud questions

2019-08-12 Thread Erick Erickson
Kojo:

The solr logs should give you a much better idea of what the triggering event 
was.

Just increasing the heap doesn’t guarantee much, again the Solr logs will 
report the OOM exception if it’s memory-related. You haven’t told us what your 
physical RAM is nor how much you’re allocating to heap, those would be helpful.

As far as Solr not answering, It Depends (tm). How are you querying Solr? If 
it’s just using an HTTP request to the node that died, there’s no communication 
possible, the http end-point is down. If you’re using SolrJ or load balancer in 
front, then it should, indeed get to the live Solr and you should get a reply.

I’ll add that from what you report, this system seems massively over-sharded. I 
generally start my testing with the assumption that I can fit 50,000,000 
documents per shard on a decent-sized box. So unless this configuration is for 
massive planned growth, the number of shards you have is far in excess of what 
you need. This isn’t the root cause of your problem, but it doesn’t help 
either….

Best,
Erick
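
The "which log" question below is usually answered by solr.log (and the GC log if memory is the suspect). A rough sketch of the kind of scan Erick suggests; the sample lines and the default `server/logs/solr.log` location are assumptions for illustration, not taken from Kojo's setup:

```python
import re

# Patterns that typically mark a crash trigger in a Solr log
# (OutOfMemoryError, severe errors, file-descriptor exhaustion).
SUSPECT = re.compile(r"OutOfMemoryError|ERROR|Too many open files")

def suspect_lines(log_text):
    """Return (line_number, line) pairs worth reading first."""
    return [(n, line)
            for n, line in enumerate(log_text.splitlines(), 1)
            if SUSPECT.search(line)]

# Illustrative sample; a real run would read e.g. server/logs/solr.log.
sample = """\
2019-08-12 10:01:02.345 INFO  (qtp-42) [c:demo] o.a.s.c.S.Request ok
2019-08-12 10:01:03.456 ERROR (qtp-43) [c:demo] o.a.s.h.RequestHandlerBase java.lang.OutOfMemoryError: Java heap space
"""
hits = suspect_lines(sample)
print(hits[0][0])  # → 2, the first suspect line
```

Note that an OOM kill by the JVM's `-XX:OnOutOfMemoryError` script may leave only a short note at the very end of the log, so the tail of the file matters as much as any grep.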

> On Aug 12, 2019, at 7:47 AM, Kojo  wrote:
> 
> Hi,
> I am using Solr cloud on this configuration:
> 
> 2 boxes (one Solr in each box)
> 4 instances per box
> 
> At this moment I have an active collection with about 300,000 docs. The
> other collections are not being queried. The active collection is
> configured:
> - shards: 16
> - replication factor: 2
> 
> These two Solrs (Solr1 and Solr2) use ZooKeeper (one box, one instance; no
> ZooKeeper cluster)
> 
> My application points to Solr1, and everything works fine, until suddenly one
> instance of this Solr1 dies. This instance is on port 8983, the "main"
> instance. I thought it could be related to memory usage, but we increased
> RAM and JVM memory and it still dies.
> Solr1, the one which dies, is the destination where I point my web
> application.
> 
> Here I have two questions that I hope you can help me:
> 
> 1. Which log can I look at to debug this issue?
> 2. After this instance dies, the Solr cloud does not answer my web
> application. Is this correct? I thought that the replicas should answer if
> one shard, instance or one box goes down.
> 
> Regards,
> Koji



Re: solr: java.nio.file.accessdeniedexception

2019-08-12 Thread Shawn Heisey

On 8/12/2019 6:44 AM, Rathor, Piyush wrote:

We are facing following issue in data update on solr: 
java.nio.file.accessdeniedexception in solr cloud


https://cwiki.apache.org/confluence/display/solr/UsingMailingLists

We will need considerably more detail.  Exceptions from Java are MANY 
lines long -- there will be much more than just the name of the 
exception.  There may be one or more sections that start with "Caused 
by" as well, and we will need those too if they exist.  Also, all of 
it will be mixed case, and without seeing which letters are uppercase, 
it may be difficult to understand it.  The tiny bit of info shared above 
has been changed to all lowercase.


AccessDeniedException typically means that the program is running with 
permissions that do not allow it to perform whatever it is trying to do. 
 Usually the additional detail in the error message will describe what 
was being accessed when access was denied, and will indicate where in 
the Solr code the problem happened.  We will also need the exact Solr 
version you're running in order to analyze that information.


Thanks,
Shawn
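
Shawn's point about "Caused by" sections has a direct analogue in any language with exception chaining. A small Python sketch; the path is invented for illustration, and in the real Java trace the low-level failure would be `java.nio.file.AccessDeniedException` naming the file Solr could not touch:

```python
import traceback

def write_segment(path):
    # Stand-in for the low-level failure; in Java this would be
    # java.nio.file.AccessDeniedException carrying the offending path.
    raise PermissionError(13, "Permission denied", path)

def update_index(path):
    try:
        write_segment(path)
    except PermissionError as exc:
        # Wrap it while preserving the cause -- the analogue of "Caused by".
        raise RuntimeError("data update failed") from exc

try:
    update_index("/var/solr/data/index/segments_1")
except RuntimeError:
    report = traceback.format_exc()

# The full report names both exceptions and the offending path --
# exactly the detail a truncated, lowercased one-liner loses.
print("PermissionError" in report and "segments_1" in report)  # True
```

This is why the mailing list asks for the complete trace: the top-level exception alone rarely identifies the file or the permission that was missing.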


RE: Solr mailing list

2019-08-12 Thread Rathor, Piyush
Hi Margo

I am not trying to unsubscribe but we are facing following issue in data update 
on solr: java.nio.file.accessdeniedexception in solr cloud

Thanks & Regards
Piyush

-Original Message-
From: Jan Høydahl 
Sent: Monday, August 12, 2019 3:23 AM
To: margo.br...@indi.nl
Cc: solr-user 
Subject: [EXT] Re: Solr mailing list

You may want to send an email to solr-user-unsubscr...@lucene.apache.org 
instead, that should get you off the list :)

--
Jan Høydahl, search solution architect
Cominvent AS - 
http://www.cominvent.com

> 12. aug. 2019 kl. 08:46 skrev Margo Breäs | INDI :
>
> Unsubscribe

This message (including any attachments) contains confidential information 
intended for a specific individual and purpose, and is protected by law. If you 
are not the intended recipient, you should delete this message and any 
disclosure, copying, or distribution of this message, or the taking of any 
action based on it, by you is strictly prohibited.

Deloitte refers to a Deloitte member firm, one of its related entities, or 
Deloitte Touche Tohmatsu Limited ("DTTL"). Each Deloitte member firm is a 
separate legal entity and a member of DTTL. DTTL does not provide services to 
clients. Please see www.deloitte.com/about to learn more.

v.E.1


solr: java.nio.file.accessdeniedexception

2019-08-12 Thread Rathor, Piyush
Hi Team

We are facing following issue in data update on solr: 
java.nio.file.accessdeniedexception in solr cloud

Thanks & Regards
Piyush



Solr cloud questions

2019-08-12 Thread Kojo
Hi,
I am using Solr cloud on this configuration:

2 boxes (one Solr in each box)
4 instances per box

At this moment I have an active collection with about 300,000 docs. The
other collections are not being queried. The active collection is
configured:
- shards: 16
- replication factor: 2

These two Solrs (Solr1 and Solr2) use ZooKeeper (one box, one instance; no
ZooKeeper cluster)

My application points to Solr1, and everything works fine, until suddenly one
instance of this Solr1 dies. This instance is on port 8983, the "main"
instance. I thought it could be related to memory usage, but we increased
RAM and JVM memory and it still dies.
Solr1, the one which dies, is the destination where I point my web
application.

Here I have two questions that I hope you can help me:

1. Which log can I look at to debug this issue?
2. After this instance dies, the Solr cloud does not answer my web
application. Is this correct? I thought that the replicas should answer if
one shard, instance or one box goes down.

Regards,
Koji


Re: Solr is very slow with term vectors

2019-08-12 Thread Vignan Malyala
Hi Doug / Walter,

I'm just using this methodology.
PFB link of my sample code.
https://github.com/saaay71/solr-vector-scoring

The only issue is speed of response for 1M records.

On Mon, Aug 12, 2019 at 12:24 AM Walter Underwood 
wrote:

> tf.idf was invented because cosine similarity is too much computation.
> tf.idf gives similar results much, much faster than cosine distance.
>
> I would expect cosine similarity to be slow. I would also expect
> retrieving 1 million records to be slow. Doing both of those in one minute
> is pretty good.
>
> As Kernighan and Plauger said in 1978, "Don’t diddle code to make it
> faster—find a better algorithm.”
>
> https://en.wikipedia.org/wiki/The_Elements_of_Programming_Style
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Aug 11, 2019, at 10:40 AM, Doug Turnbull <
> dturnb...@opensourceconnections.com> wrote:
> >
> > Hi Vignan,
> >
> > We need to see more details / code of what your query parser plugin does
> > exactly with term vectors; we can't really help you without more details.
> > Is it open source? Can you share a minimal example that recreates the
> > problem?
> >
> > On Sun, Aug 11, 2019 at 1:19 PM Vignan Malyala 
> wrote:
> >
> >> Hi guys,
> >>
> >> I made my custom qparser plugin in Solr for scoring. The plugin only
> does
> >> cosine similarity of vectors for each record. I use term vectors here.
> >> Results are fine!
> >>
> >> BUT, Solr response is very slow with term vectors. It takes around 55
> >> seconds for each request for 100 records.
> >> How do I make it faster to get my results in ms ?
> >> Please respond soon, as it's a little urgent.
> >>
> >> Note: All my values are stored and indexed. I am not using Solr Cloud.
> >>
> >
> >
> > --
> > *Doug Turnbull **| CTO* | OpenSource Connections
> > , LLC | 240.476.9983
> > Author: Relevant Search 
> > This e-mail and all contents, including attachments, is considered to be
> > Company Confidential unless explicitly stated otherwise, regardless
> > of whether attachments are marked as such.
>
>
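
Walter's cost argument is easy to make concrete: brute-force cosine scoring does O(N * dim) multiply-adds per query, with no index structure to skip non-matching documents. A self-contained sketch; the dimensions and corpus size are arbitrary stand-ins, not Vignan's data:

```python
import math

def cosine(a, b):
    """Cosine similarity of two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Every query touches every document: O(N * dim) work, unlike an
# inverted index where only documents sharing query terms are scored.
dim, n_docs = 100, 1000
docs = [[float((i + j) % 7) for j in range(dim)] for i in range(n_docs)]
query = [1.0] * dim

scores = [cosine(doc, query) for doc in docs]
print(len(scores), round(cosine(query, query), 6))  # → 1000 1.0
```

Precomputing document norms, vectorizing with numpy, or using an approximate nearest-neighbour index would all shave constants, but the linear scan over every stored vector is the fundamental cost Walter is pointing at.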


Re: Contact for Wiki / Support page maintainer

2019-08-12 Thread Jan Høydahl
Update:
I had to add Jaroslaw with individual Space permission. 
Giving someone space permissions means they can edit all pages in the space 
unless they have Restrictions in place. Guess that is a good solution, so anyone 
who wants to help out with the Wiki can get access, and should we need 
restricted pages for Committers or PMC that is easy to set up.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 31. jul. 2019 kl. 14:30 skrev Jan Høydahl :
> 
> I tried to add Jaroslaw as an editor of that one page by adding him under 
> "Restrictions" tab of the page. But it does not work.
> Can anyone with higher Confluence skills tell me how to give the edit bit 
> for a single page to individuals? I know how to add edit permission for the 
> whole WIKI space to individuals but that was not what I intended.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com 
> 
>> 29. jul. 2019 kl. 23:01 skrev Jan Høydahl > >:
>> 
>> All PMC members can add individual contributors in Confluence. Even for 
>> specific pages I think.
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com 
>> 
>>> 29. jul. 2019 kl. 16:47 skrev Jason Gerlowski >> >:
>>> 
>>> I was under the impression that non-committers could also edit the
>>> wiki pages if they requested the appropriate karma on the mailing list.
>>> 
>>> Though maybe that changed with the move to cwiki, or maybe that's
>>> never been the case.
>>> 
>>> On Thu, Jul 25, 2019 at 4:10 PM Jan Høydahl >> > wrote:
 
 All committers can edit. What would you like to change/add?
 
 Jan Høydahl
 
> 25. jul. 2019 kl. 09:11 skrev Jaroslaw Rozanski  >:
> 
> Hi folks!
> 
> Who is the maintainer of the Solr Support page in the Apache Solr Wiki
> (https://cwiki.apache.org/confluence/display/solr/Support)?
> 
> Thanks,
> Jaroslaw
> 
> --
> Jaroslaw Rozanski | m...@jarekrozanski.eu 
>> 
> 



Re: Solr mailing list

2019-08-12 Thread Jan Høydahl
You may want to send an email to solr-user-unsubscr...@lucene.apache.org 
instead, that should get you off the list :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 12. aug. 2019 kl. 08:46 skrev Margo Breäs | INDI :
> 
> Unsubscribe