Re: Indexed=false for a field,but still able to search on field.

2017-08-28 Thread Renuka Srishti
Hi,

I have tried two scenarios:

   1. indexed=false, with docValues not set.
   2. indexed=false, with docValues set to true.

#1. You cannot search directly on that field, but when you search on
any other field of that doc, the field still appears in the result.

  You cannot do faceting on this field either.

   If you search on this field in the Solr Admin panel, no results
are found, but you can still see the field on the doc there.

#2. It is searchable, and faceting works as well.
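
In schema terms, the two scenarios correspond to field definitions along
these lines (field name and type here are examples, not the originals
from the thread):

```xml
<!-- Scenario 1: not indexed, docValues left unset -->
<field name="fileName" type="string" indexed="false" stored="true"/>

<!-- Scenario 2: not indexed, docValues enabled -->
<field name="fileName" type="string" indexed="false" stored="true" docValues="true"/>
```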


Please correct me if I am wrong.


Thanks

Renuka Srishti



On Tue, Aug 29, 2017 at 1:06 AM, AshB  wrote:

> Hi,
>
> Yes docValues is true for fieldType
>
>  docValues="true"/>
>
>
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Indexed-false-for-a-field-but-still-able-to-search-on-field-
> tp4352338p4352442.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Solr client

2017-08-28 Thread Aditya
Hi

I am aggregating open source Solr client libraries across all languages.
Below are the links. Very few projects are currently active; most were
last updated a few years ago. Please send me pointers if I missed any
Solr client library.

http://www.findbestopensource.com/tagged/solr-client
http://www.findbestopensource.com/tagged/solr-gui


Regards
Ganesh

PS: The website http://www.findbestopensource.com search is powered by Solr.


solr index replace with index from another environment

2017-08-28 Thread Satya Marivada
Hi there,

We are using solr-6.3.0 and need to replace the Solr index in production
with the Solr index from another environment on a periodic basis. But the
JVMs have to be recycled for the updated index to take effect. Is there
any way this can be achieved without restarting the JVMs?

Using aliases as described below is an alternative, but I don't think it
is useful in my case, where I already have the index from the other
environment ready. If I build a new collection and replace the index, the
JVMs again need to be restarted for the new index to take effect.

https://stackoverflow.com/questions/45158394/replacing-old-indexed-data-with-new-data-in-apache-solr-with-zero-downtime
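
For reference, the alias approach avoids JVM restarts because queries go
through an alias that can be atomically repointed at a newly built
collection. A sketch of the Collections API calls involved (host and
collection names are made up; CREATE and CREATEALIAS are real Collections
API actions):

```java
// Build Collections API URLs for an alias-based index swap.
// "products" is the alias the application queries; "products_v2" is the
// freshly built collection -- both names are illustrative.
class AliasSwap {
    static String collectionsApi(String host, String action, String params) {
        return "http://" + host + "/solr/admin/collections?action=" + action
                + "&" + params;
    }

    public static void main(String[] args) {
        // 1. Create and index into a fresh collection ...
        System.out.println(collectionsApi("localhost:8983", "CREATE",
                "name=products_v2&numShards=1"));
        // 2. ... then atomically repoint the alias the application queries:
        System.out.println(collectionsApi("localhost:8983", "CREATEALIAS",
                "name=products&collections=products_v2"));
    }
}
```

If the index really must be copied in as raw files, a collection RELOAD
(action=RELOAD) rather than a JVM restart may also be worth testing.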

Any other suggestions would be appreciated.

Thanks,
satya


Re: Excessive resources consumption migrating from Solr 6.6.0 Master/Slave to SolrCloud 6.6.0 (dozen times more resources)

2017-08-28 Thread Scott Stults
Dani,

It might be time to attach some instrumentation to one of your nodes.
Finding out which classes are occupying the memory will help narrow the
issue.

Are you using a lot of facets, grouping, or stats during your queries?
Also, when you were doing Master/Slave, was that on the same version of
Solr as you're using now in SolrCloud mode?


-Scott

On Mon, Aug 28, 2017 at 4:50 AM, Daniel Ortega 
wrote:

> Hi Scott,
>
> Yes, we think that our usage scenario falls into Index-Heavy/Query-Heavy
> too. We have tested several softCommit/hardCommit values (from a few
> seconds to minutes) with no appreciable improvement :(
>
> Thanks for your reply!
>
> - Daniel
>
> 2017-08-25 6:45 GMT+02:00 Scott Stults  >:
>
> > Hi Dani,
> >
> > It seems like your use case falls into the Index-Heavy / Query-Heavy
> > category, so you might try increasing your hard commit frequency to 15
> > seconds rather than 15 minutes:
> >
> > https://lucidworks.com/2013/08/23/understanding-
> > transaction-logs-softcommit-and-commit-in-sorlcloud/
> >
> >
> > -Scott
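
The hard/soft commit settings Scott refers to live in solrconfig.xml; a
fragment along these lines would express the suggested 15-second hard
commit (values are illustrative, not taken from the thread):

```xml
<autoCommit>
  <maxTime>15000</maxTime>          <!-- hard commit every 15 seconds -->
  <openSearcher>false</openSearcher> <!-- flush segments without reopening searchers -->
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>            <!-- soft commit controls visibility -->
</autoSoftCommit>
```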
> >
> > On Thu, Aug 24, 2017 at 10:03 AM, Daniel Ortega <
> > danielortegauf...@gmail.com
> > > wrote:
> >
> > > Hi Scott,
> > >
> > > In our indexing service we are using that client too
> > > (org.apache.solr.client.solrj.impl.CloudSolrClient) :)
> > >
> > > This is our Update Request Processor chain configuration:
> > >
> > > <updateProcessor class="solr.processor.SignatureUpdateProcessorFactory"
> > >                  name="signature">
> > >   <bool name="enabled">true</bool>
> > >   <str name="signatureField">hash</str>
> > >   <bool name="overwriteDupes">false</bool>
> > >   <str name="signatureClass">solr.processor.Lookup3Signature</str>
> > > </updateProcessor>
> > >
> > > <updateRequestProcessorChain processor="signature" name="dedupe">
> > >   <processor class="solr.LogUpdateProcessorFactory" />
> > >   <processor class="solr.RunUpdateProcessorFactory" />
> > > </updateRequestProcessorChain>
> > >
> > > <requestHandler name="/update" class="solr.UpdateRequestHandler">
> > >   <lst name="defaults">
> > >     <str name="update.chain">dedupe</str>
> > >   </lst>
> > > </requestHandler>
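
The signature processor in the configuration above computes a hash over
document field values and writes it to the signature field ("hash" here).
Conceptually it works like this sketch (MD5 is used purely for
illustration; Solr's Lookup3Signature uses a different, non-cryptographic
hash):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Conceptual sketch of a signature update processor: derive a stable hash
// from selected field values so duplicate documents get identical signatures.
class DocSignature {
    static String signature(String... fieldValues) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String v : fieldValues) {
                md.update(v.getBytes(StandardCharsets.UTF_8));
            }
            // Hex-encode the digest, as a signature field would store it.
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```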
> > >
> > > Thanks for your reply :)
> > >
> > > - Dani
> > >
> > > 2017-08-24 14:49 GMT+02:00 Scott Stults  > opensourceconnections.com
> > > >:
> > >
> > > > Hi Daniel,
> > > >
> > > > SolrJ has a few client implementations to choose from:
> > > > CloudSolrClient, ConcurrentUpdateSolrClient, HttpSolrClient, and
> > > > LBHttpSolrClient. You said your query service uses CloudSolrClient,
> > > > but it would be good to verify which implementation your indexing
> > > > service uses.
> > > >
> > > > One of the problems you might be having is with your deduplication
> > > > step. Can you post your Update Request Processor Chain?
> > > >
> > > >
> > > > -Scott
> > > >
> > > >
> > > > On Wed, Aug 23, 2017 at 4:13 PM, Daniel Ortega <
> > > > danielortegauf...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Scott,
> > > > >
> > > > > - *Can you describe the process that queries the DB and sends
> > > > > records to Solr?*
> > > > >
> > > > > We are enqueueing ids during every ORACLE transaction (on
> > > > > inserts/updates).
> > > > >
> > > > > An application dequeues every id and performs queries against dozens
> > > > > of tables in the relational model to retrieve the fields to build the
> > > > > document. As we know that we are modifying the same ORACLE row in
> > > > > different (but consecutive) transactions, we store only the last
> > > > > version of the modified documents in a map data structure.
> > > > >
> > > > > The application has a configurable interval to send the documents
> > > > > stored in the map to the update handler (we have tested different
> > > > > intervals from a few milliseconds to several seconds) using the
> > > > > SolrJ client. Currently we are sending all the documents every 15
> > > > > seconds.
> > > > >
> > > > > This application is developed using Java, Spring, and Maven, and we
> > > > > have several instances.
> > > > >
> > > > > -* Is it a SolrJ-based application?*
> > > > >
> > > > > Yes, it is. We aren't using the latest version of the SolrJ client
> > > > > (we are currently using SolrJ v6.3.0).
> > > > >
> > > > > - *If it is, which client package are you using?*
> > > > >
> > > > > I don't know exactly what you mean by 'client package' :)
> > > > >
> > > > > - *How many documents do you send at once?*
> > > > >
> > > > > It depends on the interval described before and the number of
> > > > > transactions executed in our relational database. From dozens to a
> > > > > few hundred (and even thousands).
> > > > >
> > > > > - *Are you sending your indexing or query traffic through a load
> > > > balancer?*
> > > > >
> > > > > We aren't using a load balancer for indexing, but all our REST query
> > > > > services go through an HAProxy (using the 'leastconn' algorithm).
> > > > > The REST query services perform queries using the CloudSolrClient.
> > > > >
> > > > > Thanks for your reply,
> > > > > if you need any further information don't hesitate to ask
> > > > >
> > > > > Daniel
> > > > >
> > > > > 2017-08-23 14:57 GMT+02:00 Scott Stults  > > > opensourceconnections.com
> > > > > >:
> > > > >
> > 
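
The "store only the last version of each document in a map" approach
Daniel describes can be sketched in plain Java like this (illustrative
only; the actual SolrJ send on the drained batch, e.g. via
CloudSolrClient, is omitted):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Buffers documents between flushes, keeping only the newest version of
// each id so consecutive updates to the same ORACLE row collapse into one.
class DocBuffer {
    private final Map<String, String> latest = new ConcurrentHashMap<>();

    void enqueue(String id, String doc) {
        latest.put(id, doc); // later write wins
    }

    // Called on the configurable interval; returns the batch to index and
    // clears the buffer. Note: an update arriving between the copy and the
    // removal below could be lost -- a production version must handle that race.
    Map<String, String> drain() {
        Map<String, String> batch = new ConcurrentHashMap<>(latest);
        latest.keySet().removeAll(batch.keySet());
        return batch;
    }
}
```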

Re: Process to fix typos in ref-guide

2017-08-28 Thread Leonardo Perez Pulido
Hi,
I think a PR is the easiest and best option. They are only typos in the
ref-guide.
Thanks!

On Mon, Aug 28, 2017 at 4:45 PM, Erick Erickson 
wrote:

> If you're a committer yes. If not, I guess you'd have to create a
> patch or a PR and ask a committer to pick it up.
>
> And probably not only master but 7x as well.
>
> Erick
>
> On Mon, Aug 28, 2017 at 1:32 PM, Leonardo Perez Pulido
>  wrote:
> > Hi,
> > What is the process for fixing typos in Solr's ref. guide? Can I merge it
> > directly to master?
> > @Cassandra.
> > Thanks.
>


Re: Process to fix typos in ref-guide

2017-08-28 Thread Erick Erickson
If you're a committer yes. If not, I guess you'd have to create a
patch or a PR and ask a committer to pick it up.

And probably not only master but 7x as well.

Erick

On Mon, Aug 28, 2017 at 1:32 PM, Leonardo Perez Pulido
 wrote:
> Hi,
> What is the process for fixing typos in Solr's ref. guide? Can I merge it
> directly to master?
> @Cassandra.
> Thanks.


Process to fix typos in ref-guide

2017-08-28 Thread Leonardo Perez Pulido
Hi,
What is the process for fixing typos in Solr's ref. guide? Can I merge it
directly to master?
@Cassandra.
Thanks.


Re: Solr cloud in kubernetes

2017-08-28 Thread Lars Karlsson
Thanks Björn for the detailed information. I just wanted to understand:

When you say a separate service for external traffic, does this mean a
home-brewed one that proxies Solr queries?

And what is the difference between the above and "solr discovery"?

Do you specify pod anti affinity for solr hosts?

Regards
Lars

On Sat, 26 Aug 2017 at 13:19, Björn Häuser  wrote:

> Hi Lars,
>
> we are running Solr in kubernetes and after some initial problems we are
> running quite stably now.
>
> Here is the setup we choose for solr:
>
> - separate service for external traffic to solr (called “solr”)
> - statefulset for solr with 3 replicas with another service (called
> “solr-discovery”)
>
> We set the SOLR_HOST (which is used for intra-cluster communication) to
> the pod name inside the statefulset
> (solr-0.solr-discovery.default.svc.cluster.local). This ensures that on
> solr pod restart the intra-cluster communication still continues to work.
> In the beginning we used the IP address of the pod; this caused problems
> when restarting pods, as they tried to talk to the old IP addresses.
>
> Zookeeper inside kubernetes is a different story. Use the latest version
> of kubernetes, because old versions never re-resolved DNS names. For
> connecting to zookeeper we use the same approach, one service IP for all
> pods. The statefulset again works with a different service name.
>
> The problems we are currently facing:
>
> - Client timeouts whenever a solr pod stops and starts again, we currently
> try to solve this with better readiness probes, no success yet
> - Sometimes solr collections do not recover completely after a pod restart
> and we manually have to force recovery, still not investigated fully
>
> Hope this helps you!
>
> Thanks
> Björn
>
> > On 26. Aug 2017, at 12:08, Lars Karlsson 
> wrote:
> >
> > Hi, I wanted to hear if anyone successfully got solr cloud running on
> > kubernetes and can share challenges and limitations.
> >
> > I can't find many up-to-date GitHub projects; it would be great if you
> > could point out blog posts or other useful links.
> >
> > Thanks in advance.
>
>
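
The SOLR_HOST wiring Björn describes might look roughly like this in the
statefulset's pod template (the service name solr-discovery and the
default namespace are assumptions taken from his mail):

```yaml
# Excerpt of a StatefulSet pod template: SOLR_HOST is derived from the pod
# name, giving each Solr pod a stable intra-cluster address across restarts.
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: SOLR_HOST
    value: "$(POD_NAME).solr-discovery.default.svc.cluster.local"
```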


Re: Indexed=false for a field,but still able to search on field.

2017-08-28 Thread AshB
Hi,

Yes docValues is true for fieldType







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexed-false-for-a-field-but-still-able-to-search-on-field-tp4352338p4352442.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet on a Payload field type?

2017-08-28 Thread Webster Homer
The issue is that we lack translations for much of our attribute data. We
do have English versions. The idea is to use the English values for the
faceted values and for the filters, but be able to return different
language versions of the term to the caller. For example, if we have a
facet on color and the value is "red", we want to be able to retrieve
"rojo" for Spanish, etc.

Also, users can switch regions between searches. If a user starts out in
French, executes a search, selects a facet, and then switches to German,
they should get the German version of the facet (if it exists) even though
they originally used French. If all of the searching was done in English,
where we have the data, we could then show French (or German, etc.) for
the facet value.

The real field value that we use for filtering would be in English, but
the values returned to the user would be in the language of their locale,
or English if we don't have a translation for it. The idea is that the
translations would be stored in the payloads.

On Wed, Aug 23, 2017 at 7:47 PM, Chris Hostetter 
wrote:

>
> : The payload idea was from my boss, it's similar to how they did this in
> : Endeca.
> ...
> : My alternate idea is to have sets of facet fields for different
> languages,
> : then let our service layer determine the correct one for the user's
> : language, but I'm curious as to how others have solved this.
>
> Let's back up for a minute -- can you please explain your ultimate goal,
> from a "solr client application" perspective? (assuming we have no
> knowledge of how or why you might have used Endeca in the past)
>
> What is it you want your application to be able to do when indexing docs
> to solr and making queries to solr?  give us some real world examples
>
>
>
> (If I had to guess: I gather maybe you're just dealing with a "keywords"
> type field that you want to facet on -- and maybe you could use a diff
> field for each language, or encode the languages as a prefix on each term
> and use facet.prefix to restrict the facet constraints returned)
>
>
>
> https://people.apache.org/~hossman/#xyproblem
> XY Problem
>
> Your question appears to be an "XY Problem" ... that is: you are dealing
> with "X", you are assuming "Y" will help you, and you are asking about "Y"
> without giving more details about the "X" so that we can understand the
> full issue.  Perhaps the best solution doesn't involve "Y" at all?
> See Also: http://www.perlmonks.org/index.pl?node_id=542341
>
>
>
> :
> : On Wed, Aug 23, 2017 at 2:10 PM, Markus Jelsma <
> markus.jel...@openindex.io>
> : wrote:
> :
> : > Technically they could, faceting is possible on TextField, but it
> : > would be useless for faceting. Payloads are only used for scoring via
> : > a custom Similarity. Payloads also can only contain one byte of
> : > information (or was it 64 bits?)
> : >
> : > Payloads are not something you want to use when dealing with
> : > translations. We handle facet constraint (and facet field)
> : > translations just by mapping the internal value to a translated value
> : > when displaying a facet, and vice versa when filtering.
> : >
> : > -Original message-
> : > > From:Webster Homer 
> : > > Sent: Wednesday 23rd August 2017 20:22
> : > > To: solr-user@lucene.apache.org
> : > > Subject: Facet on a Payload field type?
> : > >
> : > > Is it possible to facet on a payload field type?
> : > >
> : > > We are moving from Endeca to Solr. We have a number of Endeca facets
> : > > where we have hacked in multilanguage support. The multiple languages
> : > > are really just for displaying the value of a term; internally the
> : > > value used to search is in English. The problem is that we don't have
> : > > translations for most of our facet data, and this was a way to
> : > > support multiple languages with the data we have.
> : > >
> : > > Looking at the SolrJ FacetField class I cannot tell whether the
> : > > value can contain a payload or not.
> : > >
> : > > --
> : > >
> : > >
> : > > This message and any attachment are confidential and may be
> privileged or
> : > > otherwise protected from disclosure. If you are not the intended
> : > recipient,
> : > > you must not copy this message or attachment or disclose the
> contents to
> : > > any other person. If you have received this transmission in error,
> please
> : > > notify the sender immediately and delete the message and any
> attachment
> : > > from your system. Merck KGaA, Darmstadt, Germany and any of its
> : > > subsidiaries do not accept liability for any omissions or errors in
> this
> : > > message which may arise as a result of E-Mail-transmission or for
> damages
> : > > resulting from any unauthorized changes of the content of this
> message
> : > and
> : > > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> : > > subsidiaries do not guarantee that this message is free of viruses
> and
> : > does
> : > > not accept liability for any damages caused by any virus transmitted
> : > > therewith.
> 

Re: Solr Wiki issues

2017-08-28 Thread Cassandra Targett
This appears to have happened for at least one other Apache project
using Apache's Confluence installation:
https://issues.apache.org/jira/browse/INFRA-14971.

You should use the new Ref Guide anyway:
https://lucene.apache.org/solr/guide/post-tool.html. An automatic
redirect from the old location is in the works.

On Mon, Aug 28, 2017 at 11:32 AM, Erick Erickson
 wrote:
> Hmmm, no it's not just you, I see them too.
>
>
> On Mon, Aug 28, 2017 at 7:45 AM, Steve Pruitt  wrote:
>> Is it just me, or does the Solr Wiki show nonsensical characters for what
>> look like example commands, etc.? I tried both Chrome and IE and get the
>> same result.
>>
>> Example, on https://cwiki.apache.org/confluence/display/solr/Post+Tool
>>
>> This shows:
>>
>> Index a PDF file into gettingstarted.
>> #66nonesolid
>>
>> Automatically detect content types in a folder, and recursively scan it for 
>> documents for indexing into gettingstarted.
>> #66nonesolid
>>
>> Automatically detect content types in a folder, but limit it to PPT and HTML 
>> files and index into gettingstarted.
>> #66nonesolid
>>
>> This started showing up a few days ago.
>>
>> Thanks.
>>
>> -S


Zookeeper issues

2017-08-28 Thread Mikhail Ibraheem
Hi,

When trying to ingest data into Solr, we get a lot of zookeeper exceptions
and the load fails:

org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /collections/WIP_DWO/state.json


org.apache.solr.common.SolrException: Could not load collection from ZK: WIP_DWO
	at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:1098) ~[solr-solrj-6.5.0.jar:6.5.0]
	at org.apache.solr.common.cloud.ZkStateReader$LazyCollectionRef.get(ZkStateReader.java:638) ~[solr-solrj-6.5.0.jar:6.5.0]
	at org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1482) ~[solr-solrj-6.5.0.jar:6.5.0]
	at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1092) ~[solr-solrj-6.5.0.jar:6.5.0]
	at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1057) ~[solr-solrj-6.5.0.jar:6.5.0]
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:160) ~[solr-solrj-6.5.0.jar:6.5.0]
	at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:484) ~[solr-solrj-6.5.0.jar:6.5.0]
	at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:448) ~[solr-solrj-6.5.0.jar:6.5.0]
	at oracle.ecc.index.retrieve.solr.util.QueryExecuter.commitThenOptimize(QueryExecuter.java:143) ~[ecc-ir-1.0-SNAPSHOT.jar:na]
	at oracle.ecc.index.tools.IndexDataHelper.commit(IndexDataHelper.java:73) ~[ecc-ir-1.0-SNAPSHOT.jar:na]
	at oracle.ecc.index.tools.DataLoadUtil.loadDataForDataset(DataLoadUtil.java:286) ~[ecc-ir-1.0-SNAPSHOT.jar:na]
	at oracle.ecc.index.retrieve.services.impl.IRDataLoadServiceImpl.lambda$runProcessorsForJobSync$1(IRDataLoadServiceImpl.java:151) [ecc-ir-1.0-SNAPSHOT.jar:na]
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[na:1.8.0_73]
	at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582) ~[na:1.8.0_73]
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) ~[na:1.8.0_73]
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) ~[na:1.8.0_73]
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) ~[na:1.8.0_73]
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) ~[na:1.8.0_73]
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/WIP_DWO/state.json
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
	at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:356) ~[solr-solrj-6.5.0.jar:6.5.0]
	at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353) ~[solr-solrj-6.5.0.jar:6.5.0]
	at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60) ~[solr-solrj-6.5.0.jar:6.5.0]
	at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:353) ~[solr-solrj-6.5.0.jar:6.5.0]
	at org.apache.solr.common.cloud.ZkStateReader.fetchCollectionState(ZkStateReader.java:1110) ~[solr-solrj-6.5.0.jar:6.5.0]
	at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:1096) ~[solr-solrj-6.5.0.jar:6.5.0]
	... 17 common frames omitted

Re: Indexed=false for a field,but still able to search on field.

2017-08-28 Thread Erick Erickson
Is docValues enabled (this happens by default in some versions)? I
think I've seen this enable searching on a field.

If that's the root of the problem, let us know since it's trappy and
we should discuss this on the dev list.

Best,
Erick

On Sun, Aug 27, 2017 at 10:58 PM, AshB  wrote:
> Hi,
>
> I created a field expecting I won't be able to search on it:
>
> .
>
> But I am able to search on it. Sample query below:
>
> fileName:"ipgb20080916_1078.xml"
>
> What is wrong here? I am not doing any copy of this field.
>
> Solr version: 6.5.1
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Indexed-false-for-a-field-but-still-able-to-search-on-field-tp4352338.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Search by similarity?

2017-08-28 Thread Erick Erickson
What are the results of adding &debug=query to the URL? The parsed
query will be especially illuminating.

Best,
Erick

On Mon, Aug 28, 2017 at 4:37 AM, Emir Arnautovic
 wrote:
> Hi Darko,
>
> The issue is the wrong expectations: title-1-end is parsed to 3 tokens
> (guessing) and mm=99% of 3 tokens is 2.97, which is rounded down to 2.
> Since all your documents have 'title' and 'end' tokens, all match. If you
> want to round up, you can use mm=-1% - that will result in zero matches
> (or one match if you do not filter out the original document).
>
> You have to play with your tokenizers and define what is similarity match
> percentage (if you want to stick with mm).
>
> Regards,
> Emir
>
>
>
> On 28.08.2017 09:17, Darko Todoric wrote:
>>
>> Hm... I cannot get this DisMax to work on my Solr...
>>
>> In solr I have document with title:
>>  - "title-1-end"
>>  - "title-2-end"
>>  - "title-3-end"
>>  - ...
>>  - ...
>>  - "title-312-end"
>>
>> and when I make query
>> "*http://localhost:8983/solr/SciLit/select?defType=dismax&indent=on&mm=99%&q=title:"title-123123123-end"&wt=json*'
>> I get all documents from solr :\
>> What am I doing wrong?
>>
>> Also, I don't know if it affects the results, but on the "title" field I
>> use "WhitespaceTokenizerFactory".
>>
>> Kind regards,
>> Darko
>>
>>
>> On 08/25/2017 06:38 PM, Junte Zhang wrote:
>>>
>>> If you already have the title of the document, then you could run that
>>> title as a new query against the whole index and exclude the source document
>>> from the results as a filter.
>>>
>>> You could use the DisMax query parser:
>>> https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
>>>
>>> And then set the minimum match ratio of the OR clauses to 90%.
>>>
>>> /JZ
>>>
>>> -Original Message-
>>> From: Darko Todoric [mailto:todo...@mdpi.com]
>>> Sent: Friday, August 25, 2017 5:49 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Search by similarity?
>>>
>>> Hi,
>>>
>>>
>>> I have 90,000,000 documents in Solr and I need to compare the "title" of
>>> a document and get all documents with more than 80% similarity. PHP has
>>> "similar_text", but loading 90M documents into an array is not
>>> practical... Can I do some query in Solr which will give me the
>>> more-than-80% similarity?
>>>
>>>
>>> Kind regards,
>>> Darko Todoric
>>>
>>> --
>>> Darko Todoric
>>> Web Engineer, MDPI DOO
>>> Veljka Dugosevica 54, 11060 Belgrade, Serbia
>>> +381 65 43 90 620
>>> www.mdpi.com
>>>
>>> Disclaimer: The information and files contained in this message are
>>> confidential and intended solely for the use of the individual or entity to
>>> whom they are addressed.
>>> If you have received this message in error, please notify me and delete
>>> this message from your system.
>>> You may not copy this message in its entirety or in part, or disclose its
>>> contents to anyone.
>>>
>>
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
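
Emir's rounding arithmetic can be modelled as follows (a sketch of the
behaviour he describes, not Solr's actual implementation):

```java
// Models dismax minimum-should-match rounding for percentage values.
class MinShouldMatch {
    // Positive percentage: the computed fraction is rounded DOWN, so
    // mm=99% of 3 clauses is floor(2.97) = 2 required matches.
    static int positivePercent(int clauses, int percent) {
        return (int) Math.floor(clauses * percent / 100.0);
    }

    // Negative percentage counts clauses that MAY be missing, also rounded
    // down, so mm=-1% of 3 clauses allows floor(0.03) = 0 misses: all 3
    // must match.
    static int negativePercent(int clauses, int percent) {
        return clauses - (int) Math.floor(clauses * percent / 100.0);
    }
}
```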


Re: Solr Wiki issues

2017-08-28 Thread Erick Erickson
Hmmm, no it's not just you, I see them too.


On Mon, Aug 28, 2017 at 7:45 AM, Steve Pruitt  wrote:
> Is it just me, or does the Solr Wiki show nonsensical characters for what
> look like example commands, etc.? I tried both Chrome and IE and get the
> same result.
>
> Example, on https://cwiki.apache.org/confluence/display/solr/Post+Tool
>
> This shows:
>
> Index a PDF file into gettingstarted.
> #66nonesolid
>
> Automatically detect content types in a folder, and recursively scan it for 
> documents for indexing into gettingstarted.
> #66nonesolid
>
> Automatically detect content types in a folder, but limit it to PPT and HTML 
> files and index into gettingstarted.
> #66nonesolid
>
> This started showing up a few days ago.
>
> Thanks.
>
> -S


Re: Solr memory leak

2017-08-28 Thread Erick Erickson
Varun Thacker is the RM for Solr 6.6.1, I've pinged him about including it.

On Mon, Aug 28, 2017 at 8:52 AM, Walter Underwood  wrote:
> That would be a really good reason for a 6.7.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Aug 28, 2017, at 8:48 AM, Markus Jelsma  
>> wrote:
>>
>> It is, unfortunately, not committed for 6.7.
>>
>>
>>
>>
>>
>> -Original message-
>>> From:Markus Jelsma 
>>> Sent: Monday 28th August 2017 17:46
>>> To: solr-user@lucene.apache.org
>>> Subject: RE: Solr memory leak
>>>
>>> See https://issues.apache.org/jira/browse/SOLR-10506
>>> Fixed for 7.0
>>>
>>> Markus
>>>
>>>
>>>
>>> -Original message-
 From:Hendrik Haddorp 
 Sent: Monday 28th August 2017 17:42
 To: solr-user@lucene.apache.org
 Subject: Solr memory leak

 Hi,

 we noticed that triggering collection reloads on many collections has a
 good chance to result in an OOM-Error. To investigate that further I did
 a simple test:
 - Start solr with a 2GB heap and 1GB Metaspace
 - create a trivial collection with a few documents (I used only 2
 fields and 100 documents)
 - trigger a collection reload in a loop (I used SolrJ for this)

 Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6
 worked better but also failed after 1100 loops.

 When looking at the memory usage on the Solr dashboard it looks like the
 space left after GC cycles gets less and less. Then Solr gets very slow,
 as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In
 my last run this was actually for the Metaspace. So it looks like more
 and more heap and metaspace is being used by just constantly reloading a
 trivial collection.

 regards,
 Hendrik

>>>
>


Re: Solr memory leak

2017-08-28 Thread Walter Underwood
That would be a really good reason for a 6.7.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Aug 28, 2017, at 8:48 AM, Markus Jelsma  wrote:
> 
> It is, unfortunately, not committed for 6.7.
> 
> 
> 
> 
> 
> -Original message-
>> From:Markus Jelsma 
>> Sent: Monday 28th August 2017 17:46
>> To: solr-user@lucene.apache.org
>> Subject: RE: Solr memory leak
>> 
>> See https://issues.apache.org/jira/browse/SOLR-10506
>> Fixed for 7.0
>> 
>> Markus
>> 
>> 
>> 
>> -Original message-
>>> From:Hendrik Haddorp 
>>> Sent: Monday 28th August 2017 17:42
>>> To: solr-user@lucene.apache.org
>>> Subject: Solr memory leak
>>> 
>>> Hi,
>>> 
>>> we noticed that triggering collection reloads on many collections has a 
>>> good chance to result in an OOM-Error. To investigate that further I did 
>>> a simple test:
>>> - Start solr with a 2GB heap and 1GB Metaspace
>>> - create a trivial collection with a few documents (I used only 2 
>>> fields and 100 documents)
>>> - trigger a collection reload in a loop (I used SolrJ for this)
>>> 
>>> Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6 
>>> worked better but also failed after 1100 loops.
>>> 
>>> When looking at the memory usage on the Solr dashboard it looks like the 
>>> space left after GC cycles gets less and less. Then Solr gets very slow, 
>>> as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In 
>>> my last run this was actually for the Metaspace. So it looks like more 
>>> and more heap and metaspace is being used by just constantly reloading a 
>>> trivial collection.
>>> 
>>> regards,
>>> Hendrik
>>> 
>> 



RE: Solr memory leak

2017-08-28 Thread Markus Jelsma
It is, unfortunately, not committed for 6.7.



 
 
-Original message-
> From:Markus Jelsma 
> Sent: Monday 28th August 2017 17:46
> To: solr-user@lucene.apache.org
> Subject: RE: Solr memory leak
> 
> See https://issues.apache.org/jira/browse/SOLR-10506
> Fixed for 7.0
> 
> Markus
> 
>  
>  
> -Original message-
> > From:Hendrik Haddorp 
> > Sent: Monday 28th August 2017 17:42
> > To: solr-user@lucene.apache.org
> > Subject: Solr memory leak
> > 
> > Hi,
> > 
> > we noticed that triggering collection reloads on many collections has a 
> > good chance to result in an OOM-Error. To investigate that further I did 
> > a simple test:
> >  - Start solr with a 2GB heap and 1GB Metaspace
> >  - create a trivial collection with a few documents (I used only 2 
> > fields and 100 documents)
> >  - trigger a collection reload in a loop (I used SolrJ for this)
> > 
> > Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6 
> > worked better but also failed after 1100 loops.
> > 
> > When looking at the memory usage on the Solr dashboard it looks like the 
> > space left after GC cycles gets less and less. Then Solr gets very slow, 
> > as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In 
> > my last run this was actually for the Metaspace. So it looks like more 
> > and more heap and metaspace is being used by just constantly reloading a 
> > trivial collection.
> > 
> > regards,
> > Hendrik
> > 
> 


RE: Solr memory leak

2017-08-28 Thread Markus Jelsma
See https://issues.apache.org/jira/browse/SOLR-10506
Fixed for 7.0

Markus

 
 
-Original message-
> From:Hendrik Haddorp 
> Sent: Monday 28th August 2017 17:42
> To: solr-user@lucene.apache.org
> Subject: Solr memory leak
> 
> Hi,
> 
> we noticed that triggering collection reloads on many collections has a 
> good chance to result in an OOM-Error. To investigate that further I did 
> a simple test:
>  - Start solr with a 2GB heap and 1GB Metaspace
>  - create a trivial collection with a few documents (I used only 2 
> fields and 100 documents)
>  - trigger a collection reload in a loop (I used SolrJ for this)
> 
> Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6 
> worked better but also failed after 1100 loops.
> 
> When looking at the memory usage on the Solr dashboard it looks like the 
> space left after GC cycles gets less and less. Then Solr gets very slow, 
> as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In 
> my last run this was actually for the Metaspace. So it looks like more 
> and more heap and metaspace is being used by just constantly reloading a 
> trivial collection.
> 
> regards,
> Hendrik
> 


Solr memory leak

2017-08-28 Thread Hendrik Haddorp

Hi,

we noticed that triggering collection reloads on many collections has a 
good chance to result in an OOM-Error. To investigate that further I did 
a simple test:

- Start solr with a 2GB heap and 1GB Metaspace
- create a trivial collection with a few documents (I used only 2 
fields and 100 documents)

- trigger a collection reload in a loop (I used SolrJ for this)

Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6 
worked better but also failed after 1100 loops.


When looking at the memory usage on the Solr dashboard it looks like the 
space left after GC cycles gets less and less. Then Solr gets very slow, 
as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In 
my last run this was actually for the Metaspace. So it looks like more 
and more heap and metaspace is being used by just constantly reloading a 
trivial collection.


regards,
Hendrik


Solr Wiki issues

2017-08-28 Thread Steve Pruitt
Is it just me, or does the Solr Wiki show nonsensical characters for what look 
like example commands, etc.?   I tried both Chrome and IE and get the same 
result.

Example, on https://cwiki.apache.org/confluence/display/solr/Post+Tool

This shows:

Index a PDF file into gettingstarted.
#66nonesolid

Automatically detect content types in a folder, and recursively scan it for 
documents for indexing into gettingstarted.
#66nonesolid

Automatically detect content types in a folder, but limit it to PPT and HTML 
files and index into gettingstarted.
#66nonesolid
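Judging by the surrounding prose, the garbled lines presumably correspond to bin/post invocations along these lines (a best guess at what the page documents; verify against the page once it renders correctly):

```shell
# Index a PDF file into gettingstarted
bin/post -c gettingstarted a.pdf

# Recursively scan a folder, auto-detecting content types
bin/post -c gettingstarted afolder/

# Same, but limited to PPT and HTML files
bin/post -c gettingstarted -filetypes ppt,html afolder/
```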

This started showing up a few days ago.

Thanks.

-S


Re: Index relational database

2017-08-28 Thread Susheel Kumar
Hello Renuka,

I would suggest starting with your use case(s). Maybe begin with your
first use case and the questions below:

a) What is it that you want to search (which fields, like name, desc, city, etc.)?
b) What is it that you want to show as part of the search results (name, city, etc.)?

Based on the above two questions, you will know what data to pull in from
the relational database, create the Solr schema, and index the data.

You may first try to denormalize / flatten the structure so that you deal
with one collection/schema and query against it.
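As an illustration of that flattening, a row joined across hypothetical orders/customers/products tables could become a single Solr document (all field names here are invented for the example, not a prescribed schema):

```xml
<add>
  <doc>
    <field name="id">order-1001</field>
    <field name="customer_name">Jane Doe</field>
    <field name="customer_city">Austin</field>
    <field name="product_name">Widget</field>
    <field name="order_date">2017-08-01T00:00:00Z</field>
  </doc>
</add>
```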

HTH.

Thanks,
Susheel

On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti 
wrote:

> Hii,
>
> What is the best way to index a relational database, and how does it
> impact performance?
>
> Thanks
> Renuka Srishti
>


Index relational database

2017-08-28 Thread Renuka Srishti
Hii,

What is the best way to index a relational database, and how does it impact
performance?

Thanks
Renuka Srishti


Re: Search by similarity?

2017-08-28 Thread Emir Arnautovic

Hi Darko,

The issue is wrong expectations: title-1-end is parsed into 3 tokens 
(guessing) and mm=99% of 3 tokens is 2.97, which is rounded down to 2. 
Since all your documents contain the 'title' and 'end' tokens, all of them 
match. If you want to round up instead, you can use mm=-1% - that will 
result in zero matches (or one match if you do not filter out the original 
document).
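The rounding described here can be sketched numerically (a simplified illustration of the mm rule from the Solr docs; `requiredClauses` is a made-up helper, not Solr's actual code):

```java
public class MmRounding {
    // "n%"  -> at least floor(clauses * n / 100) optional clauses must match
    // "-n%" -> at most floor(clauses * n / 100) optional clauses may be missing
    static int requiredClauses(int optionalClauses, int percent) {
        if (percent >= 0) {
            return optionalClauses * percent / 100; // integer division rounds down
        }
        return optionalClauses - (optionalClauses * -percent / 100);
    }

    public static void main(String[] args) {
        // "title-1-end" assumed to tokenize into 3 clauses: title / 1 / end
        System.out.println(requiredClauses(3, 99)); // 2 -> 'title' and 'end' alone satisfy mm
        System.out.println(requiredClauses(3, -1)); // 3 -> every token must match
    }
}
```

So with mm=99% any document containing two of the three tokens matches, while mm=-1% effectively demands all three.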


You have to play with your tokenizers and decide what similarity match 
percentage you need (if you want to stick with mm).


Regards,
Emir


On 28.08.2017 09:17, Darko Todoric wrote:

Hm... I cannot make this DisMax query work on my Solr...

In solr I have document with title:
 - "title-1-end"
 - "title-2-end"
 - "title-3-end"
 - ...
 - ...
 - "title-312-end"

and when I make query 
"*http://localhost:8983/solr/SciLit/select?defType=dismax&indent=on&mm=99%&q=title:"title-123123123-end"&wt=json*' 
I get all documents from solr :\

What am I doing wrong?

Also, I don't know if it affects the results, but on the "title" field I use 
"WhitespaceTokenizerFactory".


Kind regards,
Darko


On 08/25/2017 06:38 PM, Junte Zhang wrote:
If you already have the title of the document, then you could run 
that title as a new query against the whole index and exclude the 
source document from the results as a filter.


You could use the DisMax query parser: 
https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser


And then set the minimum match ratio of the OR clauses to 90%.

/JZ

-Original Message-
From: Darko Todoric [mailto:todo...@mdpi.com]
Sent: Friday, August 25, 2017 5:49 PM
To: solr-user@lucene.apache.org
Subject: Search by similarity?

Hi,


I have 90.000.000 documents in Solr and I need to compare the "title" of 
a document and get all documents with more than 80% similarity. 
PHP has "similar_text", but it's not so smart to insert 90m documents 
into an array...
Can I do some query in Solr which will give me documents with more than 
80% similarity?



Kind regards,
Darko Todoric

--
Darko Todoric
Web Engineer, MDPI DOO
Veljka Dugosevica 54, 11060 Belgrade, Serbia
+381 65 43 90 620
www.mdpi.com

Disclaimer: The information and files contained in this message are 
confidential and intended solely for the use of the individual or 
entity to whom they are addressed.
If you have received this message in error, please notify me and 
delete this message from your system.
You may not copy this message in its entirety or in part, or disclose 
its contents to anyone.






--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Excessive resources consumption migrating from Solr 6.6.0 Master/Slave to SolrCloud 6.6.0 (dozen times more resources)

2017-08-28 Thread Daniel Ortega
Hi Scott,

Yes, we think that our usage scenario falls into the Index-Heavy/Query-Heavy
category too. We have tested several softCommit/hardCommit values
(from a few seconds to minutes) with no appreciable improvement :(
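For context, the commit intervals being discussed live in solrconfig.xml, roughly like this (the values are placeholders for illustration, not a recommendation):

```xml
<autoCommit>
  <maxTime>15000</maxTime>           <!-- hard commit every 15s -->
  <openSearcher>false</openSearcher> <!-- don't open a new searcher on hard commit -->
</autoCommit>
<autoSoftCommit>
  <maxTime>60000</maxTime>           <!-- soft commit (visibility) every 60s -->
</autoSoftCommit>
```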

Thanks for your reply!

- Daniel

2017-08-25 6:45 GMT+02:00 Scott Stults :

> Hi Dani,
>
> It seems like your use case falls into the Index-Heavy / Query-Heavy
> category, so you might try increasing your hard commit frequency to 15
> seconds rather than 15 minutes:
>
> https://lucidworks.com/2013/08/23/understanding-
> transaction-logs-softcommit-and-commit-in-sorlcloud/
>
>
> -Scott
>
> On Thu, Aug 24, 2017 at 10:03 AM, Daniel Ortega <
> danielortegauf...@gmail.com
> > wrote:
>
> > Hi Scott,
> >
> > In our indexing service we are using that client too
> > (org.apache.solr.client.solrj.impl.CloudSolrClient) :)
> >
> > This is our Update Request Processor chain configuration:
> >
> > <updateProcessor class="solr.processor.SignatureUpdateProcessorFactory" name="signature">
> >   <bool name="enabled">true</bool>
> >   <str name="signatureField">hash</str>
> >   <bool name="overwriteDupes">false</bool>
> >   <str name="signatureClass">solr.processor.Lookup3Signature</str>
> > </updateProcessor>
> > <updateRequestProcessorChain processor="signature" name="dedupe">
> >   <processor class="solr.LogUpdateProcessorFactory" />
> >   <processor class="solr.RunUpdateProcessorFactory" />
> > </updateRequestProcessorChain>
> > <requestHandler name="/update" class="solr.UpdateRequestHandler">
> >   <lst name="defaults">
> >     <str name="update.chain">dedupe</str>
> >   </lst>
> > </requestHandler>
> >
> > Thanks for your reply :)
> >
> > - Dani
> >
> > 2017-08-24 14:49 GMT+02:00 Scott Stults  opensourceconnections.com
> > >:
> >
> > > Hi Daniel,
> > >
> > > SolrJ has a few client implementations to choose from: CloudSolrClient,
> > > ConcurrentUpdateSolrClient, HttpSolrClient, LBHttpSolrClient. You said
> > your
> > > query service uses CloudSolrClient, but it would be good to verify
> which
> > > implementation your indexing service uses.
> > >
> > > One of the problems you might be having is with your deduplication
> step.
> > > Can you post your Update Request Processor Chain?
> > >
> > >
> > > -Scott
> > >
> > >
> > > On Wed, Aug 23, 2017 at 4:13 PM, Daniel Ortega <
> > > danielortegauf...@gmail.com>
> > > wrote:
> > >
> > > > Hi Scott,
> > > >
> > > > - *Can you describe the process that queries the DB and sends records
> > to
> > > *
> > > > *Solr?*
> > > >
> > > > We are enqueueing ids during every ORACLE transaction (in
> > > insert/updates).
> > > >
> > > > An application dequeues every id and perform queries against dozen of
> > > > tables in the relational model to retrieve the fields to build the
> > > > document.  As we know that we are modifying the same ORACLE row in
> > > > different (but consecutive) transactions, we store only the last
> > version
> > > of
> > > > the modified documents in a map data structure.
> > > >
> > > > The application has a configurable interval to send the documents
> > stored
> > > in
> > > > the map to the update handler (we have tested different intervals
> from
> > > few
> > > > milliseconds to several seconds) using the SolrJ client. Actually we
> > are
> > > > sending all the documents every 15 seconds.
> > > >
> > > > This application is developed using Java, Spring and Maven and we
> have
> > > > several instances.
> > > >
> > > > -* Is it a SolrJ-based application?*
> > > >
> > > > Yes, it is. We aren't using the last version of SolrJ client (we are
> > > > currently using SolrJ v6.3.0).
> > > >
> > > > - *If it is, which client package are you using?*
> > > >
> > > > I don't know exactly what you mean by 'client package' :)
> > > >
> > > > - *How many documents do you send at once?*
> > > >
> > > > It depends on the defined interval described before and the number of
> > > > transactions executed in our relational database. From dozens to few
> > > > hundreds (and even thousands).
> > > >
> > > > - *Are you sending your indexing or query traffic through a load
> > > balancer?*
> > > >
> > > > We aren't using a load balancer for indexing, but we have all our
> Rest
> > > > Query services through an HAProxy (using 'leastconn' algorithm). The
> > Rest
> > > > Query Services performs queries using the CloudSolrClient.
> > > >
> > > > Thanks for your reply,
> > > > if you need any further information don't hesitate to ask
> > > >
> > > > Daniel
> > > >
> > > > 2017-08-23 14:57 GMT+02:00 Scott Stults  > > opensourceconnections.com
> > > > >:
> > > >
> > > > > Hi Daniel,
> > > > >
> > > > > Great background information about your setup! I've got just a few
> > more
> > > > > questions:
> > > > >
> > > > > - Can you describe the process that queries the DB and sends
> records
> > to
> > > > > Solr?
> > > > > - Is it a SolrJ-based application?
> > > > > - If it is, which client package are you using?
> > > > > - How many documents do you send at once?
> > > > > - Are you sending your indexing or query traffic through a load
> > > balancer?
> > > > >
> > > > > If you're sending documents to each replica as fast as they can
> take
> > > > them,
> > > > > you might be seeing a bottleneck at the shard leaders. The SolrJ
> > > > > CloudSolrClient finds out from Zooke

Re: Search by similarity?

2017-08-28 Thread Darko Todoric

Hm... I cannot make this DisMax query work on my Solr...

In solr I have document with title:
 - "title-1-end"
 - "title-2-end"
 - "title-3-end"
 - ...
 - ...
 - "title-312-end"

and when I make query 
"*http://localhost:8983/solr/SciLit/select?defType=dismax&indent=on&mm=99%&q=title:"title-123123123-end"&wt=json*' 
I get all documents from solr :\

What am I doing wrong?

Also, I don't know if it affects the results, but on the "title" field I use 
"WhitespaceTokenizerFactory".


Kind regards,
Darko


On 08/25/2017 06:38 PM, Junte Zhang wrote:

If you already have the title of the document, then you could run that title as 
a new query against the whole index and exclude the source document from the 
results as a filter.

You could use the DisMax query parser: 
https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser

And then set the minimum match ratio of the OR clauses to 90%.

/JZ

-Original Message-
From: Darko Todoric [mailto:todo...@mdpi.com]
Sent: Friday, August 25, 2017 5:49 PM
To: solr-user@lucene.apache.org
Subject: Search by similarity?

Hi,


I have 90.000.000 documents in Solr and I need to compare the "title" of a document and 
get all documents with more than 80% similarity. PHP has "similar_text", but it's not so 
smart to insert 90m documents into an array...
Can I do some query in Solr which will give me documents with more than 80% similarity?


Kind regards,
Darko Todoric

--
Darko Todoric
Web Engineer, MDPI DOO
Veljka Dugosevica 54, 11060 Belgrade, Serbia
+381 65 43 90 620
www.mdpi.com




--
Darko Todoric
Web Engineer, MDPI DOO
Veljka Dugosevica 54, 11060 Belgrade, Serbia
+381 65 43 90 620
www.mdpi.com
