Re: Constant score and stopwords strange behaviour

2020-06-25 Thread Paras Lehana
Hi,

You can also change the multiplication factor in the TF-IDF snippet in the
source code to 1. I know there would be a better method to handle
stopwords now that you have used constant scoring, but I wanted to mention
the method by which we got rid of TF.
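
As an aside, there is also a schema-level route to constant scoring that
keeps the query analyzer (and hence stopword removal) intact: a per-field
similarity. A minimal sketch, assuming solr.BooleanSimilarityFactory is
available in your version (it appeared around Solr 7.1) and reusing the
stopwords file from the schema below; field type name is illustrative:

   <!-- managed-schema: allow per-fieldType similarities -->
   <similarity class="solr.SchemaSimilarityFactory"/>

   <fieldType name="text_constant" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" words="stopwords_querytime_custom.txt"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
     <!-- every matching term scores a constant; TF, IDF and norms drop out -->
     <similarity class="solr.BooleanSimilarityFactory"/>
   </fieldType>

With this, a plain q=a cat goes through the normal query analysis (stopword
removed) and still scores constantly before boosts.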

On Thu, 25 Jun 2020 at 03:02, dbourassa  wrote:

> Hi,
>
> I'm working on a Solr core where we don't want to use TF-IDF (BM25).
> We rank documents with boost based on popularity, exact match, phrase
> match,
> etc.
>
> To bypass TF-IDF, we use constant score like this "q=harry^=0.5
> potter^=0.5"
> (score is always 1 before boost)
> We have just noticed a strange behaviour with this method.
> With "q=a cat", the stopword 'a' is automatically removed by the query
> analyzer.
> But with "q=a^0.5 cat^0.5", the stopword 'a' is not removed.
>
> We also tried something like "q=(a AND cat)^=1" but the problem persists.
>
> Does someone have an idea or a better solution to bypass TF-IDF?
>
> relevant info in solrconfig :
> ...
> edismax
> 590%
> true
> ...
>
> relevant info in schema :
> 
> ...
> <filter class="solr.StopFilterFactory" words="stopwords_querytime_custom.txt"/>
> ...
>
>
> Thanks
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*


[Q] Tie-break sorted result by boost

2020-06-25 Thread Paras Lehana
Hi Community,

This is what I want to achieve:

Suppose that we have 3 locations (name, type, population):

1) Rajasthan, State, -1
2) Udaipur, City, 451K
3) Udaipura, City, 13K

For a search query that will yield all three of these in the result, I want
the result to be ordered the same as written above. That is, I want the
type=State first and then City. Within type=City, I want results to be
boosted by population.

When we were on Solr 6.5, with query sort=type (take it as an enum field or
change to numeric) and boost=population, we were getting expected results.

We recently upgraded to 8.4 but noticed this result for the same query
today:

1) Rajasthan, State, -1
2) Udaipura, City, 13K
3) Udaipur, City, 451K

Looks like sorting is overriding the boost and tie-breaking by docid now. I
know that sorting was always supposed to override everything, but I am sure
the result set has changed.

As a workaround, we can do sort=type asc, population desc and remove the
boosts. Boosts don't impact query time much, but in this case, I guess, I
will need to reindex with docValues=true for population. I have also read
about other ways, like function queries.
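
A sketch of that workaround (type name illustrative; pint requires Solr 7+):

   <field name="population" type="pint" indexed="false" stored="true" docValues="true"/>

   sort=type asc, population desc

Each sort field needs its own direction, and once docValues are in place the
sort itself is cheap - the reindex is the main cost.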

Any comments over this?

--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*


Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-24 Thread Paras Lehana
Distributor/Fetcher?

On Wed, 24 Jun 2020 at 10:04, Noble Paul  wrote:

> Do we even call it the master/slave mode? I thought we had 2 modes
>
> * Standalone mode
> * SolrCloud mode
>
> On Wed, Jun 24, 2020 at 3:00 AM Tomás Fernández Löbbe
>  wrote:
> >
> > I agree in general with what Trey and Jan said and have suggested. I
> > personally like to use "leader/follower". It's true that somewhat
> collides
> > with SolrCloud terminology, but that's not a problem IMO, now that
> replica
> > types exist, the “role” of the replica (leader vs. non-leader/follower)
> > doesn’t specify the internals of how they behave, the replica type
> defines
> > that. So, in a non-SolrCloud world, they would still be leader/followers
> > regardless of how they perform that role.
> >
> > I also agree that the name of the role is not that important, more the
> > "mode" of the architecture needs to be renamed. We tend to refer to
> > "SolrCloud mode" and "Master/Slave mode", the main part in all this (IMO)
> > is to change that "mode" name. I kind of like Trey's suggestion of
> "Managed
> > Clustering" vs. "Manual Clustering" Mode (Or "managed" vs "manual"), but
> > still haven't made up my mind (especially the fact that "manual" usually
> > doesn't really mean "manual", is just "you build your tools”)…
> >
> > On Fri, Jun 19, 2020 at 1:38 PM Walter Underwood 
> > wrote:
> >
> > > > On Jun 19, 2020, at 7:48 AM, Phill Campbell
> > >  wrote:
> > > >
> > > > Delegator - Handler
> > > >
> > > > A common pattern we are all aware of. Pretty simple.
> > >
> > > The Solr master does not delegate and the slave does not handle.
> > > The master is a server that handles replication requests from the
> > > slave.
> > >
> > > Delegator/handler is a common pattern, but it is not the pattern
> > > that describes traditional Solr replication.
> > >
> > > wunder
> > > Walter Underwood
> > > wun...@wunderwood.org
> > > http://observer.wunderwood.org/  (my blog)
> > >
> > >
>
>
>
> --
> -
> Noble Paul
>


--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*


Re: combined multiple bf into a single bf

2020-06-11 Thread Paras Lehana
Although you can use nested maps, for injecting variable values I would
have used an intermediate script that builds the Solr URL.
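
To make the nested-map idea concrete: map's target and default arguments can
themselves be functions (the current ref guide documents this; please verify
it still holds on 4.10.4), so the four bf params can collapse into one. A
sketch:

   bf=sum(map(response_rate,3,3,0.6,map(response_rate,2,2,0.3,0)),map(response_time,4,4,0.6,map(response_time,3,3,0.3,0)))

Each inner map supplies the "else" branch of the outer one, so
response_rate=3 yields 0.6, response_rate=2 yields 0.3, and anything else 0;
likewise for response_time.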

On Wed, 10 Jun 2020 at 08:58, Derek Poh  wrote:

> I have the following boost requirement using bf
>
> response_rate is 3, boost by ^0.6
> response_rate is 2, boost by ^0.3
> response_time is 4, boost by ^0.6
> response_time is 3, boost by ^0.3
>
> I am using a bf for each of the boost requirements,
>
>
> bf=map(response_rate,3,3,0.6,0)&bf=map(response_rate,2,2,0.3,0)&bf=map(response_time,4,4,0.6,0)&bf=map(response_time,3,3,0.3,0)
>
> I am trying to reduce on the number of parameters in the query.
>
> Is it possible to combine them into 1 or 2 bf?
>
> Running Solr 4.10.4.
>
> Derek
>
> --
> CONFIDENTIALITY NOTICE
>
> This e-mail (including any attachments) may contain confidential and/or
> privileged information. If you are not the intended recipient or have
> received this e-mail in error, please inform the sender immediately and
> delete this e-mail (including any attachments) from your computer, and you
> must not use, disclose to anyone else or copy this e-mail (including any
> attachments), whether in whole or in part.
>
> This e-mail and any reply to it may be monitored for security, legal,
> regulatory compliance and/or other appropriate reasons.
>
>

--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*


Re: What is the logical order of applying sorts in SOLR?

2020-05-15 Thread Paras Lehana
As a workaround, can you try field boosting?
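
By field boosting I mean letting relevance order the results instead of an
explicit sort. A hedged sketch with edismax (field names illustrative):

   q=bar&defType=edismax&qf=Foo^10 title&bf=recip(ms(NOW,timestamp),3.16e-11,1,1)

The recip(ms(NOW,...)) form is the recency boost documented in the ref
guide; it biases ranking toward newer documents without a sort parameter.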

On Tue, 12 May 2020 at 00:45, Stephen Lewis Bianamara <
stephen.bianam...@gmail.com> wrote:

> Hi SOLR Community,
>
> What is the order of operations which SOLR applies to sorting? I've
> observed many times and across SOLR versions that a restrictive filter with
> a sort takes an extremely long time to return, suggesting to me that the
> SORT is applied before the filter.
>
> An example situation is querying for fq:Foo=Bar vs querying for fq:Foo=Bar
> sort by Id desc. I've observed over many SOLR versions and collections that
> the former is orders of magnitude cheaper and quicker to respond, even when
> the result set is tiny (10-100).
>
> Does anyone in this forum know whether this is the default behavior and
> whether there is any way through the API or SOLR configuration to apply
> sorts after filters?
>
> Thanks,
> Stephen
>


--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*


Re: Why Solr query time is more in case the searched value frequency is more even if no sorting is applied, for the same number of rows?

2020-05-15 Thread Paras Lehana
Well, in a way, QTime can depend on the total number of terms existing in
the core.

It would have been better if you had posted a sample query and your analysis
chain.

On Mon, 11 May 2020 at 11:45, Anshuman Singh 
wrote:

> Suppose I have two phone numbers P1 and P2 and the number of records with
> P1 are X and with P2 are 2X (2 times X) respectively. If I query for R rows
> for P1 and P2, the QTime in case of P2 is more. I am not specifying any
> sort parameter and the number of rows I'm asking for is same in both the
> cases so why such difference?
>
> I understand that if I use sorting on some basis then it has to go through
> all the documents and then apply sorting on them before providing the
> requested rows. But without sorting can't it just read the first R
> documents from the index? In this case, I believe the QTime will not depend
> on the total number of documents with respect to the query but on the
> requested number of rows.
>


--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*


Re: How upgrade to Solr 8 impact performance

2020-04-22 Thread Paras Lehana
Hi Rajeswari,

I can only share my experience of moving from Solr 6 to Solr 8. I suggest
you move and then re-evaluate your performance metrics. To recall another
experience, we moved from Java 8 to 11 for Solr 8.

Please note experiences can differ! :)

On Wed, 22 Apr 2020 at 00:50, Natarajan, Rajeswari <
rajeswari.natara...@sap.com> wrote:

> Any other experience from solr 7 to sol8 upgrade performance  .Please
> share.
>
> Thanks,
> Rajeswari
>
> On 4/15/20, 4:00 PM, "Paras Lehana"  wrote:
>
> In January, we upgraded Solr from version 6 to 8 skipping all versions
> in
> between.
>
> The hardware and Solr configurations were kept the same but we still
> faced
> degradation in response time by 30-50%. We had exceptional Query times
> around 25 ms with Solr 6 and now we are hovering around 36 ms.
>
> Since response times under 50 ms are very good even for Auto-Suggest,
> we
> have not tried any changes regarding this. Nevertheless, you can try
> using
> Caffeine Cache. Looking forward to reading community inputs as well.
>
>
>
> On Thu, 16 Apr 2020 at 01:34, ChienHuaWang 
> wrote:
>
> > Does anyone have experience upgrading an application from Solr 7.x to 8.x?
> > How's the query performance?
> > We found a little slower response time from the application with Solr 8
> > based on current measurements, and are still looking into it in more detail.
> > But wondering, does anyone have a similar experience? Is that something we
> > should expect for Solr 8.x?
> >
> > Please kindly share, thanks.
> >
> > Regards,
> > ChienHua
> >
> >
> >
> > --
> > Sent from:
> https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >
>
>
> --
> Regards,
>
> *Paras Lehana* [65871]
> Development Engineer, *Auto-Suggest*,
> IndiaMART InterMESH Ltd,
>
> 11th Floor, Tower 2, Assotech Business Cresterra,
> Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
>
> Mob.: +91-9560911996
> Work: 0120-4056700 | Extn: *1196*
>
>

--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*


Re: How to implement spellcheck for custom suggest component?

2020-04-22 Thread Paras Lehana
Hi Buddy,

We have built Auto-Suggest over Solr with EdgeNGrams, Custom Spellcheck
Factory and Synonyms (for spelling mistakes). This solves for most cases.

If you have the dictionary for spelling mistakes, EdgeNGrams after the
Synonym factory will do the job.
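
A minimal sketch of such an index-time chain (Solr 6.4+ for the graph
filters; file name and gram sizes illustrative), where synonyms.txt expands
a correct term to its known misspellings, e.g. "iphone => iphone, iphonn":

   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
     <filter class="solr.FlattenGraphFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
   </analyzer>

The query analyzer stays plain (tokenizer + lowercase), so a user typing
the misspelled prefix "iphonn" still matches the edge n-grams indexed for
the document term "iphone".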

On Thu, 16 Apr 2020 at 13:35, aTan  wrote:

> Hello.
> I'm new to Solr and would be thankful for advice for the following case:
> We have Suggest API running on production using Solr 6, which currently
> prevent changes in the response and query parameters. That's why SpellCheck
> component can't be used (parameter is custom, not 'q'or 'spellcheck.q').
> I've tried to search for the solution, but many threads ends without any
> clear answer.
>
> To my understanding there is 2 main ways.
> 1. Combine default filters, to emulate spellcheck behavior.
> Question: which combination might give good enough result?
> Advantage: will be very easy to integrate.
> Disadvantage: the quality and flexibility will be not very good
> 2. Implement custom filter, inside which implement advanced spellcheck
> functionality, using some open-source.
> Advantage: quality will be much higher
> Disadvantage: "invention of the bicycle" and even add custom filter to the
> production currently quite complicated.
> 3. Something else... open for suggestions :)
>
> The expected behavior:
> myrequestparam.q=iphon
> suggest: iphone, iphone 8...
>
> myrequestparam.q=iphonn
> suggest: iphone, iphone 8...
>
> If both cases are possible and a corrected suggestion is highly likely
> along with the original one, maybe put it in the list with a lower weight.
> But the response list should be a single (merged) entity.
>
> Thanks.
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*


Re: how to get rid of double quotes in solr

2020-04-15 Thread Paras Lehana
Hi,

Are you referring to the double quotes in the JSON result?

On Tue, 14 Apr 2020 at 08:29, sefty nindyastuti 
wrote:

> The data that I use is logs from Hadoop; my problem is the Hadoop logs from
> the cluster.
> The schema I use is filebeat --> logstash --> solr. I use a Logstash config
> to parse the Hadoop log; the Hadoop log is input to Logstash via
> Filebeat, then the output from Logstash is indexed into Solr.
>
> Pada tanggal Sen, 13 Apr 2020 pukul 19.07 Erick Erickson <
> erickerick...@gmail.com> menulis:
>
> > I don’t quite know what you’re asking about. Is that input or output to
> > Solr? Or is it output from logstash?
> >
> > What are you indexing? Because that doesn't look like data from a solr
> log.
> >
> > You might want to review: https://wiki.apache.org/solr/UsingMailingLists
> >
> > Best,
> > Erick
> >
> > > On Apr 13, 2020, at 12:24 AM, sefty nindyastuti 
> > wrote:
> > >
> > > I have a problem when indexing log data clusters in solr using logstash
> > and filebeat. there are double quotes in the solr index results,
> > > how to solve this problem, please help
> > >
> > > expect the results of the index that appears in solr as below:
> > >
> > >  {
> > > "input": "log"
> > > "hostname": "localhost"
> > > "id": "22eddbc9-e60f-29cd-a352-b40154ba1736",
> > > "type": "filebeat"
> > > "ephemeral_id": "1a31d6e0-8ed9-1307-215f-5dfd361364c9"
> > > "version": "7.6.1"
> > > "offset": "2061794 "
> > > "path": "
> /var/log/hadoop/hdfs/hadoop-hdfs-secondarynamenode-xx.log "
> > > "host": "localhostxxx",
> > > "message": "2020-04-11 19: 04: 28,575 INFO common.Util
> > (Util.java:receiveFile(314)) - Combined time for file downloads and fsync
> > to all disks stores 0.02s. The file download stores 0.02s at 58750.00 KB
> /
> > s Synchronous (fsync) write to disk of / hadoop / hdfs / namesecondary /
> > current / edits_tmp_ "
> > > {
> > >
> >
> >
>


--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*


Re: How upgrade to Solr 8 impact performance

2020-04-15 Thread Paras Lehana
In January, we upgraded Solr from version 6 to 8 skipping all versions in
between.

The hardware and Solr configurations were kept the same but we still faced
degradation in response time by 30-50%. We had exceptional Query times
around 25 ms with Solr 6 and now we are hovering around 36 ms.

Since response times under 50 ms are very good even for Auto-Suggest, we
have not tried any changes regarding this. Nevertheless, you can try using
Caffeine Cache. Looking forward to reading community inputs as well.
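
For the Caffeine cache, a sketch of the solrconfig.xml change (sizes
illustrative; solr.CaffeineCache ships from Solr 8.3 onward):

   <filterCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="128"/>
   <queryResultCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="64"/>
   <documentCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="0"/>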



On Thu, 16 Apr 2020 at 01:34, ChienHuaWang  wrote:

> Does anyone have experience upgrading an application from Solr 7.x to 8.x?
> How's the query performance?
> We found a little slower response time from the application with Solr 8 based
> on current measurements, and are still looking into it in more detail.
> But wondering, does anyone have a similar experience? Is that something we
> should expect for Solr 8.x?
>
> Please kindly share, thanks.
>
> Regards,
> ChienHua
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*


Re: How to sum model grouped?

2020-03-17 Thread Paras Lehana
Hey,

Check this once:
https://lucene.apache.org/solr/guide/8_4/the-stats-component.html
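
Since every group in this model has numFound: 1, summing over all matching
documents gives the same total as summing per group. Two hedged options (the
JSON Facet API aggregation should be available on 7.1):

   stats=true&stats.field=fromfollowers

   json.facet={ total_followers : "sum(fromfollowers)" }

Both return the sum of fromfollowers across the matched documents.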

On Mon, 16 Mar 2020 at 18:08, hakan  wrote:

> I use Solr version 7.1. I have a grouped model with 11M records in total,
> as in the example below.
> My question is: how do I sum the fromfollowers field over this grouped model?
> {
>  groupValue: "1927245294",
>  doclist: {
> numFound: 1,
> start: 0,
> docs: [
> {
>fromuserid: "1927245294",
>fromfollowers: 185
>  }
>  ]
>   }
> },
> {
>  groupValue: "98405321",
>  doclist: {
> numFound: 1,
> start: 0,
> docs: [
> {
>fromuserid: "98405321",
>fromfollowers: 292
>  }
>  ]
>   }
> },
> {
>  groupValue: "182496421",
>  doclist: {
> numFound: 1,
> start: 0,
> docs: [
> {
>fromuserid: "182496421",
>fromfollowers: 111
>      }
>  ]
>   }
> }
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*


Re: Fw: SolrException in Solr 6.1.0

2020-03-11 Thread Paras Lehana
eclipse.jetty.server.Server.handle(Server.java:518)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
> at org.eclipse.jetty.io
> .AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> at org.eclipse.jetty.io
> .SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.lucene.store.AlreadyClosedException: this
> IndexWriter is closed
> at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:724)
> at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:738)
> at
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1488)
> at
> org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:282)
> at
> org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:214)
> at
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:169)
> ... 48 more
> Caused by: java.nio.file.FileSystemException:
> E:\SolrCloud\solr1\server\solr\workflows\data\index\_8suj.fdx: Insufficient
> system resources exist to complete the requested service.
>
> at
> sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
> at
> sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
> at
> sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
> at
> sun.nio.fs.WindowsFileSystemProvider.newByteChannel(WindowsFileSystemProvider.java:230)
> at
> java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434)
> at java.nio.file.Files.newOutputStream(Files.java:216)
> at
> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:408)
> at
> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:404)
> at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
> at
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44)
> at
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
> at
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:108)
> at
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:128)
> at
> org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsWriter(Lucene50StoredFieldsFormat.java:183)
> at
> org.apache.lucene.index.DefaultIndexingChain.initStoredFieldsWriter(DefaultIndexingChain.java:83)
> at
> org.apache.lucene.index.DefaultIndexingChain.startStoredFields(DefaultIndexingChain.java:331)
> at
> org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:368)
> at
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232)
> at
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:449)
> at
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1492)
> ... 51 more
>
> Can you suggest to me what is the cause? And how can we resolve it?
>
> Regards,
> Vishal Patel
>
> Sent from Outlook<http://aka.ms/weboutlook>
>


--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*


Re: Solr range search queries

2020-03-11 Thread Paras Lehana
Hi Niharika,

Range queries on string fields work lexicographically and not numerically, I
think. Read:
https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html#TheStandardQueryParser-RangeSearches

If this is the case, [2 TO 5] will include 200 and [2 TO 20] will not
include 19.
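
If the type itself can't change, a parallel numeric field is a common
workaround - a sketch, assuming the string values always parse as doubles
(copyField fails otherwise) and a full reindex; on pre-7 Solr, use
TrieDoubleField instead of pdouble:

   <field name="latitude_d" type="pdouble" indexed="true" stored="false" docValues="true" multiValued="true"/>
   <field name="longitude_d" type="pdouble" indexed="true" stored="false" docValues="true" multiValued="true"/>
   <copyField source="latitude" dest="latitude_d"/>
   <copyField source="longitude" dest="longitude_d"/>

and then query the numeric twins:

   latitude_d:[47.010225655683485 TO 52.40241887397332] AND longitude_d:[-2.021484375004 TO 14.63378906252]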

On Tue, 10 Mar 2020 at 15:09, Niharika  wrote:

> hello,
>
> I have two fields declared in schema.xml as
>
> <field name='latitude' type='string' required='false' multiValued='true'/>
> <field name='longitude' type='string' required='false' multiValued='true'/>
>
> I want to generate a query to find all the results in a specific range of
> longitude and latitude
>
> My query looks like
>
> *latitude:[47.010225655683485 TO 52.40241887397332] AND
> longitude:[-2.021484375004 TO 14.63378906252]*
>
> The problem here is: I am not getting all the results I expect. Can anyone
> suggest what I can do here and why it is wrong?
>
> PS: I have already tried rounding the decimals; I cannot change the
> type
> from string in schema.xml.
>
> Thanks & Regards
> Niharika
>
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*


Re: Using Synonym Graph Filter does not tokenize the query string if it has multi-word synonym

2020-03-01 Thread Paras Lehana
Hi Atin,

Please host your images on some other site as they won't reach the mailing
list as attachments. I had researched Synonym support for a week
before enabling it in Auto-Suggest. Why do you want multi-term synonyms
to break? I guess only for matching documents and not for tokenized synonyms.

I think you should try setting sow (split on whitespace) to true. Read more
here:
https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html#TheExtendedDisMaxQueryParser-ThesowParameter
.

A good article about this:
https://opensourceconnections.com/blog/2018/02/20/edismax-and-multiterm-synonyms-oddities/
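
A hedged example of the sow route (qf is illustrative):

   q=soap powder&defType=edismax&qf=name&sow=true

With sow=true, each whitespace-separated term is analyzed on its own, so the
multi-word synonym entry can no longer swallow "soap powder" as one unit.
Note that sow defaults to false from Solr 7.0 onward.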

This is a very naive guess. I would need the screenshots and some more
details. :)

On Mon, 2 Mar 2020 at 01:17, atin janki  wrote:

> Hello everyone,
>
> I am using solr 8.3.
>
> After I included Synonym Graph Filter in my managed-schema file, I have
> noticed that if the query string contains a multi-word synonym, it
> considers that multi-word synonym as a single term and does not break it,
> further suppressing the default search behaviour.
>
> Here "soap powder" is the search query which is also a multi-word synonym
> in the synonym file as-
>
> s(104254535,1,'soap powder',n,1,1).
> s(104254535,2,'built-soap powder',n,1,0).
> s(104254535,3,'washing powder',n,1,0).
>
>
> I am sharing some screenshots for understanding the problem-
>
> *without* Synonym Graph Filter (2 docs returned) -
> [image: image.png]
>
>
> *with* Synonym Graph Filter (2 docs expected, only 1 returned)
>
> [image: image.png]
>
> Has anyone experienced this before? If yes, is there any workaround ?
> Or is it an expected behaviour?
>
> Regards,
> Atin Janki
>
>

--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*


Re: Re: Re: Re: Query Autocomplete Evaluation

2020-02-28 Thread Paras Lehana
Hey Audrey,

Users often skip results and go straight to vanilla search even though
> their query is displayed in the top of the suggestions list


Yes, we do track this in another metric. This behaviour is more
prevalent for shorter terms like "tea" and "bag". But, anyway, we measure
MRR to quantify how high we are able to show suggestions to the users.
Since we include only the terms selected via Auto-Suggest in the universe
for calculation, the searches where users skip Auto-Suggest won't be
counted. I think we can safely exclude these if you're using MRR to measure
how well you order your result set. Still, if you want to include those,
you can always compare the search term with the last result set and include
them in MRR - you're actually right that users may be skipping the lower
positions even if the intended suggestion is available. Our MRR stands at
68%, and 75% of all of the suggestions are selected from position #1 or #2.
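
For reference, the formula we're talking about, where S is the set of
Auto-Suggest searches ending in a selection and rank_i is the position
selected in the i-th search:

   MRR = \frac{1}{|S|} \sum_{i=1}^{|S|} \frac{1}{rank_i}

Plugging in the position-selection numbers I shared earlier in this thread
(107,699 picks at #1, 58,736 at #2, and so on, out of 228,954 selections)
gives roughly 0.66, in line with the ~68% figure above.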


So acceptance rate = # of suggestions taken / total queries issued?


Yes. The total queries issued should ideally be those where Auto-Suggest
was used or could have been used, i.e. we exclude voice searches. We
try to include as many as possible of those searches which were made by
typing in the search bar. But that's how we have fine-tuned our tracking
over months.
You're right about the general formula - searches via Auto-Suggest divided
by total Searches.


And Selection to Display = # of suggestions taken (this would only be 1, if
> the not-taken suggestions are given 0s) / total suggestions displayed? If
> the above is true, wouldn't Selection to Display be binary? I.e. it's
> either 1/# of suggestions displayed (assuming this is a constant) or 0?


Yup. Please note that this is calculated per session of Auto-Suggest. Let
the formula be S/D. We will take D (Display) as 1 and not 3 when a user
queries for "bag" (b, ba, bag). If the S (Selection) was made in the last
display, it is 1 also. If a user selects "bag" after writing "ba", we don't
say that S=0, D=1 for "b" and S=1, D=1 for "ba". For this, we already track
APL (Average Prefix Length). S/D is calculated per search and thus, here
S=1, D=1 for search "bag". Thus, for a single search, S/D can be either 0
or 1 - you're right, it's binary!

Hope this helps. Loved your questions! :)

On Thu, 27 Feb 2020 at 22:21, Audrey Lorberfeld - audrey.lorberf...@ibm.com
 wrote:

> Paras,
>
> Thank you for this response! Yes, you are being clear __
>
> Regarding the assumptions you make for MRR, do you have any research
> papers to confirm that these user behaviors have been observed? I only ask
> because this paper http://yichang-cs.com/yahoo/sigir14_SearchAssist.pdf
> talks about how users often skip results and go straight to vanilla search
> even though their query is displayed in the top of the suggestions list
> (section 3.2 "QAC User Behavior Analysis"), among other behaviors that go
> against general IR intuition. This is only one paper, of course, but it
> seems that user research of QAC is hard to come by otherwise.
>
> So acceptance rate = # of suggestions taken / total queries issued ?
> And Selection to Display = # of suggestions taken (this would only be 1,
> if the not-taken suggestions are given 0s) / total suggestions displayed ?
>
> If the above is true, wouldn't Selection to Display be binary? I.e. it's
> either 1/# of suggestions displayed (assuming this is a constant) or 0?
>
> Best,
> Audrey
>
>
> 
> From: Paras Lehana 
> Sent: Thursday, February 27, 2020 2:58:25 AM
> To: solr-user@lucene.apache.org
> Subject: [EXTERNAL] Re: Re: Re: Query Autocomplete Evaluation
>
> Hi Audrey,
>
> For MRR, we assume that if a suggestion is selected, it's relevant. It's
> also assumed that the user will always click the highest relevant
> suggestion. Thus, we calculate position selection for each selection. If
> still, I'm not understanding your question correctly, feel free to contact
> me personally (hangouts?).
>
> And @Paras, the third and fourth evaluation metrics you listed in your
> > first reply seem the same to me. What is the difference between the two?
>
>
> I was expecting you to ask this - I should have explained a bit more.
> Acceptance Rate is the searches through Auto-Suggest for all Searches.
> Whereas, value for Selection to Display is 1 if the Selection is made given
> the suggestions were displayed otherwise 0. Here, the cases where results
> are displayed is the universal set. Acceptance Rate is counted 0 even for
> those searches where Selection was not made because there were no results
> while S/D will not count this - it only counts cases where the result was
> displayed.
>
> Hope I'm clear.

Re: Re: Re: Query Autocomplete Evaluation

2020-02-26 Thread Paras Lehana
Hi Audrey,

For MRR, we assume that if a suggestion is selected, it's relevant. It's
also assumed that the user will always click the highest relevant
suggestion. Thus, we calculate position selection for each selection. If
still, I'm not understanding your question correctly, feel free to contact
me personally (hangouts?).

And @Paras, the third and fourth evaluation metrics you listed in your
> first reply seem the same to me. What is the difference between the two?


I was expecting you to ask this - I should have explained a bit more.
Acceptance Rate is the share of searches made through Auto-Suggest out of
all searches. Whereas the value for Selection to Display is 1 if a Selection
is made given that suggestions were displayed, otherwise 0. Here, the cases
where results are displayed form the universal set. Acceptance Rate counts 0
even for those searches where a Selection was not made because there were no
results, while S/D will not count these - it only counts cases where results
were displayed.

Hope I'm clear. :)

On Tue, 25 Feb 2020 at 21:10, Audrey Lorberfeld - audrey.lorberf...@ibm.com
 wrote:

> This article
> http://wwwconference.org/proceedings/www2011/proceedings/p107.pdf also
> indicates that MRR needs binary relevance labels, p. 114: "To this end, we
> selected a random sample of 198 (query, context) pairs from the set of
> 7,311 pairs, and manually tagged each of them as related (i.e., the query
> is related to the context; 60% of the pairs) and unrelated (40% of the
> pairs)."
>
> On 2/25/20, 10:25 AM, "Audrey Lorberfeld - audrey.lorberf...@ibm.com" <
> audrey.lorberf...@ibm.com> wrote:
>
> Thank you, Walter & Paras!
>
> So, from the MRR equation, I was under the impression the suggestions
> all needed a binary label (0,1) indicating relevance.* But it's great to
> know that you guys use proxies for relevance, such as clicks.
>
> *The reason I think MRR has to have binary relevance labels is this
> Wikipedia article:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__en.wikipedia.org_wiki_Mean-5Freciprocal-5Frank=DwIGaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=1f2LPzuBvibQd8m-8_HuNVYFm0JvCGyPDul6r4ATsLk=Sn7KV-BcFDTrmc1PfRVeSpB9Ysh3UrVIQKcB3G5zstw=
> , where it states below the formula that rank_i = "refers to the rank
> position of the first relevant document for the i-th query." If the
> suggestions are not labeled as relevant (0) or not relevant (1), then how
> do you compute the rank of the first RELEVANT document?
>
> I'll check out these readings asap, thank you!
>
> And @Paras, the third and fourth evaluation metrics you listed in your
> first reply seem the same to me. What is the difference between the two?
>
> Best,
> Audrey
>
> On 2/25/20, 1:11 AM, "Walter Underwood"  wrote:
>
> Here is a blog article with a worked example for MRR based on
> customer clicks.
>
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__observer.wunderwood.org_2016_09_12_measuring-2Dsearch-2Drelevance-2Dwith-2Dmrr_=DwIFaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=e9a1kzjKu6l-P1g5agvpe-jQZfCF6bT4x6CeYDrUkgE=GzNrf4l_FjMqOkSx2B4_sCIGoJv2QYPbPqWplHGE3PI=
>
> At my place of work, we compare the CTR and MRR of queries using
> suggestions to those that do not use suggestions. Solr autosuggest based on
> lexicon of book titles is highly effective for us.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__observer.wunderwood.org_=DwIFaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=e9a1kzjKu6l-P1g5agvpe-jQZfCF6bT4x6CeYDrUkgE=L4yZqRG0pWGPpZ8U7S-feoiWSTrz_zBEq0FANYqncuE=
>  (my blog)
>
> > On Feb 24, 2020, at 9:52 PM, Paras Lehana <
> paras.leh...@indiamart.com> wrote:
> >
> > Hey Audrey,
> >
> > I assume MRR is about the ranking of the intended suggestion.
> For this, no
> > human judgement is required. We track position selection - the
> position
> > (1-10) of the selected suggestion. For example, this is our
> recent numbers:
> >
> > Position 1 Selected (B3) 107,699
> > Position 2 Selected (B4) 58,736
> > Position 3 Selected (B5) 23,507
> > Position 4 Selected (B6) 12,250
> > Position 5 Selected (B7) 7,980
> > Position 6 Selected (B8) 5,653
> > Position 7 Selected (B9) 4,193
> > Position 8 Selected (B10) 3,511
> > Position 9 Selected (B11) 2,997
> > Position 10 Selected (B12) 2,428
> > *Total Selections (B13)* *228,954*
>   

Re: Best Practises around relevance tuning per query

2020-02-26 Thread Paras Lehana
Hi Ashwin,

If I'm understanding your requirement correctly, I think you should read
about Payloads <https://lucidworks.com/post/solr-payloads/>.
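
A minimal sketch of the payload route from that article (names illustrative):

   <fieldType name="delimited_payloads_float" class="solr.TextField">
     <analyzer>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
     </analyzer>
   </fieldType>
   <field name="query_scores" type="delimited_payloads_float"/>

Index per-query scores as "ipod|500.0 nano|3.0" and read them back at query
time with the payload() function (Solr 6.6+), e.g. bf=payload(query_scores,ipod).
Updating scores then means updating one field rather than re-boosting at
query time.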

On Thu, 27 Feb 2020 at 09:41, Ashwin Ramesh 
wrote:

> Hi everybody,
>
> Thank you for all the amazing feedback. I apologize for the formatting of
> my question.
>
> I guess if I were to generalize my question: 'What are the most common
> approaches to storing query-level features in Solr documents?'
>
> For example, a normalized_click_score is a document level feature, but how
> would you scalably also do the same for specific queries? E.g. How do you
> define, *For the query 'ipod' this specific document is very relevant*.
>
> Thanks again!
>
> Regards,
>
> Ash
>
> On Wed, Feb 19, 2020 at 6:14 PM Jörn Franke  wrote:
>
> > You are too much focus on the solution. If you would describe the
> business
> > case in more detail without including the solution itself more people
> could
> > help.
> >
> > Eg it ie not clear why you have a scoring model and why this can address
> > business needs.
> >
> > > Am 18.02.2020 um 01:50 schrieb Ashwin Ramesh  >:
> > >
> > > Hi,
> > >
> > > We are in the process of applying a scoring model to our search
> results.
> > In
> > > particular, we would like to add scores for documents per query and
> user
> > > context.
> > >
> > > For example, we want to have a score from 500 to 1 for the top 500
> > > documents for the query “dog” for users who speak US English.
> > >
> > > We believe it becomes infeasible to store these scores in Solr because
> we
> > > want to update the scores regularly, and the number of scores increases
> > > rapidly with increased user attributes.
> > >
> > > One solution we explored was to store these scores in a secondary data
> > > store, and use this at Solr query time with a boost function such as:
> > >
> > > `bf=mul(termfreq(id,’ID-1'),500) mul(termfreq(id,'ID-2'),499) …
> > > mul(termfreq(id,'ID-500'),1)`
> > >
> > > We have over a hundred thousand documents in one Solr collection, and
> > about
> > > fifty million in another Solr collection. We have some queries for
> which
> > > roughly 80% of the results match, although this is an edge case. We
> > wanted
> > > to know the worst case performance, so we tested with such a query. For
> > > both of these collections we found the a message similar to the
> following
> > > in the Solr cloud logs (tested on a laptop):
> > >
> > > Elapsed time: 5020. Exceeded allowed search time: 5000 ms.
> > >
> > > We then tried using the following boost, which seemed simpler:
> > >
> > > `boost=if(query($qq), 10, 1)=id:(ID-1 OR ID-2 OR … OR ID-500)`
> > >
> > > We then saw the following in the Solr cloud logs:
> > >
> > > `The request took too long to iterate over terms.`
> > >
> > > All responses above took over 5000 milliseconds to return.
> > >
> > > We are considering Solr’s re-ranker, but I don’t know how we would use
> > this
> > > without pushing all the query-context-document scores to Solr.
> > >
> > >
> > > The alternative solution that we are currently considering involves
> > > invoking multiple solr queries.
> > >
> > > This means we would make a request to solr to fetch the top N results
> > (id,
> > > score) for the query. E.g. q=dog, fq=featureA:foo, fq=featureB=bar,
> > limit=N.
> > >
> > > Another request would be made using a filter query with a set of doc
> ids
> > > that we know are high value for the user’s query. E.g. q=*:*,
> > > fq=featureA:foo, fq=featureB:bar, fq=id:(d1, d2, d3), limit=N.
> > >
> > > We would then do a reranking phase in our service layer.
> > >
> > > Do you have any suggestions for known patterns of how we can store and
> > > retrieve scores per user context and query?
> > >
> > > Regards,
> > > Ash & Spirit.
> > >
> >
>

--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*


Re: Solr 6.3 with Open JDK

2020-02-26 Thread Paras Lehana
Hi Vinodh,

You can safely use OpenJDK 1.8. Although we have upgraded to Solr 8.4 and
Java 11 now, we used Solr 6.5 with OpenJDK 1.8 for a long time.

Someone on the forums wrote that the System Requirements page for Solr
mentions that 'You should avoid Java 9 or later for Lucene/Solr 6.x or
earlier'.

I also don't recommend Java 9 or 10 as they don't have LTS support. Stay
with OpenJDK 1.8 or upgrade Solr.

On Wed, 26 Feb 2020 at 16:42, Kommu, Vinodh K.  wrote:

> Hi Team,
>
> Anyone using Solr 6.3 version with Open JDK? If so, what version of open
> JDK you are using? And can we use open JDK 1.8.0_242 or later version with
> Solr 6.3 version?
>
>
> Regards,
> Vinodh
>
> DTCC DISCLAIMER: This email and any files transmitted with it are
> confidential and intended solely for the use of the individual or entity to
> whom they are addressed. If you have received this email in error, please
> notify us immediately and delete the email and any attachments from your
> system. The recipient should check this email and any attachments for the
> presence of viruses. The company accepts no liability for any damage caused
> by any virus transmitted by this email.
>


--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*


Re: Optimize solr 8.4.1

2020-02-26 Thread Paras Lehana
Hi Massimiliano,

Is it still necessary to run the Optimize command from my application when
> I have finished indexing?


I guess you can stop worrying about optimizations and let Solr handle that
implicitly. There's nothing so bad about having more segments.
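
If you ever do want an explicit optimize on 8.x, the update handler still
accepts it even though the admin UI button is gone - a sketch (host and core
name illustrative):

   curl 'http://localhost:8983/solr/mycore/update?optimize=true&maxSegments=1'

or client.optimize() from SolrJ. A maxSegments value higher than 1 stops
short of a full single-segment rewrite, which is usually the expensive part.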

On Wed, 26 Feb 2020 at 16:02, Massimiliano Randazzo <
massimiliano.randa...@gmail.com> wrote:

> > Good morning,
> >
> > recently I went from version 6.4 to version 8.4.1. I access Solr
> > through Java applications written by me, in which I have updated the
> > solr-solrj-8.4.1.jar libraries.
> >
> > I am performing the OCR indexing of a newspaper of about 550,000 pages in
> > production, for which I have calculated at least 1,000,000,000 words, and
> > I am experiencing slowness. I wanted to know if you could advise me on
> > changes to the configuration.
> >
> > The server I'm using is a server with 12 cores and 64GB of Ram, the only
> > changes I made in the configuration are:
> > solr.in.sh file
> > SOLR_HEAP="20480m"
> > SOLR_JAVA_MEM="-Xms20480m -Xmx20480m"
> > GC_LOG_OPTS="-verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails \
> >   -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime"
> > The Java version I use is
> > java version "1.8.0_51"
> > Java(TM) SE Runtime Environment (build 1.8.0_51-b16)
> > Java HotSpot(TM) 64-Bit Server VM (build 25.51-b03, mixed mode)
> >
> > Also, comparing the Solr web interface, I noticed a difference in the
> > "Overview" page: in Solr 6.4 it showed Optimized and Current and
> > allowed me to launch Optimize if necessary; in version 8.4.1 Optimized is
> > no longer present. I hypothesized that this activity is done with the
> > commit or through some operation in the background. If this were so, is
> > it still necessary to run the Optimize command from my application when
> > I have finished indexing? I noticed that the Optimize function requires
> > considerable time and resources, especially on large databases.
> > Thank you for your attention
>
> Massimiliano Randazzo
>
> >
> >
>


--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*


Re: How to check for uncommitted changes

2020-02-26 Thread Paras Lehana
Hey Connor,

You can use the Metrics API, which has an attribute, docsPending.

API:
host:port/solr/admin/metrics?group=core&prefix=UPDATE.updateHandler.docsPending

Read more here:
https://lucene.apache.org/solr/guide/8_4/performance-statistics-reference.html#update-handler
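
A hedged example (host and core name illustrative); prefix narrows the
response to just this metric:

   curl 'http://localhost:8983/solr/admin/metrics?group=core&prefix=UPDATE.updateHandler.docsPending'

A non-zero docsPending under your core's registry (e.g. solr.core.mycore)
means there are adds that haven't been committed yet.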

On Wed, 26 Feb 2020 at 02:50, Connor Howington  wrote:

> Is there a request I can make to Solr from a client to tell me whether a
> core has any uncommitted changes?
>
> Thanks,
> Connor
>
> *--*
>
> *Connor Howington*
> *Associate Research Programmer*
> Center for Research Computing (CRC)
> University of Notre Dame
> crc.nd.edu
>
> 832M Flanner Hall
> Notre Dame, IN 46556
>
> [image: Academicmark]
>


--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *1196*


Re: Solr datePointField facet

2020-02-26 Thread Paras Lehana
> [quoted schema snippet trimmed: a long run of copyField declarations
> copying PARTY.* source fields into the text, PARTY, FUNC_QLFR,
> Party_Relation_Tiered, and Competency destination fields; the markup was
> mangled beyond recovery in the archive]

Re: Solr datePointField facet

2020-02-25 Thread Paras Lehana
Hi Srinivas,

But still facing the same error.


The same error? Can you please post the facet query? Please post (part of)
your schema too.
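
For reference, a date field that supports faceting would look something like
this (field name illustrative):

   <field name="created_date" type="pdate" indexed="true" stored="true" docValues="true" multiValued="true"/>

One common gotcha: a docValues change only applies to newly written
segments, so re-adding documents over the old index can leave mixed
segments. Indexing into a fresh core/collection (or wiping the data
directory first) is the safe way to pick the change up.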

On Tue, 25 Feb 2020 at 16:00, Srinivas Kashyap
 wrote:

> Hi all,
>
> I have a date field in my schema and I'm trying to facet on that field and
> getting below error:
>
>  omitTermFreqAndPositions="true"  multiValued="true" />
>
> This field I'm copying to text field(copyfield) as well.
>
> 
>
> Error:
> Can't facet on a PointField without docValues
>
> I tried adding like below:
>
> 
>
>  omitTermFreqAndPositions="true"  multiValued="true" />
>
> And after the changes, I did full reindex of the core and restarted as
> well.
>
> But still facing the same error. Can somebody please help.
>
> Thanks,
> Srinivas
>
>
>
> 
> DISCLAIMER:
> E-mails and attachments from Bamboo Rose, LLC are confidential.
> If you are not the intended recipient, please notify the sender
> immediately by replying to the e-mail, and then delete it without making
> copies or using it in any way.
> No representation is made that this email or any attachments are free of
> viruses. Virus scanning is recommended and is the responsibility of the
> recipient.
>
> Disclaimer
>
> The information contained in this communication from the sender is
> confidential. It is intended solely for use by the recipient and others
> authorized to receive it. If you are not the recipient, you are hereby
> notified that any disclosure, copying, distribution or taking action in
> relation of the contents of this information is strictly prohibited and may
> be unlawful.
>
> This email has been scanned for viruses and malware, and may have been
> automatically archived by Mimecast Ltd, an innovator in Software as a
> Service (SaaS) for business. Providing a safer and more useful place for
> your human generated data. Specializing in; Security, archiving and
> compliance. To find out more visit the Mimecast website.
>


--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *11096*


Re: Solr console showing error in 7 .7

2020-02-25 Thread Paras Lehana
Please post full error possibly with trace (see logs).

On Mon, 20 Jan 2020 at 22:29, Rajdeep Sahoo 
wrote:

> When reloading the Solr console, it shows an error in the console
> itself for a small amount of time.
> The error is "error reloading/initialising the core".
>


--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *11096*


Re: Is it possible to add stemming in a text_exact field

2020-02-25 Thread Paras Lehana
Hi Dhanesh,

Use KeywordRepeatFilterFactory
<https://lucene.apache.org/solr/guide/8_4/language-analysis.html#keywordrepeatfilterfactory>.
It will emit each token twice, marking one of them as KEYWORD so that
stemming won't alter that copy. Use RemoveDuplicatesTokenFilterFactory to
remove the duplicates after stemming; a sketch of such a chain is below.
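
A minimal sketch, assuming a Porter stemmer (tokenizer and stemmer choice
illustrative):

   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <!-- emits each token twice, one copy marked as KEYWORD -->
     <filter class="solr.KeywordRepeatFilterFactory"/>
     <!-- stems only the copy not marked as KEYWORD -->
     <filter class="solr.PorterStemFilterFactory"/>
     <!-- collapses the pair when stemming changed nothing -->
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>

This way "restaurants" indexes both "restaurants" and "restaurant", so the
exact and stemmed forms both match.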

On Fri, 24 Jan 2020 at 17:13, Lucky Sharma  wrote:

> Hi Dhanesh,
> I also encountered this problem long back, when we had 'skimmed milk'
> and needed to search for 'skim milk'. For that we wrote a filter
> so that we could customize it: we use KStemmer, then apply the custom
> ConcatPhraseFilterFactory.
>
> You can use the link mentioned below to review:
> https://github.com/MighTguY/solr-extensions
>
> Regards,
> Lucky Sharma
>
> On Thu, 23 Jan, 2020, 8:58 pm Alessandro Benedetti, 
> wrote:
>
> > Edward is correct; furthermore, using a stemmer in an analysis chain that
> > doesn't tokenise is going to work just for single-term queries and
> > single-term field values...
> > Not sure it was intended ...
> >
> > Cheers
> >
> >
> > --
> > Alessandro Benedetti
> > Search Consultant, R Software Engineer, Director
> > www.sease.io
> >
> >
> > On Wed, 22 Jan 2020 at 16:26, Edward Ribeiro 
> > wrote:
> >
> > > Hi,
> > >
> > > One possible solution would be to create a second field (e.g.,
> > > text_general) that uses DefaultTokenizer, or other tokenizer that
> breaks
> > > the string into tokens, and use a copyField to copy the content from
> > > text_exact to text_general. Then, you can use edismax parser to search
> > both
> > > fields, but giving text_exact a higher boost (qf=text_exact^5
> > > text_general). In this case, both fields should be indexed, but only
> one
> > > needs to be stored.
> > >
> > > Edward
> > >
> > > On Wed, Jan 22, 2020 at 10:34 AM Dhanesh Radhakrishnan <
> > dhan...@hifx.co.in
> > > >
> > > wrote:
> > >
> > > > Hello,
> > > > I'm facing an issue with stemming.
> > > > My search query is "restaurant dubai" and returns  results.
> > > > If I search "restaurants dubai" it returns no data.
> > > >
> > > > How to stem this keyword "restaurant dubai" with "restaurants dubai"
> ?
> > > >
> > > > I'm using a text exact field for search.
> > > >
> > > >  > > > multiValued="true" omitNorms="false"
> omitTermFreqAndPositions="false"/>
> > > >
> > > > Here is the field definition
> > > >
> > > >  > > > positionIncrementGap="100">
> > > > 
> > > >
> > > >
> > > >
> > > >
> > > > 
> > > > 
> > > >   
> > > >   
> > > >   
> > > >   
> > > >
> > > > 
> > > >
> > > > Is there any solutions without changing the tokenizer class.
> > > >
> > > >
> > > >
> > > >
> > > > Dhanesh S.R
> > > >
> > > > --
> > > > IMPORTANT: This is an e-mail from HiFX IT Media Services Pvt. Ltd.
> Its
> > > > content are confidential to the intended recipient. If you are not
> the
> > > > intended recipient, be advised that you have received this e-mail in
> > > error
> > > > and that any use, dissemination, forwarding, printing or copying of
> > this
> > > > e-mail is strictly prohibited. It may not be disclosed to or used by
> > > > anyone
> > > > other than its intended recipient, nor may it be copied in any way.
> If
> > > > received in error, please email a reply to the sender, then delete it
> > > from
> > > > your system.
> > > >
> > > > Although this e-mail has been scanned for viruses, HiFX
> > > > cannot ultimately accept any responsibility for viruses and it is
> your
> > > > responsibility to scan attachments (if any).
> > > >
> > > > ​Before you print this email
> > > > or attachments, please consider the negative environmental impacts
> > > > associated with printing.
> > > >
> > >
> >
>


--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: *11096*


Re: Solr cloud production set up

2020-02-24 Thread Paras Lehana
Hi Rajdeep,


   1. I assume you had enabled docValues for the facet fields, right? (See
   the sketch after this list.)
   2. What do your GC logs tell you? Do you get freezes and CPU spikes at
   intervals?
   3. Caching will help in querying. I'll need to see a sample query of
   yours to recommend what you can tweak.
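
For point 1, a sketch of what docValues faceting needs (field name
illustrative; docValues are available on string fields since Solr 4.x, so
this applies to 4.6 too):

   <field name="category_s" type="string" indexed="true" stored="false" docValues="true"/>

Faceting on a field without docValues forces Solr to uninvert it on the
heap, which gets expensive fast with ~200 facet fields; docValues keep that
data off-heap, at the cost of a reindex after flipping the flag.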


On Tue, 28 Jan 2020 at 19:09, Jason Gerlowski  wrote:

> Hi Rajdeep,
>
> Unfortunately it's near impossible for anyone here to tell you what
> parameters to tweak.  People might take guesses based on their
> individual past experience, but ultimately those are just guesses.
>
> There are just too many variables affecting Solr performance for
> anyone to have a good guess without access to the cluster itself and
> the time and will to dig into it.
>
> Are there GC params that need tweaking?  Very possible, but you'll
> have to look into your gc logs to see how much time is being spent in
> gc.  Are there query params you could be changing?  Very possible, but
> you'll have to identify the types of queries you're submitting and see
> whether the ref-guide offers any information on how to tweak
> performance for those particular qparsers, facets, etc.  Is the number
> of facets the reason for slow queries?  Very possible, but you'll have
> to turn faceting off or run debug=timing and see how what that tells
> you about the QTime's.
>
> Tuning Solr performance is a tough, time consuming process.  I wish
> there was an easier answer for you, but there's not.
>
> Best,
>
> Jason
>
> On Mon, Jan 20, 2020 at 12:06 PM Rajdeep Sahoo
>  wrote:
> >
> > Please suggest anyone
> >
> > On Sun, 19 Jan, 2020, 9:43 AM Rajdeep Sahoo,  >
> > wrote:
> >
> > > Apart from reducing no of facets in the query, is there any other query
> > > params or gc params or heap space or anything else that we need to
> tweak
> > > for improving search response time.
> > >
> > > On Sun, 19 Jan, 2020, 3:15 AM Erick Erickson,  >
> > > wrote:
> > >
> > >> Add debug=timing to the query and it’ll show you the time each
> > >> component takes.
> > >>
> > >> > On Jan 18, 2020, at 1:50 PM, Rajdeep Sahoo <
> rajdeepsahoo2...@gmail.com>
> > >> wrote:
> > >> >
> > >> > Thanks for the suggestion,
> > >> >
> > >> > Is there any way to get the info which operation or which query
> params
> > >> are
> > >> > increasing the response time.
> > >> >
> > >> >
> > >> > On Sat, 18 Jan, 2020, 11:59 PM Dave, 
> > >> wrote:
> > >> >
> > >> >> If you’re not getting values, don’t ask for the facet. Facets are
> > >> >> expensive as hell, maybe you should think more about your query’s
> than
> > >> your
> > >> >> infrastructure, solr cloud won’t help you at all especially if your
> > >> asking
> > >> >> for things you don’t need
> > >> >>
> > >> >>> On Jan 18, 2020, at 1:25 PM, Rajdeep Sahoo <
> > >> rajdeepsahoo2...@gmail.com>
> > >> >> wrote:
> > >> >>>
> > >> >>> We have assigned 16 gb out of 24gb for heap .
> > >> >>> No other process is running on that node.
> > >> >>>
> > >> >>> 200 facets fields are there in the query but we will not be
> getting
> > >> the
> > >> >>> values for each facets for every search.
> > >> >>> There can be max of 50-60 facets for which we will be getting
> values.
> > >> >>>
> > >> >>> We are using caching,is it not going to help.
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>>> On Sat, 18 Jan, 2020, 11:36 PM Shawn Heisey, <
> apa...@elyograg.org>
> > >> >> wrote:
> > >> >>>>
> > >> >>>>> On 1/18/2020 10:09 AM, Rajdeep Sahoo wrote:
> > >> >>>>> We are having 2.3 million documents and size is 2.5 gb.
> > >> >>>>>  10 core cpu and 24 gb ram . 16 slave nodes.
> > >> >>>>>
> > >> >>>>>  Still some of the queries are taking 50 sec at solr end.
> > >> >>>>> As we are using solr 4.6 .
> > >> >>>>>  Other thing is we are having 200 (avg) facet fields  in a
> query.
> > >> >>>>> And 30 searchable fields.
> > >

Re: Solr 6.3 and OpenJDK 11

2020-02-24 Thread Paras Lehana
Hi Arnold,

Why not simply use the latest Solr 8 with Java 11? Upgrade is worth it. :)

Actually, no one would be able to answer your question without having tried
that exact combination. Or you can experiment and report the experience
here. :P

On Wed, 29 Jan 2020 at 04:43, Arnold Bronley 
wrote:

> Hi,
>
> How much of a problem would it be if I use OpenJDK 11 with Solr 6.3. I am
> aware that the system requirements page for Solr mentions that 'You should
> avoid Java 9 or later for Lucene/Solr 6.x or earlier.' I am interested in
> knowing what sort functionalities would break in Solr if I try to use
> OpenJDK 11 with Solr 6.3.
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: Spell check with data from database and not from english dictionary

2020-02-24 Thread Paras Lehana
Just asking here.

What are the community's experiences with using spellcheck backed by an
external file, versus a Synonym filter, for exact matches?

On Wed, 29 Jan 2020 at 17:28, seeteshh  wrote:

> Hello Jan
>
> Let me work on your suggestions too.
>
> Also I had one query
>
> While working on the spell check component, I dont any suggestion for the
> incorrect word typed
>
> example : In spellcheck.q,   I type "Teh" instead of "The" or "saa" instead
> of "sea"
>
>   "responseHeader":{
> "status":0,
> "QTime":0,
> "params":{
>   "spellcheck.q":"Teh",
>   "spellcheck":"on",
>   "spellcheck.reload":"true",
>   "spellcheck.build":"true",
>   "_":"1580287370193",
>   "spellcheck.collate":"true"}},
>   "command":"build",
>   "response":{"numFound":0,"start":0,"docs":[]
>   },
>   "spellcheck":{
> "suggestions":[],
> "collations":[]}}
>
> I have to create an entry in the synonyms.txt file for teh => The to make
> up
> for this issue.
>
> Does Solr require a term of at least 4 characters in spellcheck.q to
> provide the proper suggestion for the misspelt word? Is there any section
> in the Reference guide where it is documented? These are my
> findings/observations but I need to know the rationale behind this.
>
> Regards,
>
> Seetesh Hindlekar
>
>
>
>
>
> -
> Seetesh Hindlekar
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: Re: Query Autocomplete Evaluation

2020-02-24 Thread Paras Lehana
Hey Audrey,

I assume MRR is about the ranking of the intended suggestion. For this, no
human judgement is required. We track position selection - the position
(1-10) of the selected suggestion. For example, these are our recent numbers:

Position 1 Selected (B3) 107,699
Position 2 Selected (B4) 58,736
Position 3 Selected (B5) 23,507
Position 4 Selected (B6) 12,250
Position 5 Selected (B7) 7,980
Position 6 Selected (B8) 5,653
Position 7 Selected (B9) 4,193
Position 8 Selected (B10) 3,511
Position 9 Selected (B11) 2,997
Position 10 Selected (B12) 2,428
*Total Selections (B13)* *228,954*
MRR = (B3+B4/2+B5/3+B6/4+B7/5+B8/6+B9/7+B10/8+B11/9+B12/10)/B13 = 66.45%

Refer here for MRR calculation keeping Auto-Suggest in perspective:
https://medium.com/@dtunkelang/evaluating-search-measuring-searcher-behavior-5f8347619eb0

"In practice, this is inverted to obtain the reciprocal rank, e.g., if the
searcher clicks on the 4th result, the reciprocal rank is 0.25. The average
of these reciprocal ranks is called the mean reciprocal rank (MRR)."

nDCG may require human intervention. Please let me know in case I have not
understood your question properly. :)



On Mon, 24 Feb 2020 at 20:49, Audrey Lorberfeld - audrey.lorberf...@ibm.com
 wrote:

> Hi Paras,
>
> This is SO helpful, thank you. Quick question about your MRR metric -- do
> you have binary human judgements for your suggestions? If no, how do you
> label suggestions successful or not?
>
> Best,
> Audrey
>
> On 2/24/20, 2:27 AM, "Paras Lehana"  wrote:
>
> Hi Audrey,
>
> I work for Auto-Suggest at IndiaMART. Although we don't use the
> Suggester
> component, I think you need evaluation metrics for Auto-Suggest as a
> business product and not specifically for Solr Suggester which is the
> backend. We use edismax parser with EdgeNGrams Tokenization.
>
> Every week, as the property owner, I report around 500 metrics. I would
> like to mention a few of those:
>
>    1. MRR (Mean Reciprocal Rank): How high the user's selection was among
>    the returned results. Ranges from 0 to 1, the higher the better.
>    2. APL (Average Prefix Length): The prefix is the query typed by the
>    user. The lower the better: this reports how little an average user
>    has to type to get the intended suggestion.
>    3. Acceptance Rate or Selection: How many of the total searches are
>    being served from Auto-Suggest. We are around 50%.
>    4. Selection to Display Ratio: Did the user click any of the
>    suggestions when they were displayed?
>    5. Response Time: How fast you are serving your average query.
>
>
> The Selection and Response Time are our main KPIs. We track a lot about
> Auto-Suggest usage on our platform which becomes apparent if you
> observe
> the URL after clicking a suggestion on dir.indiamart.com. However, not
> everything would benefit you. Do let me know for any related query or
> explanation. Hope this helps. :)
>
> On Fri, 14 Feb 2020 at 21:23, Audrey Lorberfeld -
> audrey.lorberf...@ibm.com
>  wrote:
>
> > Hi all,
> >
> > How do you all evaluate the success of your query autocomplete (i.e.
> > suggester) component if you use it?
> >
> > We cannot use MRR for various reasons (I can go into them if you're
> > interested), so we're thinking of using nDCG since we already use
> that for
> > relevance eval of our system as a whole. I am also interested in the
> metric
> > "success at top-k," but I can't find any research papers that
> explicitly
> > define "success" -- I am assuming it's a suggestion (or suggestions)
> > labeled "relevant," but maybe it could also simply be the suggestion
> that
> > receives a click from the user?
> >
> > Would love to hear from the hive mind!
> >
> > Best,
> > Audrey
> >
> > --
> >
> >
> >
>
> --
> --
> Regards,
>
> *Paras Lehana* [65871]
> Development Engineer, *Auto-Suggest*,
> IndiaMART InterMESH Ltd,
>
> 11th Floor, Tower 2, Assotech Business Cresterra,
> Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
>
> Mob.: +91-9560911996
> Work: 0120-4056700 | Extn:
> *11096*
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: Auto-Suggest within Tier Architecture

2020-02-24 Thread Paras Lehana
Hi Brett,

We, at IndiaMART, have Solr installed behind PHP servers which are
behind Varnish servers.

Yes, you are right: exposing the Solr URL is not a good idea. A single
service in between would do the trick.

You can try our service at dir.indiamart.com. We have a client-side JS that
handles AJAX requests per keystroke. Do let me know for any other queries.
:)

On Mon, 3 Feb 2020 at 22:10, Moyer, Brett  wrote:

> Hello,
>
> Looking to see how others accomplished this goal. We have a 3 Tier
> architecture, Solr is down deep in T3 far from the end user. How do you
> make Auto-Suggest calls from the Internet Browser through the Tiers down to
> Solr in T3? We essentially created steps down each tier, but I'm looking to
> know what other approaches people have created. Did you put your solr in
> T1, I assume not, that would put it at risk. Thanks!
>
> Brett Moyer
> *
> This e-mail may contain confidential or privileged information.
> If you are not the intended recipient, please notify the sender
> immediately and then delete it.
>
> TIAA
> *********
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: Exact search in Solr

2020-02-24 Thread Paras Lehana
Use fieldType string.

If you're using a custom fieldType, "secret" would not match "secrets"
unless you use an appropriate analyzer (stemmer, EdgeNGrams), but it may
still match "secret something" if you're using StandardTokenizer or
something similar (use KeywordTokenizer to avoid that).
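
For the "tokenization only" option, a minimal fieldType sketch (the name
text_exact is made up):

   <fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     </analyzer>
   </fieldType>

With no lowercase or stemming filters, "secret" only ever matches the exact
token "secret".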

On Tue, 4 Feb 2020 at 20:28, yeikel valdes  wrote:

> You can store a non-analyzed version and copy it to an analyzed field.
>
>
> If you need full text search, you se the analyzed version. Otherwise use
> the non analyzed version.
>
>
> If you want to search both, you could still do that and boost the
> non-analyzed version if needed
>
>
>
>
>  On Tue, 04 Feb 2020 04:50:22 -0500 m...@apache.org wrote 
>
>
> Hello, Łukasz
> The latter for sure.
>
> On Tue, Feb 4, 2020 at 12:44 PM Antczak, Lukasz
>  wrote:
>
> > Hi, Solr experts!
> >
> > I would like to learn from you if there is a better solution for doing
> > 'exact search' in Solr.
> > Exact search means no analysis for the text other than tokenization.
> Query
> > "secret" gives back only documents containing exactly "secret" not
> > "secrets", "secrection", etc. Text that needs to be searched is content
> of
> > some articles.
> >
> > Solution 1. - index whole text as string, use regex for searching.
> > Solution 2. - index text with just tokenization, no lowercase, stemming,
> > etc.
> >
> > Which solution will be faster? Any other clever ideas to be evaluated?
> >
> > Regards
> > Łukasz Antczak
> > --
> > *Łukasz Antczak*
> > Senior IT Professional
> > GS Data Frontiers Team <http://go.roche.com/bigs>
> >
> > *Planned absences:*
> > *Roche Polska Sp. z o.o.*
> > ADMD Group Services - Business Intelligence Team
> > HQ: ul. Domaniewska 39B, 02-672 Warszawa
> > Office: ul. Abpa Baraniaka 88D, 61-131 Poznań
> >
> > Mobile: +48 519 515 010
> > mailto: lukasz.antc...@roche.com
> >
> > *Informacja o poufności: *Treść tej wiadomości zawiera informacje
> > przeznaczone tylko dla adresata. Jeżeli nie jesteście Państwo jej
> > adresatem, bądź otrzymaliście ją przez pomyłkę, prosimy o powiadomienie o
> > tym nadawcy oraz trwałe jej usunięcie. Wszelkie nieuprawnione
> > wykorzystanie informacji zawartych w tej wiadomości jest zabronione.
> >
> > *Confidentiality Note:* This message is intended only for the use of the
> > named recipient(s) and may contain confidential and/or proprietary
> > information. If you are not the intended recipient, please contact the
> > sender and delete this message. Any unauthorized use of the information
> > contained in this message is prohibited.
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: ID is a required field in SolrSchema . But not found in DataConfig

2020-02-24 Thread Paras Lehana
Your schema describes id as a required field. You need to tell Solr which
field from the source (most probably, the primary key) will be the id field
in the Solr index.
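
A minimal sketch, assuming the source documents carry their unique key in a
field also named id - an explicit mapping inside the entity makes DIH aware
of it:

   <entity processor="SolrEntityProcessor" query="*:*"
     url="http://127.0.0.1/solr/at-uk">
     <field column="id" name="id"/>
   </entity>

Since your import already copies data, the documents evidently have ids, so
the explicit mapping should mainly silence the warning.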

On Tue, 4 Feb 2020 at 23:15, Karl Stoney
 wrote:

> Hey all,
> I'm trying to use the DIH to copy from one collection to another, it
> appears to work (data gets copied) however I've noticed this in the logs:
>
> 17:39:58.167 [qtp1472216456-87] INFO
> org.apache.solr.handler.dataimport.config.DIHConfiguration - ID is a
> required field in SolrSchema . But not found in DataConfig
>
> I can't find the appropriate configuration to get rid of it.  Do I need to
> care?
>
> My config looks like this:
>
> <dataConfig>
>   <document>
>     <entity processor="SolrEntityProcessor"
>       query="*:*"
>       batchSize="1000"
>       fl="*,old_version:_version_,old_lmake:L_MAKE,old_lmodel:L_MODEL"
>       wt="javabin"
>       url="http://127.0.0.1/solr/at-uk"/>
>   </document>
> </dataConfig>
>
> Cheers
> Karl
> This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office:
> 1 Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England
> No. 9439967). This email and any files transmitted with it are confidential
> and may be legally privileged, and intended solely for the use of the
> individual or entity to whom they are addressed. If you have received this
> email in error please notify the sender. This email message has been swept
> for the presence of computer viruses.
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: Solr Commit Not Working

2020-02-23 Thread Paras Lehana
Hi,

> This above result is what I want to be able to commit, but when I run the
> same command with commit=true it will not work like below.
> curl
> 'http://54.146.2.60:8983/solr/eatzcollection/update/json?commit=true' -d
> '[{"id":"location_23_deal_51","votes":"23"}]' -H
> 'Content-type:application/json'
>



I suppose you missed writing "set" for votes? In that case, there's
probably an indexing error (since you didn't supply all the fields for a
full document replacement) that caused nothing to get indexed (notice the
_version_ is unchanged).
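
For completeness, the working atomic update plus a commit would be your
earlier command with just commit=true added:

   curl 'http://54.146.2.60:8983/solr/eatzcollection/update/json?commit=true' -d '[{"id":"location_23_deal_51","votes":{"set":23}}]' -H 'Content-type:application/json'

Without the "set", the body is treated as a full document replacement,
which can fail if required fields are missing - that would explain the
unchanged _version_.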

On Wed, 12 Feb 2020 at 21:40, logancraft 
wrote:

> So I am trying to do a partial update to a document in Solr, but it will
> not
> commit!
>
> So this is the original doc I am trying to update with 11 votes.
>
> {
>   "doc":
>   {
> "id":"location_23_deal_51",
> "deal_id":"deal_51",
> "deal":"$1.99 Kid's Meal with Purchase of Adult Meal",
> "rating":"3",
> "votes":"11",
> "deal_restaurant":"Moe's BBQ",
> "deal_type":"Kids",
> "content_type":"deal",
> "day":"Sunday",
> "_version_":1658287992903565312
>   }
> }
>
> So I can run the command without the commit and it works like below:
>
> curl 'http://54.146.2.60:8983/solr/eatzcollection/update/json' -d
> '[{"id":"location_23_deal_51","votes":{"set":23}}]' -H
> 'Content-type:application/json'
>
>
> And then I run a get command it returns the right results.
>
> curl
> http://54.146.2.60:8983/solr/eatzcollection/get\?id\=location_23_deal_51
>
> {
>   "doc":
>   {
> "id":"location_23_deal_51",
> "deal_id":"deal_51",
> "deal":"$1.99 Kid's Meal with Purchase of Adult Meal",
> "rating":"3",
> "votes":"23",
> "deal_restaurant":"Moe's BBQ",
> "deal_type":"Kids",
> "content_type":"deal",
> "day":"Sunday",
> "_version_":1658297071939092480
>   }
> }
>
>
> This is above result is what I want to be able to commit but when I run the
> same command with commit=true it will not work like below.
>
> curl 'http://54.146.2.60:8983/solr/eatzcollection/update/json?commit=true'
> -d '[{"id":"location_23_deal_51","votes":"23"}]' -H
> 'Content-type:application/json'
>
> And I run the get command I get the wrong result.
>
> curl
> http://54.146.2.60:8983/solr/eatzcollection/get\?id\=location_23_deal_51
>
> {
>   "doc":
>   {
> "id":"location_23_deal_51",
> "deal_id":"deal_51",
> "deal":"$1.99 Kid's Meal with Purchase of Adult Meal",
> "rating":"3",
> "votes":"11",
> "deal_restaurant":"Moe's BBQ",
> "deal_type":"Kids",
> "content_type":"deal",
> "day":"Sunday",
> "_version_":1658287992903565312
>   }
> }
>
>
> I have tried a lot different query string like update/json, /update, but
> they all stop working when I add the commit=true parameter to the query
> string.
>
> Any ideas will be much appreciated!
>
>
>
>
>
>
>
>
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: Solr Commit Not Working on Update

2020-02-23 Thread Paras Lehana
Hi,

> This above result is what I want to be able to commit, but when I run the
> same command with commit=true it will not work like below.
> curl
> 'http://54.146.2.60:8983/solr/eatzcollection/update/json?commit=true' -d
> '[{"id":"location_23_deal_51","votes":"23"}]' -H
> 'Content-type:application/json'
>



I suppose you missed writing "set" for votes? In that case, there's
probably an indexing error (since you didn't supply all the fields for a
full document replacement) that caused nothing to get indexed (notice the
_version_ is unchanged).

On Wed, 12 Feb 2020 at 21:43, logancraft 
wrote:

> So I am trying to do a partial update to a document in Solr, but it will
> not
> commit!
>
> So this is the original doc I am trying to update with 11 votes.
>
> {
>   "doc":
>   {
> "id":"location_23_deal_51",
> "deal_id":"deal_51",
> "deal":"$1.99 Kid's Meal with Purchase of Adult Meal",
> "rating":"3",
> "votes":"11",
> "deal_restaurant":"Moe's BBQ",
> "deal_type":"Kids",
> "content_type":"deal",
> "day":"Sunday",
> "_version_":1658287992903565312
>   }
> }
>
> So I can run the command without the commit and it works like below:
>
> curl 'http://54.146.2.60:8983/solr/eatzcollection/update/json' -d
> '[{"id":"location_23_deal_51","votes":{"set":23}}]' -H
> 'Content-type:application/json'
>
>
> And then I run a get command it returns the right results.
>
> curl
> http://54.146.2.60:8983/solr/eatzcollection/get\?id\=location_23_deal_51
>
> {
>   "doc":
>   {
> "id":"location_23_deal_51",
> "deal_id":"deal_51",
> "deal":"$1.99 Kid's Meal with Purchase of Adult Meal",
> "rating":"3",
> "votes":"23",
> "deal_restaurant":"Moe's BBQ",
> "deal_type":"Kids",
> "content_type":"deal",
> "day":"Sunday",
> "_version_":1658297071939092480
>   }
> }
>
>
> This is above result is what I want to be able to commit but when I run the
> same command with commit=true it will not work like below.
>
> curl
> 'http://54.146.2.60:8983/solr/eatzcollection/update/json?commit=true' -d
> '[{"id":"location_23_deal_51","votes":"23"}]' -H
> 'Content-type:application/json'
>
> And I run the get command I get the wrong result.
>
> curl
> http://54.146.2.60:8983/solr/eatzcollection/get\?id\=location_23_deal_51
>
> {
>   "doc":
>   {
> "id":"location_23_deal_51",
> "deal_id":"deal_51",
> "deal":"$1.99 Kid's Meal with Purchase of Adult Meal",
> "rating":"3",
> "votes":"11",
> "deal_restaurant":"Moe's BBQ",
> "deal_type":"Kids",
> "content_type":"deal",
> "day":"Sunday",
> "_version_":1658287992903565312
>   }
> }
>
>
> I have tried a lot different query string like update/json, /update, but
> they all stop working when I add the commit=true parameter to the query
> string.
>
> Any ideas will be much appreciated!
>
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: Would changing the schema version from 1.5 to 1.6 require a reindex

2020-02-23 Thread Paras Lehana
Hi Karl,

Maybe someone else can confirm whether reindexing is needed when upgrading
the schema version. However, I guess useDocValuesAsStored only impacts the
query side, assuming docValues were already written during indexing. It's
actually easiest to just try querying the fields after enabling this
parameter and see if that works without reindexing!
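
For reference, version 1.6 only changes the default; the attribute can also
be set explicitly per field (field name and type here are made up):

   <field name="price" type="plong" indexed="true" stored="false"
   docValues="true" useDocValuesAsStored="true"/>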


On Thu, 13 Feb 2020 at 14:21, Karl Stoney
 wrote:

> Hey,
> I’m going to bump our schema version from 1.5 to 1.6 to get the implicit
> useDocValuesAsStored=true, would this require a reindex?
>
> Thanks
> Karl
> This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office:
> 1 Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England
> No. 9439967). This email and any files transmitted with it are confidential
> and may be legally privileged, and intended solely for the use of the
> individual or entity to whom they are addressed. If you have received this
> email in error please notify the sender. This email message has been swept
> for the presence of computer viruses.
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: use highlighting on multivalued fields with positionIncrementGap 0

2020-02-23 Thread Paras Lehana
I haven't worked with highlighting much, but what's the need to store terms
in a multivalued field?

On Fri, 14 Feb 2020 at 20:04, Nicolas Franck 
wrote:

> I'm trying to use highlighting on a multivalued text field (analysis not
> so important) ..
>
>
>   { text: [ "hello", "world" ], id: 1 }
>
> but I want to match across the string boundaries:
>
>   q=text:"hello world"
>
> This works by setting the attribute
> positionIncrementGap to 0, but then the hightlighting entry is empty
>
>   "highlighting": { "1" : { "text" : [] } }
>
> Parameters are:
>
>   hl=true
>   hl.fl=text
>   hl.snippets=50
>   hl.fragSize=1
>
> Any idea why this happens?
> I guess this gap is internal stuff handled by Lucene that Solr doesn't
> know about?
> (as for lucene, there are no multivalued fields!)
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: Query Autocomplete Evaluation

2020-02-23 Thread Paras Lehana
Hi Audrey,

I work for Auto-Suggest at IndiaMART. Although we don't use the Suggester
component, I think you need evaluation metrics for Auto-Suggest as a
business product and not specifically for Solr Suggester which is the
backend. We use edismax parser with EdgeNGrams Tokenization.

Every week, as the property owner, I report around 500 metrics. I would
like to mention a few of those:

   1. MRR (Mean Reciprocal Rank): How high the user's selection was among the
   returned results. Ranges from 0 to 1, the higher the better.
   2. APL (Average Prefix Length): The prefix is the query typed by the user.
   The lower the better: this reports how little an average user has to type
   to get the intended suggestion.
   3. Acceptance Rate or Selection: How many of the total searches are
   being served from Auto-Suggest. We are around 50%.
   4. Selection to Display Ratio: Did the user click any of the
   suggestions when they were displayed?
   5. Response Time: How fast you are serving your average query.


The Selection and Response Time are our main KPIs. We track a lot about
Auto-Suggest usage on our platform which becomes apparent if you observe
the URL after clicking a suggestion on dir.indiamart.com. However, not
everything would benefit you. Do let me know for any related query or
explanation. Hope this helps. :)

On Fri, 14 Feb 2020 at 21:23, Audrey Lorberfeld - audrey.lorberf...@ibm.com
 wrote:

> Hi all,
>
> How do you all evaluate the success of your query autocomplete (i.e.
> suggester) component if you use it?
>
> We cannot use MRR for various reasons (I can go into them if you're
> interested), so we're thinking of using nDCG since we already use that for
> relevance eval of our system as a whole. I am also interested in the metric
> "success at top-k," but I can't find any research papers that explicitly
> define "success" -- I am assuming it's a suggestion (or suggestions)
> labeled "relevant," but maybe it could also simply be the suggestion that
> receives a click from the user?
>
> Would love to hear from the hive mind!
>
> Best,
> Audrey
>
> --
>
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: Solr 7.6 frequent OOM with Java 9, G1 and large heap sizes - any tests with Java 13 and the new ZGC?

2020-02-23 Thread Paras Lehana
Hi,

We are running another 24 hour test with 8GB JVM and so far it is also
> running flawlessly.


If this is the case, as Erick mentioned, the failures were probably due to
long GC pauses. During a couple of my stress tests, I found that decreasing
the JVM heap sometimes helps (it makes GC more frequent but less intensive).
Try different heap sizes and also consider tuning the GC.

Also, do let us know about the performance of ZGC against G1GC - I'm curious.
I'm using Java 11.
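
For reference, switching collectors is a one-line change in solr.in.sh - a
sketch for Java 11, where ZGC is still experimental (Linux only) and needs
the unlock flag:

   GC_TUNE="-XX:+UnlockExperimentalVMOptions -XX:+UseZGC"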

On Sun, 23 Feb 2020 at 01:28, tbarkley29  wrote:

> Yes 18% of total physical RAM. The failures in G1GC and CMS setup did seem
> to
> be from pause the world.
>
> We are using Solr Docker image which is using G1GC by default and we tuned
> with G1GC. Even with tuning the performance test failed after about 8
> hours.
> With ZGC we had consistent 12 and 24 hour performance test which ran
> flawlessly.
>
> We are running another 24 hour test with 8GB JVM and so far it is also
> running flawlessly. I will post an update when completed.
>
> Garbage collection is not my area of expertise but so far I am following
> the
> data and out of the box ZGC is performing drastically better.
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: Solr Relevancy problem

2020-02-19 Thread Paras Lehana
Hi Pradeep,

I suggest you at least post an example query, its result, and the result
you were expecting. How do you boost your documents?

I guess croma has just started using Solr (I could not find you here
<https://cwiki.apache.org/confluence/display/solr/PublicServers>) and if
that's the case, don't worry - Solr is very powerful and it takes time
initially to arrive at better queries.
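
To get the discussion started, run one problematic query with debugging on
and share the parsed query and score explanations - for example (core name
and query are placeholders):

   curl "http://localhost:8983/solr/yourcore/select?q=your+query&debugQuery=on&fl=id,score&rows=10"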

On Wed, 19 Feb 2020 at 13:56, Atita Arora  wrote:

> +1 for John's reply.
> Along with that you can try the debug query to see how the query is being
> parsed and what's going wrong.
>
> Hope it helps,
> Atita
>
> On Wed, 19 Feb 2020, 09:19 Jörn Franke,  wrote:
>
> > The best way to address this problem is to collect queries and examples
> > why they are wrong and to document this. This is especially important
> when
> > working with another vendor. Otherwise no one can give you proper help.
> >
> > > Am 19.02.2020 um 09:17 schrieb Pradeep Tambade <
> > pradeep.tamb...@croma.com.invalid>:
> > >
> > > Hello,
> > >
> > > We have configured solr site search engine into our website(
> > www.croma.com). We are facing various issues like not showing relevant
> > results, free text search not showing  result, phrase keywords shows
> > irrelevant results etc
> > >
> > > Please help us resolve these issues also help to connect with solr tech
> > support team or any other company who is expert in managing solr search.
> > >
> > >
> > > Thanks & Regards,
> > > Pradeep Tambade |  Assistant Manager - Business Analyst
> > > Infiniti Retail Ltd. | A Tata Enterprise
> > > Mobile: +91 9664536737
> > > Email: pradeep.tamb...@croma.com | Shop at: www.croma.com
> > >
> > >
> > >  Have e-waste but don't know what to do about it?
> > >
> > >  *   Call us at 7207-666-000 & we pick up your junk at your doorstep
> > >  *   We ensure responsible disposal
> > >  *   And also plant an actual tree in your name for the e-waste you
> > dispose
> > >
> > >
> > > Registered Office: Unit No. 701 & 702, 7th Floor, Kaledonia, Sahar
> Road,
> > Andheri East, Mumbai - 400069, India
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: Solr grouping with offset

2020-02-14 Thread Paras Lehana
It would be better if you give us an example.

On Fri, 14 Feb 2020 at 17:20, Vadim Ivanov
 wrote:

> Hello guys!
> I need an advise. My task is to delete some documents in collection.
> Del algorithm is following:
> Group docs by field1  with sort by field2 and delete every 3 and following
> occurrences in every group.
> Unfortunately I didn't find easy way to do so.
> Closest approach was to use group.offset = 2, but  result set is polluted
> with empty groups with no documents (they have less then 3 docs in group).
> May be I'm missing smth and there is way not to receive empty groups in
> results?
> Next approach was to use facet first with facet.mincount=3, then find docs
> ids by every facet result  and then delete docs by id.
> That way seems to me  too complicated for the task.
> What's the best use case for the task?
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*



Re: Performance of Bulk Importing TSV File in Solr 8

2020-01-10 Thread Paras Lehana
Hi Joseph,

Although your indexing rate is fast at around 2800 docs/sec, you can play
with values of autoCommit, mergePolicy and ramBufferSize.

Post your existing values of these so we can comment on them.

As Mikhail suggested, batches can increase performance by committing in
between.
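
For reference, these knobs live in solrconfig.xml - a sketch with purely
illustrative values (starting points, not recommendations):

   <autoCommit>
     <maxTime>60000</maxTime>
     <openSearcher>false</openSearcher>
   </autoCommit>

   <ramBufferSizeMB>512</ramBufferSizeMB>

A hard autoCommit with openSearcher=false plus a larger RAM buffer usually
keeps a bulk load from stalling on frequent small flushes.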

On Fri, 3 Jan 2020 at 02:37, Mikhail Khludnev  wrote:

> Hello, Joseph.
>
> This rate looks good to me, although if the node is idling and  has a
> plenty of free RAM, you can dissect this file by unix tools and submit
> these partitions for import in parallel.
> Hanging connection seems like a bug.
>
> On Thu, Jan 2, 2020 at 10:09 PM Joseph Lorenzini 
> wrote:
>
> > Hi all,
> >
> > I have TSV file that contains 1.2 million rows. I want to bulk import
> this
> > file into solr where each row becomes a solr document. The TSV has 24
> > columns. I am using the streaming API like so:
> >
> > curl -v '
> > http://localhost:8983/solr/example/update?stream.file=/opt/solr/results.tsv&separator=%09&escape=%5c&stream.contentType=text/csv;charset=utf-8&commit=true
> > '
> >
> > The ingestion rate is 167,000 rows a minute and takes about 7.5 minutes
> to
> > complete. I have a few questions.
> >
> > - is there a way to increase the performance of the ingestion rate? I am
> > open to doing something other than bulk import of a TSV up to and
> including
> > writing a small program. I am just not sure what that would look like at
> a
> > high level.
> > - if the file is a TSV, I noticed that solr never closes a HTTP
> connection
> > with a 200 OK after all the documents are uploaded. The connection seems
> to
> > be held open indefinitely. If however, i upload the same file as a CSV,
> > then solr does close the http connection. Is this a bug?
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Facet Range by Distance generating count of field - need sum of the field

2020-01-10 Thread Paras Lehana
Hi Robert,

How does this work?

{!frange l=0 u=5} sum(geodist())



On Fri, 3 Jan 2020 at 21:10, Robert Scavilla  wrote:

> Thank yo in advance for your help.
>
> I need to get the sum of a pivot range field. The following query uses the
> stats function to sum the *sumField* values. I'm trying to sum the same
> field in a frange subquery and I don't know how.
>
>
> /select?defType=edismax=*:*={!geofilt}=totalResultsUsers,_dist_:geodist(),score=geodist()
>
> desc=true=0=-1=1=value=true=miles=json=dId:193=Coordinates=40.243919,-74.753489=5={!tag=t1}sumField={!stats=t1}startDate=startDate:[2019-12-01
> TO *]={!frange l=0 u=5}geodist()={!frange l=5.001
> u=10}geodist()
>
> This query produces the following results:
>
>   "facet_counts":{
> "facet_queries":{
>   "{!frange l=0 u=5}geodist()":27,
>   "{!frange l=5.001 u=10}geodist()":0},
> "facet_pivot":{
>   "startDate":[{
>   "field":"startDate",
>   "value":"2019-12-01",
>   "count":27,
>   "stats":{
> "stats_fields":{
>   "users":{
> "min":1.0,
> "max":158.0,
> **"count":27,
> "missing":0,
> "sum":488.0,
> "sumOfSquares":40848.0,
>     "mean":18.074074074074073,
> "stddev":35.09758475793535]}},
>
>
> 
>
> What I'm going for is line: "*{!frange l=0 u=5}geodist()":27* should be
> 488.0 which is the sum of the field as opposed to 27 which is the count of
> the field.
>
> Thank you!
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Solr suggester : duplicate suggestions

2020-01-10 Thread Paras Lehana
Hi Dhanesh,

Although I handle Auto-Suggest, I have worked only a little with the
Suggester component. The Suggester provides results as you type - do you
really need it?

Also, I don't know if I'm correct, but where have you described '&' to be
replaced with 'and'? Is 'and' present in your stopwords list?

I think posting the query and results for both cases of the first problem
will help us more.

On Thu, 9 Jan 2020 at 20:50, Dhanesh Radhakrishnan 
wrote:

> Dear all,
> I'm facing two issues with solr suggester component.
>
> *First *
> If I type "Fire and safety", I get results. But if I type "Fire &
> safety", the suggester shows nothing.
>
> *Second*
> I'm getting duplicate suggestions  in suggester
>
>  "suggest": {
> "categorySuggester": {
> "software": {
> "numFound": 100,
> "suggestions": [
> {
> "term": "Software And Web Development||6070",
> "weight": 0,
> "payload": ""
> },
> {
> "term": "Software And Web Development||6070",
> "weight": 0,
> "payload": ""
> },
> {
> "term": "Software And Web Development||6070",
> "weight": 0,
> "payload": ""
> }
> 
> 
> 
>
> ]
> }
> }
> }
>
>
>
> Here is my configuration
>
> In solrconfig.xml
>
>
> <searchComponent name="suggest" class="solr.SuggestComponent">
>   <lst name="suggester">
>     <str name="name">categorySuggester</str>
>     <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
>     <str name="suggestAnalyzerFieldType">text_suggest</str>
>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>     <str name="field">categoryAutoComplete</str>
>     <str name="weightField">categoryRank</str>
>     <str name="allTermsRequired">false</str>
>     <str name="highlight">false</str>
>     <str name="indexPath">/dictionary/category</str>
>     <str name="buildOnStartup">true</str>
>     <str name="buildOnCommit">false</str>
>   </lst>
> </searchComponent>
>
>
>
> In schema.xml
>
> <field name="categoryAutoComplete" type="text_suggest" indexed="true" stored="true" multiValued="true"/>
>
>
> <fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1"
>       catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>   </analyzer>
>   <analyzer type="query">
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"
>       tokenizerFactory="solr.KeywordTokenizerFactory"/>
>   </analyzer>
> </fieldType>
>
>
>
> http://localhost:8983/solr/core-name/suggest?suggest=true&suggest.q=software&suggest.build=false&suggest.dictionary=categorySuggester&wt=json
>
>  Please help
>
> Thanks & Regards,
> dhanesh s r
>
>
> Dhanesh S.R, Senior Technical Lead | e: dhan...@hifx.co.in | w:
> www.hifx.in | t: (+91) 484 4011750
> m: (+91) 994 666 6703
>
> --
> IMPORTANT: This is an e-mail from HiFX IT Media Services Pvt. Ltd. Its
> content are confidential to the intended recipient. If you are not the
> intended recipient, be advised that you have received this e-mail in error
> and that any use, dissemination, forwarding, printing or copying of this
> e-mail is strictly prohibited. It may not be disclosed to or used by
> anyone
> other than its intended recipient, nor may it be copied in any way. If
> received in error, please email a reply to the sender, then delete it from
> your system.
>
> Although this e-mail has been scanned for viruses, HiFX
> cannot ultimately accept any responsibility for viruses and it is your
> responsibility to scan attachments (if any).
>
> ​Before you print this email
> or attachments, please consider the negative environmental impacts
> associated with printing.
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Edismax ignoring queries containing booleans

2020-01-10 Thread Paras Lehana
> > >   "time":0.0},
> > > > "mlt":{
> > > >   "time":0.0},
> > > > "highlight":{
> > > >   "time":0.0},
> > > > "stats":{
> > > >   "time":0.0},
> > > > "expand":{
> > > >   "time":0.0},
> > > > "terms":{
> > > >   "time":0.0},
> > > > "spellcheck":{
> > > >   "time":0.0},
> > > > "debug":{
> > > >   "time":0.0}},
> > > >   "process":{
> > > > "time":38.0,
> > > > "query":{
> > > >   "time":29.0},
> > > > "facet":{
> > > >   "time":0.0},
> > > > "facet_module":{
> > > >   "time":0.0},
> > > > "mlt":{
> > > >   "time":0.0},
> > > > "highlight":{
> > > >   "time":0.0},
> > > > "stats":{
> > > >   "time":0.0},
> > > > "expand":{
> > > >   "time":0.0},
> > > > "terms":{
> > > >   "time":0.0},
> > > > "spellcheck":{
> > > >   "time":6.0},
> > > > "debug":{
> > > >   "time":1.0}
> > > >
> > > > -Original Message-
> > > > From: Edward Ribeiro 
> > > > Sent: 07 January 2020 01:05
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: Edismax ignoring queries containing booleans
> > > >
> > > > Hi Claire,
> > > >
> > > > You can add the following parameter `debug=all` on the URL to bring
> > > > back debugging info and share with us (if you are using the Solr
> > > > admin UI you should check the `debugQuery` checkbox).
> > > >
> > > > Also, if you are searching a sequence of values you could perform a
> > > > range
> > > > query: recordID:[18 TO 20]
> > > >
> > > > Best,
> > > > Edward
> > > >
> > > > On Mon, Jan 6, 2020 at 10:46 AM Claire Pollard
> > > > 
> > > > wrote:
> > > > >
> > > > > Ok... It doesn't work for me. I'm fairly new to Solr so any help
> > > > > would be
> > > > appreciated!
> > > > >
> > > > > My managed-schema field and field type look like this:
> > > > >
> > > > >  > > > required="true" multiValued="false" />
> > > > >  > sortMissingLast="true"
> > > > omitNorms="true" />
> > > > >
> > > > > And my solrconfig.xml select/query handlers look like this:
> > > > >
> > > > > 
> > > > > 
> > > > > all
> > > > > 
> > > > > edismax
> > > > > 
> > > > > text^0.4 recordID^10.0
> > > > annotations^0.5 collectionTitle^1.9 collectionDescription^0.9
> > > > title^2.0
> > > > Test_FR^1.0 Test_DE^1.0 Test_AR^1.0 genre^1.0 genre_fr^1.0
> > > > french2^1.0
> > > > > 
> > > > > text
> > > > > *:*
> > > > > 10
> > > > > *,score
> > > > > 
> > > > > text^0.2 recordID^10.0
> > > > annotations^0.6 collectionTitle^2.0 collectionDescription^1.0
> > > > title^2.1
> > > > Test_FR^1.1 Test_DE^1.1 Test_AR^1.1 genre^1.1 genre_fr^1.1
> > > > french2^1.1
> > > > > 
> > > > >01
> > 2-1
> > > > 5-2 690%  
> > > > > 100
> > > > > 
> > > > > text
> > > > > 
> > > > >  > name="spellcheck.dictionary">default
> > > > >  > name="spellcheck.dictionary">wordbreak
> > > > > on
> > > > >  > name="spellcheck.extendedResults">true
> > > > > 10
> > > > >  > > > name="spellcheck.alternativeTermCount">5
> > > > >  > > > name="spellcheck.maxResultsForSuggest">5
> > > > > true
> > > > >  > > > name="spellcheck.collateExtendedResults">true
> > > > >  name="spellcheck.maxCollations">5
> > > > > 
> > > > > 
> > > > > spellcheck
> > > > > 
> > > > > 
> > > > > 
> > > > >
> > > > > 
> > > > > 
> > > > > explicit
> > > > > json
> > > > > true
> > > > > text
> > > > > 
> > > > > 
> > > > >
> > > > > Is there anything else that might be useful in helping diagnose
> > > > > what's
> > > > going wrong for me?
> > > > >
> > > > > Cheers,
> > > > > Claire.
> > > > >
> > > > > -Original Message-
> > > > > From: Saurabh Sharma 
> > > > > Sent: 06 January 2020 11:20
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Re: Edismax ignoring queries containing booleans
> > > > >
> > > > > It should work well. I have just tested the same with 8.3.0.
> > > > >
> > > > > Thanks
> > > > > Saurabh Sharma
> > > > >
> > > > > On Mon, Jan 6, 2020, 4:31 PM Claire Pollard
> > > > > 
> > > > > wrote:
> > > > >
> > > > > > I'm using:
> > > > > >
> > > > > > recordID:(18 OR 19 OR 20)
> > > > > >
> > > > > > Which should return 2 records (as 18 doesn't exist), but it
> > > > > > returns
> > > > none.
> > > > > > recordID is a LongPointField (sorry I said Int in my previous
> > message).
> > > > > >
> > > > > > -Original Message-
> > > > > > From: Saurabh Sharma 
> > > > > > Sent: 06 January 2020 10:35
> > > > > > To: solr-user@lucene.apache.org
> > > > > > Subject: Re: Edismax ignoring queries containing booleans
> > > > > >
> > > > > > Please share the query which you are creating.
> > > > > >
> > > > > > On Mon, Jan 6, 2020, 3:52 PM Claire Pollard
> > > > > > 
> > > > > > wrote:
> > > > > >
> > > > > > > In Solr 8.3.0 I've got an edismax query parser in my search
> > > > > > > handler, and it seems to be ignoring Boolean operators such as
> > > > > > > AND and OR when searching using an IntPointField.
> > > > > > >
> > > > > > > I was hoping to use a query to this field to return a batch of
> > > > > > > documents with non-sequential IDs, so a range would be
> > inappropriate.
> > > > > > >
> > > > > > > We had a previous 4.10.2 instance of Solr which uses the now
> > > > > > > deprecated Trie fields, and these seem to search without issue
> > > > > > > using
> > > > > > boolean operators.
> > > > > > >
> > > > > > > Is there something extra I need to do with my setup for
> > > > > > > PointFields to use booleans or should they work as default.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Claire.
> > > > > > >
> > > > > >
> > > >
> > > >
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Add Solr files to VCS (GIT)

2020-01-09 Thread Paras Lehana
Hey Erick,

Thanks for your reply.

We have only one production server and one development server, both
connected to the same git repo (different branches). I'm using Solr in
standalone mode.

On Thu, 9 Jan 2020 at 19:48, Erick Erickson  wrote:

> There’s nothing built in to Solr that will automatically pull files from a
> VCS repo, so it’s a manual process. Which is one of the “gotchas” about
> managed config files, but that’s another rant.
>
> Are you running SolrCloud or stand-alone? If SolrCloud, it doesn’t make
> sense to talk about /var/solr/data/core/conf, you’d need to have one
> copy in VCS and push/pull it to/from ZooKeeper as needed with
> “bin/solr zk upconfig|downconfig”.
>
> As for solr.in.sh, I’d keep a master copy in VCS and, if your setup is
> complicated use something like Puppet or Chef or whatever to
> automate pushing changes to all my servers. Otherwise do it
> manually if you only have a few servers.
>
> Best,
> Erick
>
>
> > On Jan 9, 2020, at 8:19 AM, Paras Lehana 
> wrote:
> >
> > Hi Community,
> >
> > We have just set up a new server with Solr 8.4 on production. Instead of
> > changing files like solrconfig and solr.in.sh by logging on the server,
> we
> > are planning to have some VCS. We have integrated GIT on our server but,
> as
> > other servers, there is a single directory where git files are supposed
> to
> > be uploaded.
> >
> > We have followed "Taking Solr to production" and thus our core conf files
> > live in /var/solr/data/core/conf while solr.in.sh is in /etc/default/.
> >
> > What is your preferred method to integrate VCS for files like these? For
> a
> > workaround, I'm thinking of changing these files with symbolic links that
> > will point to git files. Or we can have an automated process (like rsync)
> > to copy files.
> >
> > Just asking the community how they manage version control for Solr files.
> >
> > --
> > --
> > Regards,
> >
> > *Paras Lehana* [65871]
> > Development Engineer, Auto-Suggest,
> > IndiaMART Intermesh Ltd.
> >
> > 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> > Noida, UP, IN - 201303
> >
> > Mob.: +91-9560911996
> > Work: 01203916600 | Extn:  *8173*
> >
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Solr 7.5 seed up, accuracy details

2020-01-09 Thread Paras Lehana
I usually keep the older and newer Solr versions on different servers with
the same configuration. Then I run JMeter with the same parameters against
each and compare the results. Make sure settings like the environment (JVM,
GC) and core configuration (cache) are in sync, and restart both before
testing.
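
For example, the same non-GUI run against each version (the test plan name
is made up):

   jmeter -n -t solr-load-test.jmx -l results-solr46.jtl
   jmeter -n -t solr-load-test.jmx -l results-solr75.jtl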

On Sat, 28 Dec 2019 at 21:18, Rajdeep Sahoo 
wrote:

> Thank you for the information
>   Why you are recommending to use the schema api instead of schema xml?
>
>
> On Sat, 28 Dec, 2019, 8:01 PM Jörn Franke,  wrote:
>
> > This highly depends on how you designed your collections etc. - there is
> > no general answer. You have to do a performance test based on your
> > configuration and documents.
> >
> > I also recommend to check the Solr documentation on how to design a
> > collection for 7.x and maybe start even from scratch defining it with a
> new
> > fresh schema (using the schema api instead of schema.xml and
> solrconfig.xml
> > etc). You will have anyway to reindex everything so it is a also a good
> > opportunity to look at your existing processes and optimize them.
> >
> > > Am 28.12.2019 um 15:19 schrieb Rajdeep Sahoo <
> rajdeepsahoo2...@gmail.com
> > >:
> > >
> > > Hi all,
> > > Is there any way I can get the speed up,accuracy details i.e.
> performance
> > > improvements of solr 7.5 in comparison with solr 4.6
> > >  Currently,we are using solr 4.6 and we are in a process to upgrade to
> > > solr 7.5. Need these details.
> > >
> > > Thanks in advance
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Help for importing large data (approx. 8GB) from old solr version to new solr version

2020-01-09 Thread Paras Lehana
Hi Ken,

I also recommend at least reading if not following "Taking Solr to
Production":
https://lucene.apache.org/solr/guide/8_4/taking-solr-to-production.html.

Following this cleared my doubts regarding upgrades and core referencing,
and it made the upgrade very easy and fast.

While starting Solr, you can also define the Solr home (where your older
core lives) by using the -s option.
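
For example, assuming the old core lives under /var/solr/data:

   bin/solr start -s /var/solr/data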



On Wed, 25 Dec 2019 at 21:44, David Hastings  wrote:

> Exactly. Although I’m a bit curious why your going a .1 version up, I
> always wait until an x2, so I won’t be upgrading until 9.3
>
> > On Dec 25, 2019, at 9:45 AM, Erick Erickson 
> wrote:
> >
> > Should work. At any rate, just try it. Since all you’re doing is
> copying data, even if the new installation doesn’t work you still have the
> original.
> >
> >> On Dec 25, 2019, at 1:35 AM, Ken Walker  wrote:
> >>
> >> Hello Erick,
> >>
> >> Thanks for your reply!
> >>
> >> You mean that, we should follow below steps right?
> >> Here is the data directory path :
> >> solr/solr-8.2.0/server/solr/product/item_core/data
> >>
> >> STEPS :-
> >> 1. Stop old solr-8.2.0 server
> >> 2. Copy data directory (from old solr version to new solr version)
> >> copy solr/solr-8.2.1/server/solr/product/item_core/data to
> >> solr/solr-8.3.1/server/solr/product/item_core/data
> >> 3. Start new solr version solr-8.3.1
> >>
> >> Is it correct way to copy just index only from old to new solr version?
> >> Is it lose any data or anything break in new solr version ?
> >>
> >> Thanks in advance!
> >> -Ken
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Add Solr files to VCS (GIT)

2020-01-09 Thread Paras Lehana
Hi Community,

We have just set up a new server with Solr 8.4 on production. Instead of
changing files like solrconfig and solr.in.sh by logging on the server, we
are planning to have some VCS. We have integrated git on our server but, as
on our other servers, there is a single directory where git-managed files
are supposed to live.

We have followed "Taking Solr to production" and thus our core conf files
live in /var/solr/data/core/conf while solr.in.sh is in /etc/default/.

What is your preferred method of putting files like these under VCS? As a
workaround, I'm thinking of replacing these files with symbolic links that
point to the git-managed copies, or of having an automated process (like
rsync) copy them over.
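
A sketch of the symlink approach (the repo path /opt/git/solr-config is
made up):

   ln -sf /opt/git/solr-config/solr.in.sh /etc/default/solr.in.sh
   ln -sfn /opt/git/solr-config/core-conf /var/solr/data/core/conf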

Just asking the community how they manage version control for Solr files.

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: how to exclude path from being queried

2019-12-18 Thread Paras Lehana
Hi Nan,

Are you using PathHierarchyTokenizer
<https://lucene.apache.org/solr/guide/8_3/tokenizers.html#Tokenizers-PathHierarchyTokenizer>
?

On Thu, 19 Dec 2019 at 01:51, Nan Yu  wrote:

> Hi,
> I am trying to find all files containing a keyword in a directory (and
> many sub-directories).
>
> I did a quick indexing using
>
> bin/post -c myCore /RootDir
>
> When I query the index using "keyword", all files whose path
> containing the keyword will be included in the search result. For example:
> /RootDir/KeywordReports/FileDoesNotContainKeyword.txt will be shown in the
> query result.
>  The query is: http://localhost:8983/solr/myCore/select?q=keyword
>
> Is there a way to exclude files whose content does not contain the
> keyword but the path contains the keyword?
> Should I re-index the directory using some extra parameter? Or use
> extra condition in the query
>
>
> Thanks!
> Nan
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: updating documents via csv

2019-12-17 Thread Paras Lehana
Oh lol. How could I miss that! This is actually true for any bash command.
Glad that it worked.

On Wed, 18 Dec, 2019, 00:29 rhys J,  wrote:

> On Mon, Dec 16, 2019 at 11:58 PM Paras Lehana 
> wrote:
>
> > Hi Rhys,
> >
> > I use CDATA for XMLs:
> >
> >
> >  
> >
> > There should be a similar solution for JSON though I couldn't find the
> > specific one on the internet. If you are okay to use XMLs for indexing,
> you
> > can use this.
> >
> >
> We are set on using json, but I figured out how to handle the single quote.
>
> If i use curl " and then single quotes inside, I can escape the single
> quote in the field with no problem.
>
> Thanks for the help!
>
> Rhys
>



Re: Starting Solr automatically

2019-12-16 Thread Paras Lehana
Hi Anuj,

Firstly, you should check the logs for the reason Solr stopped. We started
our Solr about a year ago and it's still up. I'd guess an OOM in your case.

Secondly, there are many ways to restart Solr. For example, if it's
registered as a service, set up a cron job to restart Solr whenever it's
not running.
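
For example, a crontab sketch - the [s] trick stops pgrep from matching the
cron job itself; adjust the pattern and service name to your setup:

   # every 5 minutes: start Solr if no Jetty start.jar process is running
   */5 * * * * pgrep -f "[s]tart.jar" > /dev/null || service solr start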

But as I said above, please do look first for the reason Solr keeps
stopping.

On Tue, 17 Dec 2019 at 10:18, Anuj Bhargava  wrote:

> Often solr stops working. We have to then go to the root directory and give
> the command *'service solr start*'
>
> Is there a way to automatically start solr when it stops.
>
> Regards,
> Anuj
>
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: updating documents via csv

2019-12-16 Thread Paras Lehana
Hi Rhys,

I use CDATA for XMLs:

   <field name="name1"><![CDATA[NORTH AMERICAN INT'L]]></field>

There should be a similar solution for JSON though I couldn't find the
specific one on the internet. If you are okay to use XMLs for indexing, you
can use this.

On Tue, 17 Dec 2019 at 01:40, rhys J  wrote:

> Is there a way to update documents already stored in the solr cores via
> csv?
>
> The reason I am asking is because I am running into a problem with updating
> via script with single quotes embedded into the field itself.
>
> Example:
>
> curl http://localhost:8983/solr/dbtr/update?commit=true -d '[{ "id":
> "356767", "name1": {"set": "NORTH AMERICAN INT'L"},"name2": {"set": " "}}]'
>
> I have tried the following as well:
>
> curl http://localhost:8983/solr/dbtr/update?commit=true -d '[{ "id":
> "356767", "name1": {"set": "NORTH AMERICAN INT\'L"},"name2": {"set": "
> "}}]'
>
> curl http://localhost:8983/solr/dbtr/update?commit=true -d '[{ "id":
> "356767", "name1": {"set": "NORTH AMERICAN INT\\'L"},"name2": {"set": "
> "}}]'
>
> curl http://localhost:8983/solr/dbtr/update?commit=true -d '[{ \\"id\\":
> \\"356767\\", \\"name1\\": {\\"set\\": \\"NORTH AMERICAN INT\\'L\\"},}]'
>
> All of these break on the single quote embedded in field name1.
>
> Does anyone have any ideas as to what I can do to get around this?
>
> I will also eventually need to get around having an & inside a field too,
> but that hasn't come up yet.
>
> Thanks,
>
> Rhys
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: unable to update using empty strings or 'null' in value

2019-12-15 Thread Paras Lehana
etty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
> org.eclipse.jetty.server.Server.handle(Server.java:505)\n\tat
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)\n\tat
>
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)\n\tat
> org.eclipse.jetty.io
> .AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)\n\tat
> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
> org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat
>
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
>
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\n\tat
>
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)\n\tat
>
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)\n\tat
>
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)\n\tat
>
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:781)\n\tat
>
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:917)\n\tat
> java.base/java.lang.Thread.run(Thread.java:835)\n",
> "code":500}}
>
> ___
>
> Near as I can tell, it's complaining about the null value?
>
> I did this, because of the instructions I found here:
>
> https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html
>
> Where it says if you want to remove a value from the index, use 'null' to
> take care of empty strings?
>
> Has anyone seen this problem?
>
> Thanks,
>
> Rhys
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Atomic solrj update

2019-12-15 Thread Paras Lehana
Hi Prem,

> Using HTTPClient to establish connection and also i am *validating* whether
> the particular document is *available* in collection or not and after that
> updating the document.


Why do you need to validate that the particular document exists before
updating? Atomic updates either update the document if it already exists
or create it if it doesn't. I guess you don't want to create the document
when it doesn't exist, right?
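
For reference, an atomic update needs nothing more than this (collection
and field names assumed):

curl 'http://localhost:8983/solr/yourcollection/update?commit=true' -H 'Content-type: application/json' -d '[{"id": "123", "status": {"set": "updated"}}]'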



On Fri, 13 Dec 2019 at 11:42, Shawn Heisey  wrote:

> On 12/12/2019 10:00 PM, Prem wrote:
> > I am trying to partially update of 50M data in a collection from CSV
> using
> > Atomic script(solrj).But it is taking 2 hrs for 1M records.is there
> anyway i
> > can speed up my update.
>
> How many documents are you sending in one request?
>
> > Using HTTPClient to establish connection and also i am validating whether
> > the particular document is available in collection or not and after that
> > updating the document.
>
> I thought you were using SolrJ ... but here you say you're using
> HTTPClient.
>
> Can you share your code?  What Solr server version? If you're using
> SolrJ, what version of that?
>
> If your program checks whether every single document already exists
> before sending an update, that is going to be quite slow.
>
> Thanks,
> Shawn
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: backing up and restoring

2019-12-15 Thread Paras Lehana
Looks like a write lock. Did reloading the core fix that? I guess it would
have been fixed by now. I assume you ran the delete query a few moments
after restoring, no?
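
In case it happens again, reloading the core is a single CoreAdmin call
(core name taken from your mail):

curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=debt'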

On Thu, 12 Dec 2019 at 21:55, rhys J  wrote:

> I was able to successfully restore a backup by specifying name and location
> in the restore command.
>
> But now when i try to run:
>
> sudo -u solr curl http://localhost:8983/solr/debt/update -H "Content-type:
> text/xml" --data-binary '*:*'
>
> I get the following error:
>
>  no segments* file found in
> LockValidatingDirectoryWrapper(NRTCachingDirectory(MMapDirectory@
> /var/solr/data/debt/data/index
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@4746f577;
> maxCacheMB=48.0 maxMergeSizeMB=4.0)): files: [write.lock]
>   org.apache.lucene.index.IndexNotFoundException: no
> segments* file found in
> LockValidatingDirectoryWrapper(NRTCachingDirectory(MMapDirectory@
> /var/solr/data/debt/data/index
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@4746f577;
> maxCacheMB=48.0 maxMergeSizeMB=4.0)): files: [write.lock]
>
> I am just copying the top portion of the error, as it is very long.
>
> What did I do wrong?
>
> Thanks,
>
> Rhys
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: [Q] Faster Atomic Updates - use docValues?

2019-12-11 Thread Paras Lehana
Hi Erick,

You're right - IO was extraordinarily high. But something odd happened. To
actually establish a correlation, I tried different heap sizes with the
default solrconfig.xml values, as you recommended:

   1. Increased heap to 4G: speed 8500k.
   2. Decreased to 2G: back to the old 65k.
   3. Increased back to 4G: speed 50k.
   4. Decreased to 3G: speed 50k.
   5. Increased to 10G: speed 8500k.

The speed is the 1-minute average after indexing starts. With the last 10G
run, as (maybe) expected, I got a java.lang.NullPointerException at
org.apache.solr.handler.component.RealTimeGetComponent.getInputDocument
before committing. I'm not getting the faster speeds with any of the heap
sizes now. I will continue digging deeper and, in the meantime, I will be
getting the 24G RAM. Currently I'm giving Solr a 6G heap (speed is 55k -
too low).

After making that progress, this may be a step backward, but I do believe I
will take two steps forward soon. All credit to you. Getting into GC logs
now. I'm a newbie here - I know GC theory but have never analyzed the logs.
What tool do you prefer? I'm planning to upload Solr's current GC log to
GCeasy.

On Wed, 11 Dec 2019 at 18:21, Erick Erickson 
wrote:

> I doubt GC alone would make nearly that difference. More likely
> it’s I/O interacting with MMapDirectory. Lucene uses OS memory
> space for much of its index, i.e. the RAM left over
> after that used for the running Solr process (and any other
> processes of course). See:
>
> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> So if you, you don’t leave much OS memory space for Lucene’s
> use via MMap, that can lead to swapping. My bet is that was
> what was happening, and your CPU utilization was low; Lucene and
> thus Solr was spending all its time waiting around for I/O. If that theory
> is true, your disk I/O should have been much higher before you reduced
> your heap.
>
> IOW, I claim if you left the java heap at 12G and increased the physical
> memory to 24G you’d see an identical (or nearly) speedup. GC for a 12G
> heap is rarely a bottleneck. That said you want to use as little heap for
> your Java process as possible, but if you reduce it too much you wind up
> with other problems. OOM for one, and I’ve also seen GC take an inordinate
> amount of time when it’s _barely_ enough to run. You hit a GC that
> recovers,
> say, 10M of heap which is barely enough to continue for a few milliseconds
> and hits another GC….. As you can tell, “this is more art than science”…
>
> Glad to hear you’re making progress!
> Erick
>
> > On Dec 11, 2019, at 5:06 AM, Paras Lehana 
> wrote:
> >
> > Just to update, I kept the defaults. The indexing got only a little boost
> > though I have decided to continue with the defaults and do incremental
> > experiments only. To my surprise, our development server had only 12GB
> RAM,
> > of which 8G was allocated to Java. Because I could not increase the RAM,
> I
> > tried decreasing it to 4G and guess what! My indexing speed got a boost
> of
> > over *50x*. Erick, thanks for helping. I think I should do more homework
> > about GCs also. Your GC guess seems to be valid. I have raised the
> request
> > to increase RAM on the development to 24GB.
> >
> > On Mon, 9 Dec 2019 at 20:23, Erick Erickson 
> wrote:
> >
> >> Note that that article is from 2011. That was in the Solr 3x days when
> >> many, many, many things were different. There was no SolrCloud for
> >> instance. Plus Tom’s problem space is indexing _books_. Whole, complete,
> >> books. Which is, actually, not “normal” indexing at all as most Solr
> >> indexes are much smaller documents. Books are a perfectly reasonable
> >> use-case of course, but have a whole bunch of special requirements.
> >>
> >> get-by-id should be very efficient, _except_ that the longer you spend
> >> before opening a new searcher, the larger the internal data buffers
> >> supporting get-by-id need to be.
> >>
> >> Anyway, best of luck
> >> Erick
> >>
> >>> On Dec 9, 2019, at 1:05 AM, Paras Lehana 
> >> wrote:
> >>>
> >>> Hi Erick,
> >>>
> >>> I have reverted back to original values and yes, I did see
> improvement. I
> >>> will collect more stats. *Thank you for helping. :)*
> >>>
> >>> Also, here is the reference article that I had referred for changing
> >>> values:
> >>>
> >>
> https://www.hathitrust.org/blogs/large-scale-search/forty-days-and-forty-nights-re-indexing-7-million-books-part-1
> >>>
> >>> The article was perhaps for normal indexing and thus, suggested
> &

Re: [Q] Faster Atomic Updates - use docValues?

2019-12-11 Thread Paras Lehana
Just to update: I kept the defaults. The indexing got only a little boost,
though I have decided to continue with the defaults and run incremental
experiments only. To my surprise, our development server had only 12GB RAM,
of which 8G was allocated to Java. Because I could not increase the RAM, I
tried decreasing the heap to 4G and, guess what, my indexing speed got a
boost of over *50x*. Erick, thanks for helping. I think I should do more
homework on GC as well. Your GC guess seems to be valid. I have raised a
request to increase the development server's RAM to 24GB.

On Mon, 9 Dec 2019 at 20:23, Erick Erickson  wrote:

> Note that that article is from 2011. That was in the Solr 3x days when
> many, many, many things were different. There was no SolrCloud for
> instance. Plus Tom’s problem space is indexing _books_. Whole, complete,
> books. Which is, actually, not “normal” indexing at all as most Solr
> indexes are much smaller documents. Books are a perfectly reasonable
> use-case of course, but have a whole bunch of special requirements.
>
> get-by-id should be very efficient, _except_ that the longer you spend
> before opening a new searcher, the larger the internal data buffers
> supporting get-by-id need to be.
>
> Anyway, best of luck
> Erick
>
> > On Dec 9, 2019, at 1:05 AM, Paras Lehana 
> wrote:
> >
> > Hi Erick,
> >
> > I have reverted back to original values and yes, I did see improvement. I
> > will collect more stats. *Thank you for helping. :)*
> >
> > Also, here is the reference article that I had referred for changing
> > values:
> >
> https://www.hathitrust.org/blogs/large-scale-search/forty-days-and-forty-nights-re-indexing-7-million-books-part-1
> >
> > The article was perhaps for normal indexing and thus, suggested
> increasing
> > mergeFactor and then finally optimizing. In my case, a large number of
> > segments could have impacted get-by-id of atomic updates? Just being
> > curious.
> >
> > On Fri, 6 Dec 2019 at 19:02, Paras Lehana 
> > wrote:
> >
> >> Hey Erick,
> >>
> >> We have just upgraded to 8.3 before starting the indexing. We were on
> 6.6
> >> before that.
> >>
> >> Thank you for your continued support and resources. Again, I have
> already
> >> taken your suggestion to start afresh and that's what I'm going to do.
> >> Don't get me wrong but I have been just asking doubts. I will surely get
> >> back with my experience after performing the full indexing.
> >>
> >> Thanks again! :)
> >>
> >> On Fri, 6 Dec 2019 at 18:48, Erick Erickson 
> >> wrote:
> >>
> >>> Nothing implicitly handles optimization, you must continue to do that
> >>> externally.
> >>>
> >>> Until you get to the bottom of your indexing slowdown, I wouldn’t
> bother
> >>> with it at all, trying to do all these things at once is what lead to
> your
> >>> problem in the first place, please change one thing at a time. You say:
> >>>
> >>> “For a full indexing, optimizations occurred 30 times between batches”.
> >>>
> >>> This is horrible. I’m not sure what version of Solr you’re using. If
> it’s
> >>> 7.4 or earlier, this means the the entire index was rewritten 30 times.
> >>> The first time it would condense all segments into a single segment, or
> >>> 1/30 of the total. The second time it would rewrite all that, 2/30 of
> the
> >>> index into a new segment. The third time 3/30. And so on.
> >>>
> >>> If Solr 7.5 or later, it wouldn’t be as bad, assuming your index was
> over
> >>> 5G. But still.
> >>>
> >>> See:
> >>>
> https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
> >>> for 7.4 and earlier,
> >>> https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/
> for
> >>> 7.5 and later
> >>>
> >>> Eventually you can optimize by sending in an http or curl request like
> >>> this:
> >>> ../solr/collection/update?optimize=true
> >>>
> >>> You also changed to using StandardDirectory. The default has heuristics
> >>> built in
> >>> to choose the best directory implementation.
> >>>
> >>> I can’t emphasize enough that you’re changing lots of things at one
> time.
> >>> I
> >>> _strongly_ urge you to go back to the standard setup, make _no_
> >>> modifications
> >>> and change things one at a time. Some very bright people

Re: Search returning unexpected matches at the top

2019-12-09 Thread Paras Lehana
That's great.

But I also wanted to know why the document in question was scored lower in
the original query. Anyway, glad that the issue is resolved. :)

On Tue, 10 Dec 2019 at 00:38, rhys J  wrote:

> On Mon, Dec 9, 2019 at 12:06 AM Paras Lehana 
> wrote:
>
> > Hi Rhys,
> >
> > Use Solr Query Debugger
> > <
> >
> https://chrome.google.com/webstore/detail/solr-query-debugger/gmpkeiamnmccifccnbfljffkcnacmmdl?hl=en
> > >
> > Chrome
> > Extension to see what's making up the score for both of them. I guess
> > fieldNorm should impact but that should not be the only thing - there's
> > another catch here.
> >
>
> Oh wow, thank you for this!
>
> I figured out that if I added quotes to the terms, and then added ^2 to the
> score, that it floated to the top just like I expected.
>
> Thanks,
>
> Rhys
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Edismax bq(boost query) with filter range on score

2019-12-09 Thread Paras Lehana
I don't know whether that "inclusive" parameter works, though I do know
that incl includes the lower bound and incu includes the upper bound.

On Mon, 9 Dec 2019 at 16:49, Raboah, Avi  wrote:

> Thanks for your fast response!
>
> Without the frange I get all the documents with the score field from 1.0
> (default score) to max score after boosting.
>
> When I add the frange for example
> bq=text:"Phrase"^3&defType=edismax&fl=*,score&fq={!frange l=0 u=3
> inclusive=true}query($bq)&q=*:*&rows=2000
>
> I get all the documents with score between (1.0 < score < 4.0) although
> lower bound equal to 0.
>
> Thanks.
>
> -Original Message-
> From: Paras Lehana [mailto:paras.leh...@indiamart.com]
> Sent: Monday, December 09, 2019 11:51 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Edismax bq(boost query) with filter range on score
>
> I was just going to suggest you frange. You're already using it.
>
> Please post the whole query. Have you confirmed that by removing the
> frange, you are able to see the documents with score=1.0.
>
> On Mon, 9 Dec 2019 at 14:21, Raboah, Avi  wrote:
>
> > That's right,
> >
> > I check something like this fq={!frange l=0 u=5}query($bq)
> >
> > And it's partially work but it's not return the documents with score =
> > 1.0
> >
> > Do you know why?
> >
> >
> > Thanks.
> > -Original Message-
> > From: Paras Lehana [mailto:paras.leh...@indiamart.com]
> > Sent: Monday, December 09, 2019 7:08 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Edismax bq(boost query) with filter range on score
> >
> > Hi Raboah,
> >
> > What do you mean by filter range? Please post expected result. Do you
> > want to put an fq on the score?
> >
> > On Sun, 8 Dec 2019 at 17:54, Raboah, Avi  wrote:
> >
> > > Hi,
> > >
> > > In order to use solr boost mechanism for specific text I use the bq
> > > field under deftype=edisMax.
> > >
> > > For example -
> > > q=*:*=edisMax=text:"phrase"^3=*,score
> > >
> > > after I do this query I get the relevant documents boosted with the
> > > solr calculation score.
> > > Now my question is there a way to do a filter range on the score?
> > >
> > > Thanks.
> > >
> > >
> > >
> > >
> > >
> > > This electronic message may contain proprietary and confidential
> > > information of Verint Systems Inc., its affiliates and/or
> > > subsidiaries. The information is intended to be for the use of the
> > > individual(s) or
> > > entity(ies) named above. If you are not the intended recipient (or
> > > authorized to receive this e-mail for the intended recipient), you
> > > may not use, copy, disclose or distribute to anyone this message or
> > > any information contained in this message. If you have received this
> > > electronic message in error, please notify us by replying to this
> e-mail.
> > >
> >
> >
> > --
> > --
> > Regards,
> >
> > *Paras Lehana* [65871]
> > Development Engineer, Auto-Suggest,
> > IndiaMART Intermesh Ltd.
> >
> > 8th Floor, Tower A, Advant-Navis Business Park, Sector 142, Noida, UP,
> > IN
> > - 201303
> >
> > Mob.: +91-9560911996
> > Work: 01203916600 | Extn:  *8173*
> >
> > --
> > *
> > *
> >
> >  <https://www.facebook.com/IndiaMART/videos/578196442936091/>
> >
> >
> > This electronic message may contain proprietary and confidential
> > information of Verint Systems Inc., its affiliates and/or
> > subsidiaries. The information is intended to be for the use of the
> > individual(s) or
> > entity(ies) named above. If you are not the intended recipient (or
> > authorized to receive this e-mail for the intended recipient), you may
> > not use, copy, disclose or distribute to anyone this message or any
> > information contained in this message. If you have received this
> > electronic message in error, please notify us by replying to this e-mail.
> >
>
>
> --
> --
> Regards,
>
> *Paras Lehana* [65871]
> Development Engineer, Auto-Suggest,
> IndiaMART Intermesh Ltd.
>
> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142, Noida, UP, IN
> - 201303
>
> Mob.: +91-9560911996
> Work: 01203916600 | Extn:  *8173*
>
> --
> *
> *
>
>  <https://www.facebook.com/IndiaMART/videos/578196442936091/>
>
>
> This electronic message may contain proprietary and confidential
> information of Verint Systems Inc., its affiliates and/or subsidiaries. The
> information is intended to be for the use of the individual(s) or
> entity(ies) named above. If you are not the intended recipient (or
> authorized to receive this e-mail for the intended recipient), you may not
> use, copy, disclose or distribute to anyone this message or any information
> contained in this message. If you have received this electronic message in
> error, please notify us by replying to this e-mail.
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Solr 8 - Sort Issue

2019-12-09 Thread Paras Lehana
Hi Anuj,

Glad that it worked. I ask everyone for a schema screenshot because it's
usually a case of the schema not having been reloaded, or something similar.

> However, I change pint to plong because it was taking an awful lot of time
> to index.


Strange! Why do you think that this is the case?

On Mon, 9 Dec 2019 at 14:06, Anuj Bhargava  wrote:

> Thanks Paras, that was very helpful. Restarted solr and for posting_id it
> showed pint earlier it was showing string.
>
> However, I change pint to plong because it was taking an awful lot of time
> to index.
>
> Thanks again,
>
> Regards,
> ANuj
>
> On Mon, 9 Dec 2019 at 11:32, Paras Lehana 
> wrote:
>
> > Hi Anuj,
> >
> > Thanks for that.
> >
> >1. Go to Schema (left side section) > choose your field posting_id and
> >post the screenshot. Are you able to see IntPointField or pint there?.
> >2. Please post the query you are using for sorting. Also, post a
> sample
> >of response.
> >
> > --
> > *
> > *
> >
> >  <https://www.facebook.com/IndiaMART/videos/578196442936091/>
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Edismax bq(boost query) with filter range on score

2019-12-09 Thread Paras Lehana
I was just about to suggest frange - you're already using it.

Please post the whole query. Have you confirmed that, with the frange
removed, you are able to see the documents with score=1.0?

On Mon, 9 Dec 2019 at 14:21, Raboah, Avi  wrote:

> That's right,
>
> I check something like this fq={!frange l=0 u=5}query($bq)
>
> And it's partially work but it's not return the documents with score = 1.0
>
> Do you know why?
>
>
> Thanks.
> -----Original Message-
> From: Paras Lehana [mailto:paras.leh...@indiamart.com]
> Sent: Monday, December 09, 2019 7:08 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Edismax bq(boost query) with filter range on score
>
> Hi Raboah,
>
> What do you mean by filter range? Please post expected result. Do you want
> to put an fq on the score?
>
> On Sun, 8 Dec 2019 at 17:54, Raboah, Avi  wrote:
>
> > Hi,
> >
> > In order to use solr boost mechanism for specific text I use the bq
> > field under deftype=edisMax.
> >
> > For example -
> > q=*:*&defType=edismax&bq=text:"phrase"^3&fl=*,score
> >
> > after I do this query I get the relevant documents boosted with the
> > solr calculation score.
> > Now my question is there a way to do a filter range on the score?
> >
> > Thanks.
> >
> >
> >
> >
> >
> > This electronic message may contain proprietary and confidential
> > information of Verint Systems Inc., its affiliates and/or
> > subsidiaries. The information is intended to be for the use of the
> > individual(s) or
> > entity(ies) named above. If you are not the intended recipient (or
> > authorized to receive this e-mail for the intended recipient), you may
> > not use, copy, disclose or distribute to anyone this message or any
> > information contained in this message. If you have received this
> > electronic message in error, please notify us by replying to this e-mail.
> >
>
>
> --
> --
> Regards,
>
> *Paras Lehana* [65871]
> Development Engineer, Auto-Suggest,
> IndiaMART Intermesh Ltd.
>
> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142, Noida, UP, IN
> - 201303
>
> Mob.: +91-9560911996
> Work: 01203916600 | Extn:  *8173*
>
> --
> *
> *
>
>  <https://www.facebook.com/IndiaMART/videos/578196442936091/>
>
>
> This electronic message may contain proprietary and confidential
> information of Verint Systems Inc., its affiliates and/or subsidiaries. The
> information is intended to be for the use of the individual(s) or
> entity(ies) named above. If you are not the intended recipient (or
> authorized to receive this e-mail for the intended recipient), you may not
> use, copy, disclose or distribute to anyone this message or any information
> contained in this message. If you have received this electronic message in
> error, please notify us by replying to this e-mail.
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: [Q] Faster Atomic Updates - use docValues?

2019-12-08 Thread Paras Lehana
Hi Erick,

I have reverted to the original values and yes, I did see an improvement. I
will collect more stats. *Thank you for helping. :)*

Also, here is the reference article I had used when changing the values:
https://www.hathitrust.org/blogs/large-scale-search/forty-days-and-forty-nights-re-indexing-7-million-books-part-1

The article was perhaps about plain indexing and thus suggested increasing
mergeFactor and optimizing only at the end. In my case, could a large
number of segments have slowed down the get-by-id step of atomic updates?
Just curious.

On Fri, 6 Dec 2019 at 19:02, Paras Lehana 
wrote:

> Hey Erick,
>
> We have just upgraded to 8.3 before starting the indexing. We were on 6.6
> before that.
>
> Thank you for your continued support and resources. Again, I have already
> taken your suggestion to start afresh and that's what I'm going to do.
> Don't get me wrong but I have been just asking doubts. I will surely get
> back with my experience after performing the full indexing.
>
> Thanks again! :)
>
> On Fri, 6 Dec 2019 at 18:48, Erick Erickson 
> wrote:
>
>> Nothing implicitly handles optimization, you must continue to do that
>> externally.
>>
>> Until you get to the bottom of your indexing slowdown, I wouldn’t bother
>> with it at all, trying to do all these things at once is what lead to your
>> problem in the first place, please change one thing at a time. You say:
>>
>> “For a full indexing, optimizations occurred 30 times between batches”.
>>
>> This is horrible. I’m not sure what version of Solr you’re using. If it’s
>> 7.4 or earlier, this means the the entire index was rewritten 30 times.
>> The first time it would condense all segments into a single segment, or
>> 1/30 of the total. The second time it would rewrite all that, 2/30 of the
>> index into a new segment. The third time 3/30. And so on.
>>
>> If Solr 7.5 or later, it wouldn’t be as bad, assuming your index was over
>> 5G. But still.
>>
>> See:
>> https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
>> for 7.4 and earlier,
>> https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/ for
>> 7.5 and later
>>
>> Eventually you can optimize by sending in an http or curl request like
>> this:
>> ../solr/collection/update?optimize=true
>>
>> You also changed to using StandardDirectory. The default has heuristics
>> built in
>> to choose the best directory implementation.
>>
>> I can’t emphasize enough that you’re changing lots of things at one time.
>> I
>> _strongly_ urge you to go back to the standard setup, make _no_
>> modifications
>> and change things one at a time. Some very bright people have done a lot
>> of work to try to make Lucene/Solr work well.
>>
>> Make one change at a time. Measure. If that change isn’t helpful, undo it
>> and
>> move to the next one. You’re trying to second-guess the Lucene/Solr
>> developers who have years of understanding how this all works. Assume they
>> picked reasonable options for defaults and that Lucene/Solr performs
>> reasonably
>> well. When I get unexplainably poor results, I usually assume it was the
>> last
>> thing I changed….
>>
>> Best,
>> Erick
>>
>>
>>
>>
>> > On Dec 6, 2019, at 1:31 AM, Paras Lehana 
>> wrote:
>> >
>> > Hi Erick,
>> >
>> > I believed optimizing explicitly merges segments and that's why I was
>> > expecting it to give performance boost. I know that optimizations should
>> > not be done very frequently. For a full indexing, optimizations
>> occurred 30
>> > times between batches. I take your suggestion to undo all the changes
>> and
>> > that's what I'm going to do. I mentioned about the optimizations giving
>> an
>> > indexing boost (for sometime) only to support your point of my
>> mergePolicy
>> > backfiring. I will certainly read again about the merge process.
>> >
>> > Taking your suggestions - so, commits would be handled by autoCommit.
>> What
>> > implicitly handles optimizations? I think the merge policy or is there
>> any
>> > other setting I'm missing?
>> >
>> > I'm indexing via Curl API on the same server. The Current Speed of curl
>> is
>> > only 50k (down from 1300k in the first batch). I think - as the curl is
>> > transmitting the XML, the documents are getting indexing. Because then
>> only
>> > would speed be so low. I don't think that the whole XML is taking the
>> > memory - I r

Re: Solr 8 - Sort Issue

2019-12-08 Thread Paras Lehana
Hi Anuj,

Thanks for that.

   1. Go to Schema (left side section) > choose your field posting_id and
   post the screenshot. Are you able to see IntPointField or pint there?
   2. Please post the query you are using for sorting. Also, post a sample
   of the response.



Re: Edismax bq(boost query) with filter range on score

2019-12-08 Thread Paras Lehana
Hi Raboah,

What do you mean by filter range? Please post the expected result. Do you
want to put an fq on the score?

On Sun, 8 Dec 2019 at 17:54, Raboah, Avi  wrote:

> Hi,
>
> In order to use solr boost mechanism for specific text I use the bq field
> under deftype=edisMax.
>
> For example -
> q=*:*&defType=edismax&bq=text:"phrase"^3&fl=*,score
>
> after I do this query I get the relevant documents boosted with the solr
> calculation score.
> Now my question is there a way to do a filter range on the score?
>
> Thanks.
>
>
>
>
>
> This electronic message may contain proprietary and confidential
> information of Verint Systems Inc., its affiliates and/or subsidiaries. The
> information is intended to be for the use of the individual(s) or
> entity(ies) named above. If you are not the intended recipient (or
> authorized to receive this e-mail for the intended recipient), you may not
> use, copy, disclose or distribute to anyone this message or any information
> contained in this message. If you have received this electronic message in
> error, please notify us by replying to this e-mail.
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Solr 8 - Sort Issue

2019-12-08 Thread Paras Lehana
Hi Anuj,

Please post the part of your schema for this field and its fieldType.

On Sat, 7 Dec 2019 at 10:38, Anuj Bhargava  wrote:

> Tried with plong, pint and string. Reindexed after each change. The sort
> results on numeric values being shown are the same -
> Ascending
> 1
> 10
> 100
> 2
> 2100
> 3
>
> Descending
> 
> 999
> 99
> 9
> 88
> 88
> 
>
> On Fri, 6 Dec 2019 at 17:15, Anuj Bhargava  wrote:
>
> > Numeric sorting. Did the re-indexing. But didn't work.
> >
> > Regards,
> >
> > Anuj
> >
> > On Fri, 6 Dec 2019 at 16:44, Munendra S N 
> wrote:
> >
> >> >
> >> > What should I use for numeric search.
> >>
> >> numeric search or numeric sorting?
> >>
> >>  I tried with pint also, but the result was the same.
> >>
> >> It should have worked. please make sure data is reindexed after
> fieldType
> >> changes
> >>
> >> Regards,
> >> Munendra S N
> >>
> >>
> >>
> >> On Fri, Dec 6, 2019 at 4:10 PM Anuj Bhargava 
> wrote:
> >>
> >> > I tried with pint also, but the result was the same. What should I use
> >> for
> >> > numeric search.
> >> >
> >> > Regards,
> >> >
> >> > Anuj
> >> >
> >> > On Fri, 6 Dec 2019 at 15:55, Munendra S N 
> >> wrote:
> >> >
> >> > > Hi Anuj,
> >> > > As the field type is String, lexicographical sorting is done, not
> >> numeric
> >> > > sorting.
> >> > >
> >> > > Regards,
> >> > > Munendra S N
> >> > >
> >> > >
> >> > >
> >> > > On Fri, Dec 6, 2019 at 3:12 PM Anuj Bhargava 
> >> wrote:
> >> > >
> >> > > > When I sort desc on posting_id sort=posting_id%20desc, I get the
> >> > > following
> >> > > > result
> >> > > > "posting_id":"313"
> >> > > > "posting_id":"312"
> >> > > > "posting_id":"310"
> >> > > >
> >> > > > When I sort asc on posting_id sort=posting_id%20asc, I get the
> >> > following
> >> > > > result
> >> > > > "posting_id":"10005343"
> >> > > > "posting_id":"10005349"
> >> > > > "posting_id":"10005359"
> >> > > >
> >> > > > *In descending the 8 figure numbers are not coming up first and in
> >> > > > ascending the 7 figure numbers are not coming up first.*
> >> > > >
> >> > > > Entry in schema is -
> >> > > >  stored="true"
> >> > > > required="true" docValues="true" multiValued="false"/>
> >> > > >
> >> > >
> >> >
> >>
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Search returning unexpected matches at the top

2019-12-08 Thread Paras Lehana
Hi Rhys,

Use the Solr Query Debugger
<https://chrome.google.com/webstore/detail/solr-query-debugger/gmpkeiamnmccifccnbfljffkcnacmmdl?hl=en>
Chrome extension to see what's making up the score for both of them. I'd
guess fieldNorm has an impact, but that shouldn't be the only factor -
there's another catch here.
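
If you'd rather not install anything, Solr's own debug output shows the
same score breakdown - something like this (core name assumed):

curl 'http://localhost:8983/solr/yourcore/select?q=clt_ref_no:owl-2924-8&fl=*,score&debugQuery=true'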

On Fri, 6 Dec 2019 at 22:00, Erick Erickson  wrote:

> Please look at the admin UI>>collection_or_core>>analysis page. That will
> tell you exactly how your input is being transformed. Very often
> WordDelimiter(Graph)FilterFactory is what breaks data up like this, that’s
> what it’s _designed_ for.
>
> Best,
> Erick
>
> > On Dec 6, 2019, at 11:25 AM, rhys J  wrote:
> >
> > On Fri, Dec 6, 2019 at 11:21 AM David Hastings 
> wrote:
> >
> >> whats the field type for:
> >> clt_ref_no
> >>
> >
> > It is a text_general field because it can have numbers or alphanumeric
> > characters.
> >
> > *_no isnt a default dynamic character, and owl-2924-8 usually translates
> >> into
> >> owl 2924 8
> >>
> >>
> > So it's matching on word breaks, am I understanding properly?
> >
> > It's matching all things that match either 'owl' or '2924' or '8'?
> >
> > Thanks,
> >
> > Rhys
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: [Q] Faster Atomic Updates - use docValues?

2019-12-06 Thread Paras Lehana
Hey Erick,

We upgraded to 8.3 just before starting the indexing. We were on 6.6
before that.

Thank you for your continued support and the resources. Again, I have
already taken your suggestion to start afresh, and that's what I'm going to
do. Don't get me wrong - I have just been asking questions to clear my
doubts. I will surely get back with my experience after performing the full
indexing.

Thanks again! :)

On Fri, 6 Dec 2019 at 18:48, Erick Erickson  wrote:

> Nothing implicitly handles optimization, you must continue to do that
> externally.
>
> Until you get to the bottom of your indexing slowdown, I wouldn’t bother
> with it at all, trying to do all these things at once is what lead to your
> problem in the first place, please change one thing at a time. You say:
>
> “For a full indexing, optimizations occurred 30 times between batches”.
>
> This is horrible. I’m not sure what version of Solr you’re using. If it’s
> 7.4 or earlier, this means the the entire index was rewritten 30 times.
> The first time it would condense all segments into a single segment, or
> 1/30 of the total. The second time it would rewrite all that, 2/30 of the
> index into a new segment. The third time 3/30. And so on.
>
> If Solr 7.5 or later, it wouldn’t be as bad, assuming your index was over
> 5G. But still.
>
> See:
> https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
> for 7.4 and earlier,
> https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/ for
> 7.5 and later
>
> Eventually you can optimize by sending in an http or curl request like
> this:
> ../solr/collection/update?optimize=true
>
> You also changed to using StandardDirectory. The default has heuristics
> built in
> to choose the best directory implementation.
>
> I can’t emphasize enough that you’re changing lots of things at one time. I
> _strongly_ urge you to go back to the standard setup, make _no_
> modifications
> and change things one at a time. Some very bright people have done a lot
> of work to try to make Lucene/Solr work well.
>
> Make one change at a time. Measure. If that change isn’t helpful, undo it
> and
> move to the next one. You’re trying to second-guess the Lucene/Solr
> developers who have years of understanding how this all works. Assume they
> picked reasonable options for defaults and that Lucene/Solr performs
> reasonably
> well. When I get unexplainably poor results, I usually assume it was the
> last
> thing I changed….
>
> Best,
> Erick
>
>
>
>
> > On Dec 6, 2019, at 1:31 AM, Paras Lehana 
> wrote:
> >
> > Hi Erick,
> >
> > I believed optimizing explicitly merges segments and that's why I was
> > expecting it to give performance boost. I know that optimizations should
> > not be done very frequently. For a full indexing, optimizations occurred
> 30
> > times between batches. I take your suggestion to undo all the changes and
> > that's what I'm going to do. I mentioned about the optimizations giving
> an
> > indexing boost (for sometime) only to support your point of my
> mergePolicy
> > backfiring. I will certainly read again about the merge process.
> >
> > Taking your suggestions - so, commits would be handled by autoCommit.
> What
> > implicitly handles optimizations? I think the merge policy or is there
> any
> > other setting I'm missing?
> >
> > I'm indexing via Curl API on the same server. The Current Speed of curl
> is
> > only 50k (down from 1300k in the first batch). I think - as the curl is
> > transmitting the XML, the documents are getting indexing. Because then
> only
> > would speed be so low. I don't think that the whole XML is taking the
> > memory - I remember I had to change the curl options to get rid of the
> > transmission error for large files.
> >
> > This is my curl request:
> >
> > curl "http://localhost:$port/solr/product/update?commit=true" -T
> > batch1.xml -X POST -H 'Content-type:text/xml'
> >
> > Although, we had been doing this since ages - I think I should now
> consider
> > using the solr post service (since the indexing files stays on the same
> > server) or using Solarium (we use PHP to make XMLs).
> >
> > On Thu, 5 Dec 2019 at 20:00, Erick Erickson 
> wrote:
> >
> >>> I think I should have also done optimize between batches, no?
> >>
> >> No, no, no, no. Absolutely not. Never. Never, never, never between
> batches.
> >> I don’t  recommend optimizing at _all_ unless there are demonstrable
> >> improvements.
> >>
> >> Please don’t take this the wrong way, the whole merge process is really
> >> h

Re: [Q] Faster Atomic Updates - use docValues?

2019-12-05 Thread Paras Lehana
Hi Erick,

I believed optimizing explicitly merges segments, and that's why I was
expecting it to give a performance boost. I know that optimizations should
not be done very frequently. For a full indexing run, optimizations
occurred 30 times between batches. I'll take your suggestion to undo all
the changes, and that's what I'm going to do. I mentioned the optimizations
giving an indexing boost (for some time) only to support your point about
my mergePolicy backfiring. I will certainly read up on the merge process
again.

Taking your suggestions - so, commits would be handled by autoCommit. What
implicitly handles optimizations? I think it's the merge policy - or is
there any other setting I'm missing?
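
For reference, this is the autoCommit block I'm planning to keep in
solrconfig.xml per your suggestion (5 minutes, openSearcher=true):

<autoCommit>
  <maxTime>300000</maxTime> <!-- 5 minutes -->
  <openSearcher>true</openSearcher>
</autoCommit>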

I'm indexing via curl on the same server. The current speed of curl is only
50k (down from 1300k in the first batch). I think the documents are being
indexed while curl is still transmitting the XML - only that would explain
such a low speed. I don't think the whole XML is taking up memory; I
remember I had to change the curl options to get rid of a transmission
error for large files.

This is my curl request:

curl "http://localhost:$port/solr/product/update?commit=true" -T
batch1.xml -X POST -H 'Content-type:text/xml'

Although we have been doing this for ages, I think I should now consider
using the Solr post tool (since the indexing files stay on the same server)
or Solarium (we use PHP to build the XMLs).

On Thu, 5 Dec 2019 at 20:00, Erick Erickson  wrote:

> >  I think I should have also done optimize between batches, no?
>
> No, no, no, no. Absolutely not. Never. Never, never, never between batches.
> I don’t  recommend optimizing at _all_ unless there are demonstrable
> improvements.
>
> Please don’t take this the wrong way, the whole merge process is really
> hard to get your head around. But the very fact that you’d suggest
> optimizing between batches shows that the entire merge process is
> opaque to you. I’ve seen many people just start changing things and
> get themselves into a bad place, then try to change more things to get
> out of that hole. Rinse. Repeat.
>
> I _strongly_ recommend that you undo all your changes. Neither
> commit nor optimize from outside Solr. Set your autocommit
> settings to something like 5 minutes with openSearcher=true.
> Set all autowarm counts in your caches in solrconfig.xml to 0,
> especially filterCache and queryResultCache.
>
> Do not set soft commit at all, leave it at -1.
>
> Repeat do _not_ commit or optimize from the client! Just let your
> autocommit settings do the commits.
>
> It’s also pushing things to send 5M docs in a single XML packet.
> That all has to be held in memory and then indexed, adding to
> pressure on the heap. I usually index from SolrJ in batches
> of 1,000. See:
> https://lucidworks.com/post/indexing-with-solrj/
>
> Simply put, your slowdown should not be happening. I strongly
> believe that it’s something in your environment, most likely
> 1> your changes eventually shoot you in the foot OR
> 2> you are running in too little memory and eventually GC is killing you.
> Really, analyze your GC logs. OR
> 3> you are running on underpowered hardware which just can’t take the load
> OR
> 4> something else in your environment
>
> I’ve never heard of a Solr installation with such a massive slowdown during
> indexing that was fixed by tweaking things like the merge policy etc.
>
> Best,
> Erick
>
>
> > On Dec 5, 2019, at 12:57 AM, Paras Lehana 
> wrote:
> >
> > Hey Erick,
> >
> > This is a huge red flag to me: "(but I could only test for the first few
> >> thousand documents”.
> >
> >
> > Yup, that's probably where the culprit lies. I could only test for the
> > starting batch because I had to wait for a day to actually compare. I
> > tweaked the merge values and kept whatever gave a speed boost. My first
> > batch of 5 million docs took only 40 minutes (atomic updates included)
> and
> > the last batch of 5 million took more than 18 hours. If this is an issue
> of
> > mergePolicy, I think I should have also done optimize between batches,
> no?
> > I remember, when I indexed a single XML of 80 million after optimizing
> the
> > core already indexed with 30 XMLs of 5 million each, I could post 80
> > million in a day only.
> >
> >
> >
> >> The indexing rate you’re seeing is abysmal unless these are _huge_
> >> documents
> >
> >
> > Documents only contain the suggestion name, possible titles,
> > phonetics/spellcheck/synonym fields and numerical fields for boosting.
> They
> > are far smaller than what a Search Document would contain. Auto-Suggest
> is
> > only concerned abo

Re: xms/xmx choices

2019-12-05 Thread Paras Lehana
Hi David,

Your Xmx seems like overkill, though without usage stats this can't be
verified. I think you should analyze long GC pauses, given that there is so
much difference between your min and max. I'd prefer making min and max the
same before stressing over the exact values. You can start with 20G, but
what would you do with the remaining memory?
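
For example, in solr.in.sh (20G being just the starting point suggested
above, not a recommendation):

SOLR_JAVA_MEM="-Xms20g -Xmx20g"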

PS: Your configuration is something I admire. :P

On Fri, 6 Dec 2019 at 01:56, David Hastings 
wrote:

> and if this may be of use:
> https://imgur.com/a/qXBuSxG
>
> just been more or less winging the options since solr 1.3
>
>
> On Thu, Dec 5, 2019 at 2:41 PM Shawn Heisey  wrote:
>
> > On 12/5/2019 11:58 AM, David Hastings wrote:
> > > as of now we do an xms of 8gb and xmx of 60gb, generally through the
> > > dashboard the JVM hangs around 16gb.  I know Xms and Xmx are supposed
> to
> > be
> > > the same so thats the change #1 on my end, I am just concerned of
> > dropping
> > > it from 60 as thus far over the last few years I have had no problems
> nor
> > > performance issues.  I know its said a lot of times to make it lower
> and
> > > let the OS use the ram for caching the file system/index files, so my
> > first
> > > experiment was going to be around 20gb, was wondering if this seems
> > sound,
> > > or should i go even lower?
> >
> > The Xms and Xmx settings should be the same so Java doesn't need to take
> > special action to increase the pool size when more than the minimum is
> > required.  Java tends to always increase to the maximum as it runs, so
> > there's usually little benefit to specifying a lower minimum than the
> > maximum.  With a 60GB max heap, Java is likely to grab a little more
> > than 60GB from the OS, regardless of how much heap is actually in use.
> >
> > If you can provide GC logs from Solr that cover a signficant timeframe,
> > especially heavy indexing, we can analyze those and make an estimate
> > about the values you should have for Xms and Xmx.  It will only be a
> > guess ... something might happen later that requires more heap.
> >
> > We can't make recommendations without hard data.  The information you
> > provided isn't enough to guess how much heap you'll need.  Depending on
> > how such a system is used, a few GB might be enough, or you might need a
> > lot more.
> >
> >
> >
> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> >
> > Thanks,
> > Shawn
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: FlattenGraphFilter Eliminates Tokens - Can't match "Can't"

2019-12-05 Thread Paras Lehana
> > [start of the index-time analysis table truncated in the archive; the
> > surviving rows of that first stage read: end 5/5/5/3/3/5,
> > positionLength 1/1/2/1/1/1, type SYNONYM for tokens 1 and 4,
> > termFrequency 1 for all, position 1/1/1/3/3/4, keyword false for all]
> >
> > FGF   | text           | cants            | cant          | can't            | t     |
> >       | raw_bytes      | [63 61 6e 74 73] | [63 61 6e 74] | [63 61 6e 27 74] | [74]  |
> >       | start          | 0                | 0             | 0                | 4     |
> >       | end            | 5                | 5             | 5                | 5     |
> >       | positionLength | 1                | 1             | 1                | 1     |
> >       | type           | SYNONYM          |               |                  |       |
> >       | termFrequency  | 1                | 1             | 1                | 1     |
> >       | position       | 1                | 1             | 1                | 3     |
> >       | keyword        | false            | false         | false            | false |
> > ICUFF | text           | cants            | cant          | can't            | t     |
> >       | raw_bytes      | [63 61 6e 74 73] | [63 61 6e 74] | [63 61 6e 27 74] | [74]  |
> >       | start          | 0                | 0             | 0                | 4     |
> >       | end            | 5                | 5             | 5                | 5     |
> >       | positionLength | 1                | 1             | 1                | 1     |
> >       | type           | SYNONYM          |               |                  |       |
> >       | termFrequency  | 1                | 1             | 1                | 1     |
> >       | position       | 1                | 1             | 1                | 3     |
> >       | keyword        | false            | false         | false            | false |
> >
> > Query
> >
> > ST| text  | can't|
> >   | raw_bytes | [63 61 6e 27 74] |
> >   | start | 0|
> >   | end   | 5|
> >   | positionLength| 1|
> >   | type  ||
> >   | termFrequency | 1|
> >   | position  | 1|
> > SF| text  | can't|
> >   | raw_bytes | [63 61 6e 27 74] |
> >   | start | 0|
>     >   | end   | 5|
> >   | positionLength| 1|
> >   | type  ||
> >   | termFrequency | 1|
> >   | position  | 1|
> > WDGF  | text  | can| t  |
> >   | raw_bytes | [63 61 6e] | [74]   |
> >   | start | 0  | 4  |
> >   | end   | 3  | 5  |
> >   | positionLength| 1  | 1  |
> >   | type  |  |  |
> >   | termFrequency | 1  | 1  |
> >   | position  | 1  | 2  |
> >   | keyword   | false  | false  |
> > SF| text  | can| t  |
> >   | raw_bytes | [63 61 6e] | [74]   |
> >   | start | 0  | 4  |
> >   | end   | 3  | 5  |
> >   | positionLength| 1  | 1  |
> >   | type  |  |  |
> >   | termFrequency | 1  | 1  |
> >   | position  | 1  | 2  |
> >   | keyword   | false  | false  |
> > ICUFF | text  | can| t  |
> >   | raw_bytes | [63 61 6e] | [74]   |
> >   | start | 0  | 4  |
> >   | end   | 3  | 5  |
> >   | positionLength| 1  | 1  |
> >   | type  |  |  |
> >   | termFrequency | 1  | 1  |
> >   | position  | 1  | 2  |
> >   | keyword   | false  | false  |
> >
>
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: From solr to solr cloud

2019-12-05 Thread Paras Lehana
Do you mean 500 cores? Tell us more about the data. How many documents per
core do you have, and what performance issues are you facing?

On Fri, 6 Dec 2019 at 01:01, David Hastings 
wrote:

> are you noticing performance decreases in stand alone solr as of now?
>
> On Thu, Dec 5, 2019 at 2:29 PM Vignan Malyala 
> wrote:
>
> > Hi
> > I currently have 500 collections in my stand alone solr. Bcoz of day by
> day
> > increase in Data, I want to convert it into solr cloud.
> > Can you suggest me how to do it successfully.
> > How many shards should be there?
> > How many nodes should be there?
> > Are so called nodes different machines i should take?
> > How many zoo keeper nodes should be there?
> > Are so called zoo keeper nodes different machines i should take?
> > Total how many machines i have to take to implement scalable solr cloud?
> >
> > Plz detail these questions. Any of documents on web aren't clear for
> > production environments.
> > Thanks in advance.
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Solr indexing performance

2019-12-05 Thread Paras Lehana
Can ulimit
<https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#ulimit-settings-nix-operating-systems>
settings impact this? Worth reviewing once.
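
For a quick check, these are the values the reference guide suggests (run
as the user that starts Solr; drop the number to inspect the current
limit):

ulimit -n 65000   # max open files
ulimit -u 65000   # max user processes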

On Thu, 5 Dec 2019 at 23:31, Shawn Heisey  wrote:

> On 12/5/2019 10:28 AM, Rahul Goswami wrote:
> > We have a Solr 7.2.1 Solr Cloud setup where the client is indexing in 5
> > parallel threads with 5000 docs per batch. This is a test setup and all
> > documents are indexed on the same node. We are seeing connection timeout
> > issues thereafter some time into indexing. I am yet to analyze GC pauses
> > and other possibilities, but as a guideline just wanted to know what
> > indexing rate might be "too high" for Solr so as to consider throttling ?
> > The documents are mostly metadata with about 25 odd fields, so not very
> > heavy.
> > Would be nice to know a baseline performance expectation for better
> > application design considerations.
>
> It's not really possible to give you a number here.  It depends on a lot
> of things, and every install is going to be different.
>
> On a setup that I once dealt with, where there was only a single thread
> doing the indexing, indexing on each core happened at about 1000 docs
> per second.  I've heard people mention rates beyond 5 docs per
> second.  I've also heard people talk about rates of indexing far lower
> than what I was seeing.
>
> When you say "connection timeout" issues ... that could mean a couple of
> different things.  It could mean that the connection never gets
> established because it times out while trying, or it could mean that the
> connection gets established, and then times out after that.  Which are
> you seeing?  Usually dealing with that involves changing timeout
> settings on the client application.  Figuring out what's causing the
> delays that lead to the timeouts might be harder.  GC pauses are a
> primary candidate.
>
> There are typically two bottlenecks possible when indexing.  One is that
> the source system cannot supply the documents fast enough.  The other is
> that the Solr server is sitting mostly idle while the indexing program
> waits for an opportunity to send more documents.  The first is not
> something we can help you with.  The second is dealt with by making the
> indexing application multi-threaded or multi-process, or adding more
> threads/processes.
>
> Thanks,
> Shawn
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: [ANNOUNCE] Apache Solr 8.3.1 released

2019-12-05 Thread Paras Lehana
Yup, now reflected. :)

On Thu, 5 Dec, 2019, 19:43 Erick Erickson,  wrote:

> It’s there for me when I click on your link.
>
> > On Dec 5, 2019, at 1:08 AM, Paras Lehana 
> wrote:
> >
> > Hey Ishan,
> >
> > Cannot find 8.3.1 here: https://lucene.apache.org/solr/downloads.html
> (8.3.0
> > is listed here).
> >
> > Anyways, I'm downloading it from here:
> > https://archive.apache.org/dist/lucene/solr/8.3.1/
> >
> >
> >
> > On Wed, 4 Dec 2019 at 20:27, Rahul Goswami 
> wrote:
> >
> >> Thanks Ishan. I was just going through the list of fixes in 8.3.1
> >> (published in changes.txt) and couldn't see the below JIRA.
> >>
> >> SOLR-13971 <http://issues.apache.org/jira/browse/SOLR-13971>: Velocity
> >> response writer's resource loading now possible only through startup
> >> parameters.
> >>
> >> Is it linked appropriately? Or is it some access rights issue for
> non-PMC
> >> members like me ?
> >>
> >> Thanks,
> >> Rahul
> >>
> >>
> >> On Wed, Dec 4, 2019 at 7:12 AM Noble Paul  wrote:
> >>
> >>> Thanks ishan
> >>>
> >>> On Wed, Dec 4, 2019, 3:32 PM Ishan Chattopadhyaya <
> >>> ichattopadhy...@gmail.com>
> >>> wrote:
> >>>
> >>>> ## 3 December 2019, Apache Solr™ 8.3.1 available
> >>>>
> >>>> The Lucene PMC is pleased to announce the release of Apache Solr
> 8.3.1.
> >>>>
> >>>> Solr is the popular, blazing fast, open source NoSQL search platform
> >>>> from the Apache Lucene project. Its major features include powerful
> >>>> full-text search, hit highlighting, faceted search, dynamic
> >>>> clustering, database integration, rich document handling, and
> >>>> geospatial search. Solr is highly scalable, providing fault tolerant
> >>>> distributed search and indexing, and powers the search and navigation
> >>>> features of many of the world's largest internet sites.
> >>>>
> >>>> Solr 8.3.1 is available for immediate download at:
> >>>>
> >>>>  <https://lucene.apache.org/solr/downloads.html>
> >>>>
> >>>> ### Solr 8.3.1 Release Highlights:
> >>>>
> >>>>  * JavaBinCodec has concurrent modification of CharArr resulting in
> >>>> corrupt internode updates
> >>>>  * findRequestType in AuditEvent is more robust
> >>>>  * CoreContainer.auditloggerPlugin is volatile now
> >>>>  * Velocity response writer's resource loading now possible only
> >>>> through startup parameters
> >>>>
> >>>>
> >>>> Please read CHANGES.txt for a full list of changes:
> >>>>
> >>>>  <https://lucene.apache.org/solr/8_3_1/changes/Changes.html>
> >>>>
> >>>> Solr 8.3.1 also includes improvements and bugfixes in the corresponding Apache
> >>>> Lucene release:
> >>>>
> >>>>  <https://lucene.apache.org/core/8_3_1/changes/Changes.html>
> >>>>
> >>>> Note: The Apache Software Foundation uses an extensive mirroring
> >> network
> >>>> for
> >>>> distributing releases. It is possible that the mirror you are using
> may
> >>>> not have
> >>>> replicated the release yet. If that is the case, please try another
> >>> mirror.
> >>>> This also applies to Maven access.
> >>>>
> >>>> -
> >>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >>>> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>>>
> >>>>
> >>>
> >>
> >
> >
> > --
> > --
> > Regards,
> >
> > *Paras Lehana* [65871]
> > Development Engineer, Auto-Suggest,
> > IndiaMART Intermesh Ltd.
> >
> > 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> > Noida, UP, IN - 201303
> >
> > Mob.: +91-9560911996
> > Work: 01203916600 | Extn:  *8173*
> >
> > --
> > *
> > *
> >
> > <https://www.facebook.com/IndiaMART/videos/578196442936091/>
>
>



Re: [ANNOUNCE] Apache Solr 8.3.1 released

2019-12-04 Thread Paras Lehana
Hey Ishan,

Cannot find 8.3.1 here: https://lucene.apache.org/solr/downloads.html (8.3.0
is listed here).

Anyways, I'm downloading it from here:
https://archive.apache.org/dist/lucene/solr/8.3.1/



On Wed, 4 Dec 2019 at 20:27, Rahul Goswami  wrote:

> Thanks Ishan. I was just going through the list of fixes in 8.3.1
> (published in changes.txt) and couldn't see the below JIRA.
>
> SOLR-13971 <http://issues.apache.org/jira/browse/SOLR-13971>: Velocity
> response writer's resource loading now possible only through startup
> parameters.
>
> Is it linked appropriately? Or is it some access rights issue for non-PMC
> members like me ?
>
> Thanks,
> Rahul
>
>
> On Wed, Dec 4, 2019 at 7:12 AM Noble Paul  wrote:
>
> > Thanks ishan
> >
> > On Wed, Dec 4, 2019, 3:32 PM Ishan Chattopadhyaya <
> > ichattopadhy...@gmail.com>
> > wrote:
> >
> > > ## 3 December 2019, Apache Solr™ 8.3.1 available
> > >
> > > The Lucene PMC is pleased to announce the release of Apache Solr 8.3.1.
> > >
> > > Solr is the popular, blazing fast, open source NoSQL search platform
> > > from the Apache Lucene project. Its major features include powerful
> > > full-text search, hit highlighting, faceted search, dynamic
> > > clustering, database integration, rich document handling, and
> > > geospatial search. Solr is highly scalable, providing fault tolerant
> > > distributed search and indexing, and powers the search and navigation
> > > features of many of the world's largest internet sites.
> > >
> > > Solr 8.3.1 is available for immediate download at:
> > >
> > >   <https://lucene.apache.org/solr/downloads.html>
> > >
> > > ### Solr 8.3.1 Release Highlights:
> > >
> > >   * JavaBinCodec has concurrent modification of CharArr resulting in
> > > corrupt internode updates
> > >   * findRequestType in AuditEvent is more robust
> > >   * CoreContainer.auditloggerPlugin is volatile now
> > >   * Velocity response writer's resource loading now possible only
> > > through startup parameters
> > >
> > >
> > > Please read CHANGES.txt for a full list of changes:
> > >
> > >   <https://lucene.apache.org/solr/8_3_1/changes/Changes.html>
> > >
> > > Solr 8.3.1 also includes improvements and bugfixes in the corresponding Apache
> > > Lucene release:
> > >
> > >   <https://lucene.apache.org/core/8_3_1/changes/Changes.html>
> > >
> > > Note: The Apache Software Foundation uses an extensive mirroring
> network
> > > for
> > > distributing releases. It is possible that the mirror you are using may
> > > not have
> > > replicated the release yet. If that is the case, please try another
> > mirror.
> > > This also applies to Maven access.
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > > For additional commands, e-mail: dev-h...@lucene.apache.org
> > >
> > >
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: [Q] Faster Atomic Updates - use docValues?

2019-12-04 Thread Paras Lehana
Hey Erick,

This is a huge red flag to me: "(but I could only test for the first few
> thousand documents".


Yup, that's probably where the culprit lies. I could only test for the
starting batch because I had to wait for a day to actually compare. I
tweaked the merge values and kept whatever gave a speed boost. My first
batch of 5 million docs took only 40 minutes (atomic updates included) and
the last batch of 5 million took more than 18 hours. If this is an issue of
mergePolicy, I think I should have also done optimize between batches, no?
I remember that when I indexed a single XML of 80 million docs into a core
that had already been optimized after indexing 30 XMLs of 5 million each, I
could post all 80 million in just a day.



> The indexing rate you’re seeing is abysmal unless these are _huge_
> documents


Documents only contain the suggestion name, possible titles,
phonetics/spellcheck/synonym fields and numerical fields for boosting. They
are far smaller than what a Search Document would contain. Auto-Suggest is
only concerned about suggestions so you can guess how simple the documents
would be.


Some data is held on the heap and some in the OS RAM due to MMapDirectory


I'm using StandardDirectoryFactory (which lets Solr choose the right
implementation). Also, I'm planning to read more about these (looking
forward to using MMap). Thanks for the article!


You're right. I should change one thing at a time. Let me experiment and
then I will summarize here what I tried. Thank you for your responses. :)

On Wed, 4 Dec 2019 at 20:31, Erick Erickson  wrote:

> This is a huge red flag to me: "(but I could only test for the first few
> thousand documents”
>
> You’re probably right that that would speed things up, but pretty soon
> when you’re indexing
> your entire corpus there are lots of other considerations.
>
> The indexing rate you’re seeing is abysmal unless these are _huge_
> documents, but you
> indicate that at the start you’re getting 1,400 docs/second so I don’t
> think the complexity
> of the docs is the issue here.
>
> Do note that when we’re throwing RAM figures out, we need to draw a sharp
> distinction
> between Java heap and total RAM. Some data is held on the heap and some in
> the OS
> RAM due to MMapDirectory, see Uwe’s excellent article:
> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Uwe recommends about 25% of your available physical RAM be allocated to
> Java as
> a starting point. Your particular Solr installation may need a larger
> percent, IDK.
>
> But basically I’d go back to all default settings and change one thing at
> a time.
> First, I’d look at GC performance. Is it taking all your CPU? In which
> case you probably need to
> increase your heap. I pick this first because it’s very common that this
> is a root cause.
>
> Next, I’d put a profiler on it to see exactly where I’m spending time.
> Otherwise you wind
> up making random changes and hoping one of them works.
>
> Best,
> Erick
>
> > On Dec 4, 2019, at 3:21 AM, Paras Lehana 
> wrote:
> >
> > (but I could only test for the first few
> > thousand documents
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: [Q] Faster Atomic Updates - use docValues?

2019-12-04 Thread Paras Lehana
> 1> go back to the defaults for TieredMergePolicy and RamBufferSizeMB
> 2> measure first, tweak later. Analyze your GC logs to see whether
>  you’re taking an inordinate amount of time doing GC coincident with
>  your slowness. If so, adjust your heap.
> 3> If it’s not GC, put a profiler on it and find out where, exactly, you’re
>  spending your time.
>
> Best,
> Erick
>
>
> > We occasionally reindex whole data to our Auto-Suggest corpus. Total
> > documents to be indexed are around 250 million while, due to atomic
> > updates, total unique documents after full indexing converges to 60
> > million.
> >
> > We have to atomically index documents to store different names for the
> same
> > product (like "bag" and "bags"), to increase demand and to store the
> months
> > they were searched for in the past. One approach could be to calculate
> all
> > this beforehand and then index normally to Solr (non-atomic).
> >
> > Once the atomic updates process over 50 million documents, the speed of
> > indexing drops down to more than 10x of initial speed.
> >
> > As what I have learnt, atomic updates fetch the matching document by
> > uniqueKey and then does the normal index using the information in the
> > fetched document. Is this actually taking time? As the number of
> documents
> > increases, Solr might be taking time to fetch the stored document.
> >
> > But shouldn't the fetch by uniqueKey take O(1) time? If this really
> impacts
> > the fetch, can we use docValues for the field id (uniqueKey)? Our field
> is
> > of type string.
> >
> >
> >
> > I'm pasting my config lines that may impact this:
> >
> >
> --
> >
> > -Xmx8g -Xms8g
> >
> > <field name="id" type="string" indexed="true" stored="true" required="true"
> > omitNorms="false" multiValued="false" />
> > <uniqueKey>id</uniqueKey>
> >
> > <ramBufferSizeMB>2000</ramBufferSizeMB>
> >
> > <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
> >   <int name="maxMergeAtOnce">50</int>
> >   <int name="segmentsPerTier">50</int>
> >   <int name="maxMergedSegmentMB">150</int>
> > </mergePolicyFactory>
> >
> > <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
> >   <int name="maxThreadCount">10</int>
> >   <int name="maxMergeCount">12</int>
> >   <bool name="ioThrottle">false</bool>
> > </mergeScheduler>
> >
> >
> ------
> >
> >
> >
> > A normal indexing that should take less than 1 day actually takes over 5
> > days with atomic updates. Any experience or suggestion will help. How do
> > expedite your indexing process specifically atomic updates? I know this
> > might have been asked so many times and I have actually read/implemented
> > all of the recommendations. My question is specific to Atomic Updates and
> > if something exclusive to Atomic Updates can make it faster.
> >
> >
> > --
> > --
> > Regards,
> >
> > *Paras Lehana* [65871]
> > Development Engineer, Auto-Suggest,
> > IndiaMART Intermesh Ltd.
> >
> > 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> > Noida, UP, IN - 201303
> >
> > Mob.: +91-9560911996
> > Work: 01203916600 | Extn:  *8173*
> >
> > --
> > *
> > *
> >
> > <https://www.facebook.com/IndiaMART/videos/578196442936091/>
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: hi question about solr

2019-12-03 Thread Paras Lehana
That's not my question; it's a suggestion. I was asking whether Highlighting
could fulfill your requirement.

On Tue, 3 Dec 2019 at 17:31, Bernd Fehling 
wrote:

> No, I don't use any highlighting.
>
> Am 03.12.19 um 12:28 schrieb Paras Lehana:
> > Hi Bernd,
> >
> > Have you gone through Highlighting
> > <https://lucene.apache.org/solr/guide/8_3/highlighting.html>?
> >
> > On Mon, 2 Dec 2019 at 17:00, eli chen  wrote:
> >
> >> yes
> >>
> >> On Mon, 2 Dec 2019 at 13:29, Bernd Fehling <
> bernd.fehl...@uni-bielefeld.de
> >>>
> >> wrote:
> >>
> >>> In short,
> >>>
> >>> you are trying to use an indexer as a full-text search engine, right?
> >>>
> >>> Regards
> >>> Bernd
> >>>
> >>> Am 02.12.19 um 12:24 schrieb eli chen:
> >>>> hi im kind of new to solr so please be patient
> >>>>
> >>>> i'll try to explain what do i need and what im trying to do.
> >>>>
> >>>> we have a lot of books content and we want to index them and allow
> >>> search
> >>>> in the books.
> >>>> when someone searches for a term
> >>>> i need to get back the position of the matched word in the book
> >>>> for example
> >>>> if the book content is "hello my name is jeff" and someone searches for
> >>> "my".
> >>>> i want to get back the position of my in the content field (which is 1
> >> in
> >>>> this case)
> >>>> i tried to do that with payloads but no success. and another problem i
> >>>> encounter is:
> >>>> lets say the content field is "hello my name is jeff what is your
> >> name".
> >>>> now if someone search for "name" i want to get back the index of all
> >>>> occurrences not just the first one
> >>>>
> >>>> is there any way to do that with solr without developing new plugins
> >>>>
> >>>> thx
> >>>>
> >>>
> >>
> >
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Is it possible to have different Stop words depending on the value of a field?

2019-12-03 Thread Paras Lehana
> words but if they are not occurring in the other languages then you have
> not a problem.
> >> On the other hand: if you need more than stop words (eg lemmatizing,
> specialized way of tokenization etc) then you need a different field per
> language. You don’t describe your full use case, but if you have different
> fields for different language then your client application needs to handle
> this (not difficult, but you have to be aware).
> >> Not sure if you need to search a given address in all languages or if
> you use the language of the user etc.
> >>
> >>> Am 02.12.2019 um 20:13 schrieb yeikel valdes :
> >>>
> >>> Hi,
> >>>
> >>>
> >>> I have an index that stores addresses from different countries.
> >>>
> >>>
> >>> As every country has different stop words, I was wondering if it is
> possible to apply a different set of stop words depending on the value of a
> field.
> >>>
> >>>
> >>> Or do I need different indexes/do it at the ETL step to accomplish
> this?
> >>>
> >>>
> >>
> >>
> >
> >
>
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Exact match

2019-12-03 Thread Paras Lehana
Hi Omer,

If you mean exact match with the same number of words (Emir's version), you
can also add identifiers at the beginning and end of some other field like
title_exact. This can be done in your indexing script or using Pattern
Replace. On the query side, you can use the same identifiers. For example,
index "united states" as "exactStart united states exactEnd" and query with
the same. Obviously, you can run into scoring issues here, so only use this
for debugging or for retrieving docs.
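
A minimal sketch of such an anchored field (the type name and markers here
are my assumptions, not from this thread):

<fieldType name="text_exact_anchored" class="solr.TextField">
  <analyzer type="index">
    <!-- wrap the whole value with boundary markers before tokenizing -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^(.*)$" replacement="exactstart $1 exactend"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

A phrase query like q=title_exact:"exactstart united states of america
exactend" should then match only the full value.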

Just adding to all the possible ways. *Anyways, I like the Keyword method.*

On Tue, 3 Dec 2019 at 03:59, Erick Erickson  wrote:

> There are two different interpretations of “exact match” going on here,
> don’t be confused!
>
> Emir’s version is “the text has to match the _entire_ input. So a field
> with “a b c d” will NOT match “a b” or “a b c” or “b c", but only “a b c d”.
>
> David’s version is “The text has to contain some sequence of words that
> exactly matches my query”, so a field with “a b c d” _would_ match “a b”,
> “a b c”, “a b c d”, “b c”, “c d”, etc.
>
> Both are entirely valid use-cases, depending on what you mean by “exact
> match"
>
> Best,
> Erick
>
> > On Dec 2, 2019, at 4:38 PM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
> >
> > Hi Omer,
> > From performance perspective, it is the best if you index title as a
> single token: KeywordTokenizer + LowerCaseFilter
> >
> > If you need to query that field in some other way, you can index it
> differently as some other field using copyField.
> >
> > HTH,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> >> On 2 Dec 2019, at 21:43, OTH  wrote:
> >>
> >> Hello,
> >>
> >> What would be the best way to get exact matches (if any) to a query?
> >>
> >> E.g.:  Let's the document text is:  "united states of america".
> >> Currently, any query containing one or more of the three words "united",
> >> "states", or "america" will match with the above document.  I would
> like a
> >> way so that the document matches only and only if the query were also
> >> "united states of america" (case-insensitive).
> >>
> >> Document field type:  TextField
> >> Index Analyzer: TokenizerChain
> >> Index Tokenizer: StandardTokenizerFactory
> >> Index Token Filters: StopFilterFactory, LowerCaseFilterFactory,
> >> SnowballPorterFilterFactory
> >> The Query Analyzer / Tokenizer / Token Filters are the same as the Index
> >> ones above.
> >>
> >> FYI I'm relatively novice at Solr / Lucene / Search.
> >>
> >> Much appreciated
> >> Omer
> >
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: hi question about solr

2019-12-03 Thread Paras Lehana
Hi Bernd,

Have you gone through Highlighting
<https://lucene.apache.org/solr/guide/8_3/highlighting.html>?
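
For instance (core and field names assumed), a request like

http://localhost:8983/solr/books/select?q=content:name&hl=true&hl.fl=content&hl.snippets=10

returns highlighted fragments for the matched occurrences.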

On Mon, 2 Dec 2019 at 17:00, eli chen  wrote:

> yes
>
> On Mon, 2 Dec 2019 at 13:29, Bernd Fehling  >
> wrote:
>
> > In short,
> >
> > you are trying to use an indexer as a full-text search engine, right?
> >
> > Regards
> > Bernd
> >
> > Am 02.12.19 um 12:24 schrieb eli chen:
> > > hi im kind of new to solr so please be patient
> > >
> > > i'll try to explain what do i need and what im trying to do.
> > >
> > > we have a lot of books content and we want to index them and allow
> > search
> > > in the books.
> > > when someone searches for a term
> > > i need to get back the position of the matched word in the book
> > > for example
> > > if the book content is "hello my name is jeff" and someone searches for
> > "my".
> > > i want to get back the position of my in the content field (which is 1
> in
> > > this case)
> > > i tried to do that with payloads but no success. and another problem i
> > > encounter is:
> > > lets say the content field is "hello my name is jeff what is your
> name".
> > > now if someone search for "name" i want to get back the index of all
> > > occurrences not just the first one
> > >
> > > is there any way to do that with solr without developing new plugins
> > >
> > > thx
> > >
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



[Q] Faster Atomic Updates - use docValues?

2019-12-03 Thread Paras Lehana
Hi Community,

We occasionally reindex whole data to our Auto-Suggest corpus. Total
documents to be indexed are around 250 million while, due to atomic
updates, total unique documents after full indexing converges to 60
million.

We have to atomically index documents to store different names for the same
product (like "bag" and "bags"), to increase demand and to store the months
they were searched for in the past. One approach could be to calculate all
this beforehand and then index normally to Solr (non-atomic).
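
For reference, an atomic update of this kind can be sent like this (the
field names here are illustrative, not our real schema):

<add>
  <doc>
    <field name="id">B00123</field>
    <!-- append another name for the same product -->
    <field name="names" update="add">bags</field>
    <!-- increment the demand counter -->
    <field name="demand" update="inc">1</field>
  </doc>
</add>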

Once the atomic updates process over 50 million documents, the speed of
indexing drops by more than 10x from the initial speed.

From what I have learnt, atomic updates fetch the matching document by
uniqueKey and then do a normal index using the information in the
fetched document. Is this actually taking time? As the number of documents
increases, Solr might be taking time to fetch the stored document.

But shouldn't the fetch by uniqueKey take O(1) time? If this really impacts
the fetch, can we use docValues for the field id (uniqueKey)? Our field is
of type string.



I'm pasting my config lines that may impact this:

--

-Xmx8g -Xms8g

<field name="id" type="string" indexed="true" stored="true" required="true"
       omitNorms="false" multiValued="false" />
<uniqueKey>id</uniqueKey>

<ramBufferSizeMB>2000</ramBufferSizeMB>

<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">50</int>
  <int name="segmentsPerTier">50</int>
  <int name="maxMergedSegmentMB">150</int>
</mergePolicyFactory>

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxThreadCount">10</int>
  <int name="maxMergeCount">12</int>
  <bool name="ioThrottle">false</bool>
</mergeScheduler>

--



A normal indexing that should take less than 1 day actually takes over 5
days with atomic updates. Any experience or suggestion will help. How do you
expedite your indexing process, specifically atomic updates? I know this
might have been asked so many times and I have actually read/implemented
all of the recommendations. My question is specific to Atomic Updates and
if something exclusive to Atomic Updates can make it faster.


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: A Last Message to the Solr Users

2019-11-28 Thread Paras Lehana
Hey Mark,

I was actually expecting (and wanting) this after your LinkedIn post.

At this point, the best way to use Solr is as it’s always been - avoid
> SolrCloud and setup your own system in standalone mode.


That's what I have been telling people who are just getting started with
Solr and thinking that SolrCloud is actually something superior to the
standalone mode. That may depend on the use case, but for me, I always
prefer to achieve things from a standalone perspective instead of investing
my time in switching to Cloud.

I handle Auto-Suggest at IndiaMART. We have over 60 million docs. Single
server of *standalone* Solr is capable of handling 800 req/sec. In fact, on
production, we get ~300 req/sec and the single Solr is still able to
provide responses within 25 ms!

Anyways, I don't think that the project was a failure. All these were the
small drops of the big Solr Ocean. We, the community and you, tried, we
tested and we are still here as the open community of one of the most
powerful search platforms. SolrCloud was also needed to be introduced at
some time. Notwithstanding, I do think that the project needs to be more
open with community commits. The community and open-sourceness of Solr are
what I have loved over Elasticsearch's.

Anyways, keep rocking! You have already left your footprints in the
history of this beast project.

On Thu, 28 Nov 2019 at 09:10, Mark Miller  wrote:

> Now one company thinks I’m after them because they were the main source of
> the jokes.
>
> Companies is not a typo.
>
> If you are using Solr to make or save tons of money or run your business
> and you employ developers, please include yourself in this list.
>
> You are taking, and in my opinion Solr is going down. It’s all against your
> own interest even.
>
> I know of enough people that want to solve this now, that it’s likely only
> a matter of time before they fix the situation - you never know though.
> Things change, people get new jobs, jobs change. It will take at least 3-6
> months to make things reasonable even with a good group banding together.
>
> But if you are extracting value from this project and have Solr developers
> - id like to think you have enough of a stake in this to think about
> changing the approach everyone has been taking. It’s not working, and the
> longer it goes on, the harder it’s getting to fix things.
>
>
> --
> - Mark
>
> http://about.me/markrmiller
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Backup v.s. Snapshot API for Solr

2019-11-26 Thread Paras Lehana
Hi Kaya,

Sorry that I still cannot understand. Once you have created a snapshot with
CREATESNAPSHOT, you can restore the snapshot with the same replication restore
command, right?
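
For example (core and snapshot names assumed):

http://localhost:8983/solr/core_name/replication?command=restore&name=snapshot_name
http://localhost:8983/solr/core_name/replication?command=restorestatus

The second command reports the status of the last restore.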

How can I use a "snapshot", which is generated by  CREATESNAPSHOT API?


You just used the name to restore the backup. Please try to explain what
your use case is and what you want to achieve. Maybe I'm not able to
understand your query, so I'd appreciate it if someone else helps.

On Mon, 25 Nov 2019 at 12:21, Kayak28  wrote:

> Hello, Mr. Paras:
>
> Thank you for your response, and I apologize for confusing you.
>
> Actually, I can do restore by /replication hander.
> What I did not get the idea is, how to use the following URLs, which are
> from the "Making And Restoring Backups" section of the Solr Reference
> Guide.
>
> 1.
> http://localhost:8983/solr/admin/cores?action=CREATESNAPSHOT&core=techproducts&commitName=commit1
> 2.
> http://localhost:8983/solr/admin/cores?action=LISTSNAPSHOTS&core=techproducts&commitName=commit1
> 3.
> http://localhost:8983/solr/admin/cores?action=DELETESNAPSHOT&core=techproducts&commitName=commit1
>
>
> It seems like "Snapshot", made by the CREATESNAPSHOT API, holds the path to the
> index and commit name only.
>
> How can I use a "snapshot", which is generated by  CREATESNAPSHOT API?
>
>
> Sincerely,
> Kaya Ota
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Production Issue: cannot connect to solr server suddenly

2019-11-26 Thread Paras Lehana
Hi Sai,

Please elaborate. What language is the code written in? Why is there
google.com in the query?

Max retries exceeded with url


This happens when you make more requests to a server than it allows.
Check with your server admin in case a DoS or related policy is
preventing this.

On Mon, 25 Nov 2019 at 16:39, Vignan Malyala  wrote:

> I don't get this error always. At certain times, I get this error with my
> Solr suddenly.
> However, If I check my Solr url, it will be working but. When I want to
> update via code, it will not work.
> Please help me out with this.
>
> ERROR:
> *Failed to connect to server at
> 'http://127.0.0.1:8983/solr/my_core/update/?commit=true
> <
> https://www.google.com/url?q=http://solradmin:Red8891@127.0.0.1:8983/solr/tenant_311/update/?commit%3Dtrue=D=hangouts=1574765671451000=AFQjCNGE326wW7hZNwLUH2dEw8scCTyEXw
> >',
> are you sure that URL is correct? Checking it in a browser might help:
> HTTPConnectionPool(host='127.0.0.1', port=8983): Max retries exceeded with
> url: /solr/my_core/update/?commit=true (Caused by
> NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7efd7be78a98>: Failed to establish a new connection: [Errno 111]
> Connection refused',))*
>
>
>
>
> Regards,
> Sai Vignan
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: How to tell which core was used based on Json or XML response from Solr

2019-11-26 Thread Paras Lehana
Hey Rhys,

We are using 2 jquery versions, because this tool is running a tool that
> has an old version of jquery attached to it.


Umm, okay, if you cannot tweak the tool to use the latest jQuery, I assume
you have used noConflict to handle different versions.



 I need to know the core name, because each core has different values in
> the documents, and I want to display those values based on which core was
> queried.


You use +core+ in your url. I thought you already knew the core name you
queried from so you can use that name in the response as well. Anyways,
echoParams will also do your job. That's actually a nice idea I can also
use! Besides this, I strongly recommend you not to expose solr params or IP
- just deploy a simple layer in between (different port on same server). If
you use perl, build the solr query only in the script.
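
As a sketch of the echoParams idea (handler name assumed), in solrconfig.xml:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- echo all parameters (defaults included) in responseHeader.params -->
    <str name="echoParams">all</str>
  </lst>
</requestHandler>

Every response then carries the parameters it was answered with, and your
web layer can forward them (or just the core name) to the front end.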

On Mon, 25 Nov 2019 at 21:48, Christian Spitzlay <
christian.spitz...@biologis.com> wrote:

> I was referring to David Hastings' suggestion of shielding solr from
> direct access
> which is something I strongly agree with.
>
> If you're not going with a PHP-based server-side application
> as to not expose your solr directly to the javascript application (and
> thus to possible
> manipulation by an end user) then you obviously won't need solarium.
>
> As Paras Lehana said:
> "Keep your front-end query simple - just describe your query. All the
> other parameters
> can be added on the web server side."
>
> ... that could then be implemented in your Perl code.
>
>
> Christian
>
>
>
> > Am 25.11.2019 um 16:32 schrieb rhys J :
> >
> >> if you are taking the PHP route for the mentioned server part then I
> would
> >> suggest
> >> using a client library, not plain curl.  There is solarium, for
> instance:
> >>
> >> https://solarium.readthedocs.io/en/stable/
> >> https://github.com/solariumphp/solarium
> >>
> >> It can use curl under the hood but you can program your stuff on a
> higher
> >> level,
> >> against an API.
> >>
> >>
> > I am using jquery, so I am using the json package to send and decode the
> > json that solr sends. I hope that makes sense?
> >
> > Thanks for your tip!
> >
> > Our pages are a combo of jquery, javascript, and perl.
>
>
> --
>
> Christian Spitzlay
> Senior Software-Entwickler
>
> bio.logis Genetic Information Management GmbH
> Zentrale:
> Olof-Palme-Str. 15
> D-60439 Frankfurt am Main
>
> T: +49 69 348 739 116
> christian.spitz...@biologis.com
> biologis.com
>
> Geschäftsführung: Prof. Dr. med. Daniela Steinberger
> Firmensitz:  Altenhöferallee 3, 60438 Frankfurt am Main
> Registergericht Frankfurt am Main, HRB 97945
>
>
>
>
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Support mapping of multi-word Synonym at Query time.

2019-11-26 Thread Paras Lehana
Hi Community,

I know there had been many blogs about multi-term synonyms. I have been
reading a lot about this and I'm here just to take suggestions or know what
you guys are doing. The information in the blogs (SynonymFilter) could be
old and there might be better methods now (SynonymGraphFilter).

For illustrative purposes, let my synonym file contain only this line
(expand=true): fountain pin, fountain pen

What I have understood by researching over this (also, including what I
want to achieve):

   1. SynonymFilter doesn't handle multi-word at index time properly but
   SynonymGraphFilter does. So, docs with "fountain pin" are indexed (also) as
   "fountain pen" and vice versa. Also, docs with "pin" doesn't get indexed
   with "pen" which is what I want.

   2. At query time, "fountain pin" will match with "fountain pen" which is
   cool. But query with only "pin" will also match "fountain pen". Here, I
   want to match "fountain pen" with query "pen" but not with "pin" obviously.

   3. One way could be to use sow=false. But if I use sow (splitOnWhitespace
   
<https://lucidworks.com/post/multi-word-synonyms-solr-adds-query-time-support/#footnote6>)
   as false, I will need to use SynonymGraphFilter at query time too, right?

   4. I prefer not to use synonym analysis at query time. I work on
   Auto-Suggest and in no case do I want to increase my response time from 25
   ms. I don't know how query-time reading of the synonym file would impact
   QTime, so I'm open to criticism here. Besides this, many blogs recommend
   not using synonyms at query time.

   5. Can I achieve what I want with autoGeneratePhraseQueries
   
<https://opensourceconnections.com/blog/2018/02/20/edismax-and-multiterm-synonyms-oddities/>?
   We already use eDismax so here is some tweaking
   
<https://medium.com/@sarkaramrit2/multi-word-synonyms-splitonwhitespace-minimummatch-406e131aca4a>
   in queries for handling the problem.

   6. There also seems to be a custom parser synonym_edismax
   <https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/>.
   Any experience with that? IndiaMART Product Search team already uses it.
   I'm exploring better ways if there's any.

   7. Also, Lucidworks recommended usage of Auto Phrasing TokenFilter
   <https://dzone.com/articles/solution-multi-term-synonyms>.


In short, I'm currently using SynonymGraphFilter. If I don't complicate
things and go with (1,2), I'm planning to have another copyField to index
synonyms and then boost it with something lower than the main field so that
"fountain pen" doesn't boost much with "pin" given that other "pin" docs
are more relevant.
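
For reference, the index-time chain from (1) looks roughly like this (the
type name and synonym file here are placeholders):

<fieldType name="text_syn_index" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" expand="true"/>
    <!-- graph token streams must be flattened before indexing -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>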

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Type of auto suggest feature

2019-11-24 Thread Paras Lehana
Hey Artur,

If I have understood correctly, you want to suggest terms related to the
query. It would be helpful if you describe the use case as well. Anyways,
please go through this once:

   1. Keep different form of words as different documents so that they
   could be suggested ("closed", "close" and "closing" should be different
   docs). Use stemming (Snowball Porter Stemmer Filter
   
<https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#snowball-porter-stemmer-filter>)
   so that docs with different forms could be matched (see the sketch after this list).
   2. The "interesting" terms are probably related terms in your case that
   can be addressed with Synonym factory. Again, the related terms should be
   in different documents. Add all the related words in the Synonym file
   separated with commas.
   3. Will your query only have single terms? If no and if there are
   multiple terms, how do you want to handle that? This may require few more
   analyzers and tweaking in query.
   4. If you still want to suggest terms for partial words (to suggest
   "closing" if query is "clo"), use Edge NGrams
   
<https://lucene.apache.org/solr/guide/8_3/tokenizers.html#edge-n-gram-tokenizer>.
   Use Standard Tokenizer
   
<https://lucene.apache.org/solr/guide/8_3/tokenizers.html#Tokenizers-StandardTokenizer>
   to split words. What do you want to achieve with Shingle factory?
   5. I think all of the above can be simply handled without Suggester
   component. Anyways, keep exploring different ways.
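
As a minimal sketch of the stemming in (1) (the type name is an assumption):

<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "closed" and "closing" reduce to the same stem as "close" -->
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>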

Please do tell if you have any queries.

On Sun, 24 Nov 2019 at 19:11, Rudenko, Artur 
wrote:

> Hi,
> I am quite new to solr and I am interested in implementing a sort of auto
> terms suggest (not auto complete) feature based on the user query.
> The user builds some query (on multiple fields) and I am trying to help him
> refine his query by suggesting more terms to add based on his current
> query.
> The suggestions should contain synonyms and different word forms
> (query:close , result: closed, closing) and also some other "interesting"
> (hard to define what interesting is) terms and phrases based on that search.
>
> The queries are performed on a text field with about 1000 words, on document
> sets of about 20-50M
>
> So far I came up with a solution that uses the Suggester component over the 1000
> words text field (copy field) as shown below, and I'm trying to find how to
> add to it more "interesting" terms and phrases based on the text field
>
>
> <field name="..." type="text_total_shingle_synonyms" indexed="true" stored="true"
> termVectors="true" termOffsets="true" termPositions="true" required="false"
> multiValued="true" />
>
> <fieldType name="text_total_shingle_synonyms" class="solr.TextField"
> positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="..."/>
>     <filter class="..."/>
>     <filter class="..."/>
>     <filter class="..."/>
>     <filter class="..." protected="protwords.txt"/>
>     <filter class="..." maxShingleSize="4" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="..."/>
>     <filter class="..." synonyms="synonyms_suggest.txt" ignoreCase="true" expand="false"/>
>     <filter class="..."/>
>     <filter class="..."/>
>     <filter class="..." protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
>
> Thanks,
> Artur Rudenko
>
>
>
> This electronic message may contain proprietary and confidential
> information of Verint Systems Inc., its affiliates and/or subsidiaries. The
> information is intended to be for the use of the individual(s) or
> entity(ies) named above. If you are not the intended recipient (or
> authorized to receive this e-mail for the intended recipient), you may not
> use, copy, disclose or distribute to anyone this message or any information
> contained in this message. If you have received this electronic message in
> error, please notify us by replying to this e-mail.
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: How to tell which core was used based on Json or XML response from Solr

2019-11-24 Thread Paras Lehana
Hey rhys,

What David suggested is what we do for querying Solr. You can figure out
our frontend implementation of Auto-Suggest by seeing the AJAX requests
fired when you type in the search box on www.indiamart.com.

Why are you using two jQuery files? If you have a web server, you already
know which core you queried. Just convert the Solr JSON response, add the
key "core", and return the modified JSON response. Keep your
front-end query simple - just describe your query. All the other parameters
can be added on the web server side. Anyways, why do you want to know the
core name?

On Sat, 23 Nov 2019 at 00:23, David Hastings 
wrote:

> i personally dont like php, but it may just be the easiest way to do what
> you need assuming you have a basic web server,
> send your search query to php, and use $_GET or $_POST to read it into a
> variable:
> https://www.php.net/manual/en/reserved.variables.get.php
>
> then send that to the solr server in the same piece of php with curl
>
> https://phpenthusiast.com/blog/five-php-curl-examples
>
> and return the raw result if you want.  at the very least it hides its url,
> but with this you can block the solr port to outside ip's and only allow 80
> or whatever your webserver is using
>
>
> On Fri, Nov 22, 2019 at 1:43 PM rhys J  wrote:
>
> > On Fri, Nov 22, 2019 at 1:39 PM David Hastings <
> > hastings.recurs...@gmail.com>
> > wrote:
> >
> > > 2 things (maybe 3):
> > > 1.  dont have this code facing a client thats not you, otherwise anyone
> > > could view the source and see where the solr server is, which means
> they
> > > can destroy your index or anything they want.  put at the very least a
> > > simple api/front end in between the javascript page for the user and
> the
> > > solr server
> > >
> >
> > Is there a way I can fix this?
> >
> >
> > > 2. i dont think there is a way, you would be better off indexing an
> > > indicator of sorts into your documents
> > >
> >
> > Oh this is a good idea.
> >
> > Thanks!
> >
> > 3. the jquery in your example already has the core identified, not sure
> why
> > > the receiving javascript wouldn't be able to read that variable unless
> im
> > > missing something.
> > >
> > >
> > There's another function on_data that is being called by the url, which
> > does not have any indication of what the core was, only the response from
> > the url.
> >
> > Thanks,
> >
> > Rhys
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Backup v.s. Snapshot API for Solr

2019-11-24 Thread Paras Lehana
Hey Kaya,

Are you not able to restore with the same restore backup command?

http://localhost:8983/solr/gettingstarted/replication?command=restore&name=backup_name


Replace backup_name with the snapshot name.

On Fri, 22 Nov 2019 at 11:26, Kayak28  wrote:

> Hello, Community Members:
>
> I have tested the behaviors of Backup API and Snapshot API, which are
> written in the URL below.
>
> https://lucene.apache.org/solr/guide/7_4/making-and-restoring-backups.html#making-and-restoring-backups
>
> As far as I observed the behavior of Backup API, I now know the followings:
> -  Solr's back up simply means to make a copy of a full-sized index.
> - Solr's restore means to make another copy of a full-sized index from a
> backup directory and refer the copy as the index.
> - Backup / Restore APIs belong to Replication Handler.
>
> Also, I know the following for Snapshot API.
> - Solr can make a snapshot at any time (i.e. does not matter if it is after
> commit/backup/ restore..)
> - snapshot_N contains a name of the snapshot(commitName) and the
> current Index path.
> - N in snapshot_N is the identical number to segments_N.
>
> I believe, by observing the Snapshot API behavior, it is impossible to
> "backup" or "restore" Solr's Index.
> So, my questions are"
> - What is the difference between Backup API and Snapshot API?
>The above Solr's Guide says "The snapshot functionality is different
> from the backup functionality as the index files aren’t copied anywhere."
> But,
> then how Snapshot API help me to make a backup?
>
> - Or, more basically, when should I use Snapshot API?
>
> - What is the rule of thumb operation for Solr's backup system?
>
> Sincerely,
> Kaya Ota
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Highlighting on typing in search box

2019-11-24 Thread Paras Lehana
Hi rhys,

You are actually looking for an autocomplete! I work for the Auto-Suggest
(different names for the same thing) team at Indiamart.

Although it has been a long journey making our Auto-Suggest one of the
fastest on the internet, I hope this summary will help you. You can always
connect with me with any doubts.

   1. If you don't have a high number of documents, you can use wildcards
   in the query for a simpler implementation. That means, you just convert
   your query from q=app to q=app* so that it matches "apple". Wildcards have
   their own cons (constant scoring, high QTime) and since it's a multi-term
   expansion, you cannot use many analyzers over the query. It had been
   performing well (under 50 ms QTime) till we had 22 million documents (which
   is still a large number of documents).

   2. Now that we have over 60 million documents, we have shifted to Edge
   NGrams
   
<https://lucene.apache.org/solr/guide/8_3/tokenizers.html#edge-n-gram-tokenizer>
   on the index side. Using this, your highlighting should work as expected
   and as you do in normal searching.

   3. You also have Suggester
   <https://lucene.apache.org/solr/guide/8_3/suggester.html> component for
   Solr that you can use for basic suggester functionalities. Supports
   highlighting.


From my experience, the best solution is Edge NGrams (2). We have over 60
million documents and still a 25 ms QTime and we do complex scoring and
analysis.
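
A minimal sketch of that index-time chain (the type name and gram sizes are
illustrative):

<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "apple" is indexed as a, ap, app, appl, apple -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With this, a plain query for "app" matches the indexed "apple", and
highlighting works just as in normal search.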

On Thu, 21 Nov 2019 at 22:43, rhys J  wrote:

> Thank you both! I've got an autocomplete working on a basic format right
> now, and I'm working on implementing it to be smart about which core it
> searches.
>
> On Thu, Nov 21, 2019 at 11:43 AM Jörn Franke  wrote:
>
> > It sounds like you look for a suggester.
> >
> > You can use the suggester of Solr.
> >
> > For the visualization part: Angular has a suggestion box that can ingest
> > the results from Solr.
> >
> > > Am 21.11.2019 um 16:42 schrieb rhys J :
> > >
> > > Are there any recommended APIs or code examples of using Solr and then
> > > highlighting results below the search box?
> > >
> > > I'm trying to implement a search box that will search solr as the user
> > > types, if that makes sense?
> > >
> > > Thanks,
> > >
> > > Rhys
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: About Snapshot API and Backup for Solr Index

2019-11-24 Thread Paras Lehana
Hey Kaya,

Are you not able to restore with the same restore backup command?

http://localhost:8983/solr/gettingstarted/replication?command=restore&name=backup_name


Replace backup_name with the snapshot name.

On Thu, 21 Nov 2019 at 16:23, Kayak28  wrote:

> I was not clear in the last email.
> I mean "For me, it is impossible to "backup" or "restore" Solr's index by
> taking a snapshot."
>
> If I make you confuse, I am sorry about that.
>
> Sincerely,
> Kaya Ota
>
> 2019年11月21日(木) 19:50 Kayak28 :
>
> > Hello, Community Members:
> >
> > I am using Solr 7.7.4
> > I have a question about a Snapshot API.
> >
> >
> https://lucene.apache.org/solr/guide/7_4/making-and-restoring-backups.html#create-snapshot-api
> >
> > I have tested basic of snapshot APIs, create snapshot, list snapshot,
> > delete snapshot.
> >
> > As far as I know, when I do:
> > - create a snapshot: create a binary file (snapshot_N where n is
> identical
> > to segment_N) that contains a path of the index.
> > - the file is created under data/snapshot_metadata directory.
> >
> > - list snapshot: return JSON, containing all snapshot data which show
> > segment generation and path to the index.
> > - delete snapshot: delete a snapshot data from snapshot_N.
> >
> > For me, it is impossible to "backup" or "restore" Solr's index.
> >
> > So, my questions are:
> >
> > - How snapshot APIs are related to "backup" or "restore"?
> > - or more basically, when should I use snapshot API?
> > - Is there any way to make a "backup" without consuming a double-size of
> > the index? (I am asking because if I use backup API, it will copy the
> > entire index)
> > - what is the cheapest way to make a backup for Solr?
> >
> > If you help me out one of these questions or give me any clue,
> > I will really appreciate.
> >
> > Sincerely,
> > Kaya Ota
> >
> >
> >
> >
> >
> >
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-17 Thread Paras Lehana
Hi Guilherme,

Have you tried reindexing the documents and comparing the results? No issues
if you cannot do that - let's try something else. I was going through the
whole mail and your files. You had said:

As soon as I add dbId or stId (regardless the boost, 1.0 or 100.0), then I
> don't get anything (which make sense).


Why did you think that not getting anything when you add dbId made sense?
Asking because I may be missing something here.

Also, what is the purpose of so many qf's? Going through your documents and
config files, I found that your dbId's are strings of numbers and I don't
think you want to find your query terms in dbId, right?
Do you want to boost the score by the values in dbId?

Your qf of dbId^100 boosts documents containing terms in q by 100x. Since
your terms don't match with the values in dbId for any document, the score
produced by this scoring is 0. 100x or 1x of 0 is still 0.
I still need to see how this scoring gets added up in the edismax parser, but do
reevaluate the usage of these qfs. Same goes for other qf boosts. :)
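
To see exactly what each qf clause contributes, debugQuery can help (the
boosts below are just the ones from this thread):

q=Immunoregulatory interactions&defType=edismax&qf=name^1.0 name_exact^10.0&debugQuery=true

The "explain" section of the response then shows how the query was parsed
and the per-field score breakdown.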


On Fri, 15 Nov 2019 at 12:23, Guilherme Viteri  wrote:

> Hi Paras
> No worries.
> No I didn’t find anything. This is annoying now...
> Yes! They do contain dbId. Absolutely all my docs contains dbId and it is
> actually my key, if you check again the schema.xml
>
> Cheers
> Guilherme
>
> On 15 Nov 2019, at 05:37, Paras Lehana  wrote:
>
> 
> Hey Guilherme,
>
> I was a bit busy for the past few days and couldn't read your mail. So,
> did you find anything? Anyways, as I had expected, the culprit is
> definitely among the qfs. Do the documents in concern contain dbId? I
> suggest you to cross check the fields in your document with those impacting
> the result in qf.
>
> On Tue, 12 Nov 2019 at 16:14, Guilherme Viteri  wrote:
>
>> What I can't understand is:
>> I search for the exact term - "Immunoregulatory interactions between a
>> Lymphoid *and a* non-Lymphoid cell" and If i search "I search for the
>> exact term - Immunoregulatory interactions between a Lymphoid *and 
>> *non-Lymphoid
>> cell" then it works
>>
>> On 11 Nov 2019, at 12:24, Guilherme Viteri  wrote:
>>
>> Thanks
>>
>> Removing stopwords is another story. I'm curious to find the reason
>> assuming that you keep on using stopwords. In some cases, stopwords are
>> really necessary.
>>
>> Yes. It always make sense the way we've been using.
>>
>> If q.alt is giving you responses, it's confirmed that your stopwords
>> filter
>> is working as expected. The problem definitely lies in the configuration
>> of
>> edismax.
>>
>> I see.
>>
>> *Let me explain again:* In your solrconfig.xml, look at your /search
>>
>> Ok, using q now, removed all qf, performed the search and I got 23
>> results, and the one I really want, on the top.
>> As soon as I add dbId or stId (regardless the boost, 1.0 or 100.0), then
>> I don't get anything (which make sense). However if I query name_exact, I
>> get the 23 results again, and unfortunately if I query stId^1.0
>> name_exact^10.0 I still don't get any results.
>>
>> In summary
>> - without qf - 23 results
>> - dbId - 0 results
>> - name_exact - 16 results
>> - name - 23 results
>> - dbId^1.0
>>  name_exact^10.0 - 0 results
>> - 0 results if any other, stId, dbId (key) is added on top of the
>> name(name_exact, etc).
>>
>> Definitely lost here! :-/
>>
>>
>> On 11 Nov 2019, at 07:59, Paras Lehana 
>> wrote:
>>
>> Hi
>>
>> So I don't think removing it completely is the way to go from the scenario
>>
>> we have
>>
>>
>>
>> Removing stopwords is another story. I'm curious to find the reason
>> assuming that you keep on using stopwords. In some cases, stopwords are
>> really necessary.
>>
>>
>> Quite a considerable increase
>>
>>
>> If q.alt is giving you responses, it's confirmed that your stopwords
>> filter
>> is working as expected. The problem definitely lies in the configuration
>> of
>> edismax.
>>
>>
>>
>> I am sorry but I didn't understand what do you want me to do exactly with
>> the lst (??) and qf and bf.
>>
>>
>>
>> What combinations did you try? I was referring to the field-level boosting
>> you have applied in edismax config.
>>
>> *Let me explain again:* In your solrconfig.xml, look at your /search
>> request handler. There are many qf and some bq boosts. I want you to
>> remove
>> all of these, check response again (with q now) and keep on adding them
>&

Re: Grouping and sorting Together

2019-11-14 Thread Paras Lehana
Hi Neo,

Please mention the expected result as well. Do you want to sort "item sold"
group wise or result wise? Use sort
<https://lucene.apache.org/solr/guide/8_3/common-query-parameters.html#sort-parameter>
for the former while group.sort
<https://lucene.apache.org/solr/guide/8_3/result-grouping.html#grouping-parameters>
for the latter. Describe your expected result set if I'm not answering
your question.
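
For illustration with your example (assuming an items_sold field),
sort=items_sold desc orders the flat result list, while

group=true&group.field=size&group.sort=items_sold desc

keeps the groups and orders the documents within each group.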

On Thu, 14 Nov 2019 at 19:35, neotorand  wrote:

> Hi List
> I need your help to resolve a problem for which i had been struggling for
> days.
> Lets take an example of Shoes which are grouped on basis of size and Price
>
> With first group as size and price as "7 and 7000" i have 2 documents as
> below
>
> {id:1,color:blue,item sold:10}
> {id:5,price:yellow,item sold:1}
>
>
> with second group as size and price as "8 and 8000"  i have 2 documents as
> below
>
> {id:2,color:blue,item sold:3}
> {id:3,price:yellow,item sold:5}
>
> Now i want to sort the records based on item sold.
> How I should look at  the problem.should i remove grouping and sort result
> and show.I m asking this as u can see first group has item with item sold
> as
> 10,1 and second group as 3 and 5.
> What approach i should have to look at the problem
>
> Regards
> Neo
>
>
>
>
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-14 Thread Paras Lehana
Hey Guilherme,

I was a bit busy for the past few days and couldn't read your mail. So, did
you find anything? Anyways, as I had expected, the culprit is definitely
among the qfs. Do the documents in question contain dbId? I suggest you
cross-check the fields in your document with those impacting the result in
qf.

On Tue, 12 Nov 2019 at 16:14, Guilherme Viteri  wrote:

> What I can't understand is:
> I search for the exact term - "Immunoregulatory interactions between a
> Lymphoid *and a* non-Lymphoid cell" and If i search "I search for the
> exact term - Immunoregulatory interactions between a Lymphoid *and 
> *non-Lymphoid
> cell" then it works
>
> On 11 Nov 2019, at 12:24, Guilherme Viteri  wrote:
>
> Thanks
>
> Removing stopwords is another story. I'm curious to find the reason
> assuming that you keep on using stopwords. In some cases, stopwords are
> really necessary.
>
> Yes. It always make sense the way we've been using.
>
> If q.alt is giving you responses, it's confirmed that your stopwords filter
> is working as expected. The problem definitely lies in the configuration of
> edismax.
>
> I see.
>
> *Let me explain again:* In your solrconfig.xml, look at your /search
>
> Ok, using q now, removed all qf, performed the search and I got 23
> results, and the one I really want, on the top.
> As soon as I add dbId or stId (regardless of the boost, 1.0 or 100.0), I
> don't get anything (which makes sense). However, if I query name_exact, I get
> the 23 results again, and unfortunately if I query stId^1.0 name_exact^10.0
> I still don't get any results.
>
> In summary
> - without qf - 23 results
> - dbId - 0 results
> - name_exact - 16 results
> - name - 23 results
> - dbId^1.0 name_exact^10.0 - 0 results
> - 0 results if any other, stId, dbId (key) is added on top of the
> name(name_exact, etc).
>
> Definitely lost here! :-/
>
>
> On 11 Nov 2019, at 07:59, Paras Lehana  wrote:
>
> Hi
>
> So I don't think removing it completely is the way to go from the scenario
>
> we have
>
>
>
> Removing stopwords is another story. I'm curious to find the reason
> assuming that you keep on using stopwords. In some cases, stopwords are
> really necessary.
>
>
> Quite a considerable increase
>
>
> If q.alt is giving you responses, it's confirmed that your stopwords filter
> is working as expected. The problem definitely lies in the configuration of
> edismax.
>
>
>
> I am sorry but I didn't understand what do you want me to do exactly with
> the lst (??) and qf and bf.
>
>
>
> What combinations did you try? I was referring to the field-level boosting
> you have applied in edismax config.
>
> *Let me explain again:* In your solrconfig.xml, look at your /search
> request handler. There are many qf and some bq boosts. I want you to remove
> all of these, check response again (with q now) and keep on adding them
> again (one by one) while looking for when the numFound drastically changes.
>
> On Fri, 8 Nov 2019 at 23:47, David Hastings 
> wrote:
>
> I use 3 word shingles with stopwords for my MLT ML trainer that worked
> pretty well for such a solution, but for a full index the size became
> prohibitive
>
> On Fri, Nov 8, 2019 at 12:13 PM Walter Underwood 
> wrote:
>
> If we had IDF for phrases, they would be super effective. The 2X weight
>
> is
>
> a hack that mostly works.
>
> Infoseek had phrase IDF and it was a killer algorithm for relevance.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> On Nov 8, 2019, at 11:08 AM, David Hastings <
>
> hastings.recurs...@gmail.com> wrote:
>
>
> the pf and qf fields are REALLY nice for this
>
> On Fri, Nov 8, 2019 at 12:02 PM Walter Underwood <
>
> wun...@wunderwood.org>
>
> wrote:
>
> I always enable phrase searching in edismax for exactly this reason.
>
> Something like:
>
> title^16 keywords^8 text^2
>
> To deal with concepts in queries, a classifier and/or named entity
> extractor can be helpful. If you have a list of concepts (“controlled
> vocabulary”) that includes “Lamin A”, and that shows up in a query,
>
> that
>
> term can be queried against the field matching that vocabulary.
>
> This is how LinkedIn separates people, companies, and places, for
>
> example.
>
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> On Nov 8, 2019, at 10:48 AM, Erick Erickson 
>
> wrote:
>
>
> Look at the “mm” parameter, try setting it to 100%. Although that’s not
> entirely likely to do what you want either since virtually every doc will
> have “a” in it. But at least you’d get docs that have both terms.

Re: Solr 8.2 indexing issues

2019-11-14 Thread Paras Lehana
Hi Sujatha,

Apologies that I am not addressing your bug directly but have you tried 8.3
<https://lucene.apache.org/solr/downloads.html> that has just been released?
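
In the meantime, one way to confirm which replicas have diverged is to
query each core directly while bypassing the distributed search (a sketch;
substitute your own host, collection and replica names):

  curl "http://host1:8983/solr/collection_shard1_replica_n1/select?q=*:*&rows=0&distrib=false"

If numFound differs across replicas of the same shard, those replicas are
out of sync.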

On Wed, 13 Nov 2019 at 02:12, Sujatha Arun  wrote:

> We recently migrated from 6.6.2 to 8.2. We are seeing issues with indexing
> where the leader and the replica document counts do not match. We get
> different results every time we do a *:* search.
>
> The only issue we see in the logs is JIRA issue SOLR-13293.
>
> Has anybody seen similar issues?
>
> Thanks
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Solr 7.7.0: Log file not getting generated

2019-11-11 Thread Paras Lehana
Hey Paresh,

I haven't worked with logging SQL myself. Did you mean logging the SQL
queries that DIH runs? A simple Google search yields this:
https://grokbase.com/t/lucene/solr-user/12618pmah7/how-to-show-dih-query-sql-in-log-file

> Turn the Solr logging level to "FINE" for the DIH packages/classes and
> they will show up in the log.
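
Note that "FINE" is the old JUL terminology; with the Log4j2 setup that
recent Solr versions ship, the equivalent is a DEBUG-level logger (a sketch
for server/resources/log4j2.xml, assuming the standard DIH package name):

  <!-- log the SQL queries that DIH executes -->
  <Logger name="org.apache.solr.handler.dataimport" level="DEBUG"/>

You can also change this at runtime from the Admin UI under Logging >
Level.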


On Mon, 11 Nov 2019 at 16:39, Paresh  wrote:

> How can we see SQL in log file?
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Solr 7.7.0: Log file not getting generated

2019-11-11 Thread Paras Lehana
Hi Paresh,

Glad that it worked. For the sake of future readers, please try to explain
what you did and how it worked. It will help users who have similar issues
in the future.

On Mon, 11 Nov 2019 at 14:32, Paresh  wrote:

> Thanks Paras. It solved my problem. I am now able to see the logs.
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-10 Thread Paras Lehana
Hi

So I don't think removing it completely is the way to go from the scenario
> we have


Removing stopwords is another story. I'm curious to find the reason
assuming that you keep on using stopwords. In some cases, stopwords are
really necessary.


Quite a considerable increase


If q.alt is giving you responses, it's confirmed that your stopwords filter
is working as expected. The problem definitely lies in the configuration of
edismax.



> I am sorry but I didn't understand what do you want me to do exactly with
> the lst (??) and qf and bf.


What combinations did you try? I was referring to the field-level boosting
you have applied in edismax config.

*Let me explain again:* In your solrconfig.xml, look at your /search
request handler. There are many qf and some bq boosts. I want you to remove
all of these, check response again (with q now) and keep on adding them
again (one by one) while looking for when the numFound drastically changes.
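
For reference, the kind of entries I mean look roughly like this inside the
/search request handler (a sketch using field names from this thread; your
actual fields and boosts will differ):

  <str name="defType">edismax</str>
  <str name="qf">name^10.0 name_exact^100.0 stId^1.0 dbId^1.0</str>

Comment these out, confirm the plain q results, then restore one entry at a
time until numFound changes.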

On Fri, 8 Nov 2019 at 23:47, David Hastings 
wrote:

> I use 3 word shingles with stopwords for my MLT ML trainer that worked
> pretty well for such a solution, but for a full index the size became
> prohibitive
>
> On Fri, Nov 8, 2019 at 12:13 PM Walter Underwood 
> wrote:
>
> > If we had IDF for phrases, they would be super effective. The 2X weight
> is
> > a hack that mostly works.
> >
> > Infoseek had phrase IDF and it was a killer algorithm for relevance.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> > > On Nov 8, 2019, at 11:08 AM, David Hastings <
> > hastings.recurs...@gmail.com> wrote:
> > >
> > > the pf and qf fields are REALLY nice for this
> > >
> > > On Fri, Nov 8, 2019 at 12:02 PM Walter Underwood <
> wun...@wunderwood.org>
> > > wrote:
> > >
> > >> I always enable phrase searching in edismax for exactly this reason.
> > >>
> > >> Something like:
> > >>
> > >>   title^16 keywords^8 text^2
> > >>
> > >> To deal with concepts in queries, a classifier and/or named entity
> > >> extractor can be helpful. If you have a list of concepts (“controlled
> > >> vocabulary”) that includes “Lamin A”, and that shows up in a query,
> that
> > >> term can be queried against the field matching that vocabulary.
> > >>
> > >> This is how LinkedIn separates people, companies, and places, for
> > example.
> > >>
> > >> wunder
> > >> Walter Underwood
> > >> wun...@wunderwood.org
> > >> http://observer.wunderwood.org/  (my blog)
> > >>
> > >>> On Nov 8, 2019, at 10:48 AM, Erick Erickson  >
> > >> wrote:
> > >>>
> > >>> Look at the “mm” parameter, try setting it to 100%. Although that’s
> > >>> not entirely likely to do what you want either since virtually every
> > >>> doc will have “a” in it. But at least you’d get docs that have both
> > >>> terms.
> > >>>
> > >>> you may also be able to search for things like “Lamin A” _only as a
> > >> phrase_ and have some luck. But this is a gnarly problem in general.
> > Some
> > >> people have been able to substitute synonyms and/or shingles to make
> > this
> > >> work at the expense of a larger index.
> > >>>
> > >>> This is a generic problem with context. “Lamin A” is really a
> > “concept”,
> > >> not just two words that happen to be near each other. Searching as a
> > phrase
> > >> is an OOB-but-naive way to try to make it more likely that the ranked
> > >> results refer to the _concept_ of “Lamin A”. The assumption here is
> “if
> > >> these two words appear next to each other, they’re more likely to be
> > what I
> > >> want”. I say “naive” because “Lamins: A new approach to...” would
> > _also_ be
> > >> found for a naive phrase search. (I have no idea whether such a title
> > makes
> > >> sense or not, but you figured that out already)...
> > >>>
> > >>> To do this well you’d have to dive in to NLP/Machine learning.
> > >>>
> > >>> I truly wish we could have the DWIM search algorithm (Do What I
> Mean)….
> > >>>
> > >>>> On Nov 8, 2019, at 11:29 AM, Guilherme Viteri 
> > >> wrote:
> > >>>>
> > >>>> HI Walter and Paras
> > >>>>
> > >>>> I indexed it removing all the r

Re: Solr as a windows service

2019-11-10 Thread Paras Lehana
Hey Thushara,

Have you followed Installing Solr
<https://lucene.apache.org/solr/guide/8_3/installing-solr.html> for
Windows already?
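
Since you mention NSSM: it is still a reasonable way to run Solr as a
Windows service. Roughly (a sketch, assuming NSSM is on the PATH and Solr
is unpacked at C:\solr-8.3.0; adjust the path and port for your install):

  rem -f keeps Solr in the foreground so NSSM can manage the process
  nssm install Solr "C:\solr-8.3.0\bin\solr.cmd" start -f -p 8983
  nssm start Solr

Tomcat deployment is no longer supported; Solr ships with its own Jetty.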

On Sat, 9 Nov 2019 at 10:42,  wrote:

> Hi,
>
> I am using Solr 8.3 and I need to run it as a background Windows service.
> Since the 5.x release it no longer supports an external container such as
> Tomcat, so what are the steps to run it as a background Windows service
> that starts automatically? I used NSSM to achieve this, but is anything
> else possible, or can it still be deployed with Tomcat?
> Regards,
>
> Thushara



-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Solr 7.7.0: Log file not getting generated

2019-11-10 Thread Paras Lehana
Hey Paresh,

Please see permanent logging in Configuring Logging
<https://lucene.apache.org/solr/guide/7_7/configuring-logging.html#permanent-logging-settings>.
Check what your log directory is.
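
For example, the location is usually controlled by SOLR_LOGS_DIR in your
include script (a sketch from a typical solr.in.sh; on Windows it is "set
SOLR_LOGS_DIR=..." in solr.in.cmd):

  SOLR_LOGS_DIR=/var/solr/logs

If that variable points to a directory Solr cannot create or write to, no
log folder will appear.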

On Mon, 11 Nov 2019 at 12:27, Paresh  wrote:

> Hi,
>
> We just moved to Solr 7.7.0 and the log file is not getting generated.
> Even the log folder is not present.
>
> I manually enabled all the logs through the Solr Admin UI but it is not
> working.
>
> Is there any other way to enable the logs, and how can we enable them at
> the start of Solr?
>
> Any help is appreciated.
>
> Thanks,
> Paresh
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Solr missing mandatory uniqueKey field: id or Unknown field

2019-11-10 Thread Paras Lehana
Hi Sthitaprajna,

In the Admin UI, select the core and go to Schema. Select "title" and post
a screenshot (try to host it somewhere). Do the same for "id".
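
You can also ask Solr directly which uniqueKey it is actually using via the
Schema API (a quick check; replace the core name with yours):

  curl "http://localhost:8983/solr/your_core/schema/uniquekey"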

On Mon, 11 Nov 2019 at 09:14, Alexandre Rafalovitch 
wrote:

> You still have a mismatch between what you think the schema is
> (uniqueKey=title) and the message saying the uniqueKey is id. Focus on
> that. Try to get the schema FROM Solr instead of looking at the one you
> are providing. Or look in the Admin UI at what it shows for field title
> and for field id.
>
> Regards,
> Alex
>
> On Mon, Nov 11, 2019, 2:30 PM Sthitaprajna,  >
> wrote:
>
> >
> >
> https://stackoverflow.com/questions/58763657/solr-missing-mandatory-uniquekey-field-id-or-unknown-field?noredirect=1#comment103816164_58763657
> >
> > Maybe this will help? I added screenshots.
> >
> > On Fri, 8 Nov 2019, 22:57 Alexandre Rafalovitch, 
> > wrote:
> >
> > > Something does not make sense, because your schema defines "title" as
> > > the uniqueKey field, but your message talks about "id". Are you
> > > absolutely sure that the Solr/collection you get an error for is the
> > > same Solr where you are checking the schema?
> > >
> > > Also, do you have a bit more of the error and stack trace. I find
> > > "...or Unknown field" to be very puzzling. What are you trying to do
> > > when you get this error?
> > >
> > > Regards,
> > >   Alex.
> > >
> > > On Sat, 9 Nov 2019 at 01:05, Sthitaprajna <
> iamonlyforu.frie...@gmail.com
> > >
> > > wrote:
> > > >
> > > > Thanks,
> > > >
> > > > I did reload after uploading the Solr configuration to ZK.
> > > > Yes, I pushed the config set to ZK and I can see all my changes are
> > > > in the cloud.
> > > > I turned off the managed schema.
> > > > Yes it has; you could have seen it if the attachments were available.
> > > > I have attached them again; maybe they will be available this time.
> > > >
> > > > On Fri, 8 Nov 2019, 21:13 Erick Erickson, 
> > > wrote:
> > > >>
> > > >> Attachments are aggressively stripped by the mail server, so I can’t
> > > see them.
> > > >>
> > > >> Possibilities
> > > >> - you didn’t reload your core/collection
> > > >> - you didn’t push the configset to Zookeeper if using SolrCloud
> > > >> - you are using the managed schema, which uses a file called
> > > “managed-schema” rather than classic, which uses schema.xml
> > > >> - your input doesn’t really have a field “title”.
> > > >> - the doc just doesn’t have a field called “title” in it when it’s
> > > >>   sent to Solr.
> > > >>
> > > >>
> > > >> Best,
> > > >> Erick
> > > >>
> > > >> > On Nov 8, 2019, at 4:41 AM, Sthitaprajna <
> > > iamonlyforu.frie...@gmail.com> wrote:
> > > >> >
> > > >> > title
> > > >>
> > >
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*


