Re: Influencing representing document in grouped search.

2017-10-17 Thread Modassar Ather
Thanks Alessandro for your suggestion. I tried a few queries around results collapsing but I am not able to get the required result. What I want is just to get the representing document of any grouped result based on a certain field. E.g. there are results which belong to genre A/B/C/D and each grouped

Re: Solr query not returning all results

2017-10-17 Thread Shawn Heisey
On 10/17/2017 5:53 PM, Phillip Wu wrote: > I've indexed a lot of documents (*.docx & *.vsd). > > When I run a query from the website it returns only a small proportion of the > data in the index: > { > "responseHeader":{ > "status":0, > "QTime":66, > "params":{ >"q":"NS Finance 9.2", >

Re: Quick quester about suggester component

2017-10-17 Thread James Keeney
Yep. Understood. On Tue, Oct 17, 2017, 8:14 PM Erick Erickson wrote: > Well, you tell the suggester what field to use in the first place in > the configuration. > > But I don't quite understand. Suggester is not _intended_ to return > documents. It returns, well,

Re: SOLR cores are getting locked

2017-10-17 Thread Gunalan V
Thanks Erick! I have created a separate solr home directory for each SolrCloud node and it looks like it's working fine now. GVK On Mon, Oct 16, 2017 at 9:05 AM, Erick Erickson wrote: > bin/solr start -help > > will give you a lot of info. But yes, the -s option is what

Re: Howto verify that only docValues are returned

2017-10-17 Thread Erick Erickson
bq: Do I have an incorrect understanding of how this works? If I take "OS disk cache" to include the OS's memory available as a result of MMapDirectory, you're spot on. I want to quibble a bit with (1) above. If you search on a docValues=true indexed=false field it's terrible unless you have a

Re: Quick quester about suggester component

2017-10-17 Thread Erick Erickson
Well, you tell the suggester what field to use in the first place in the configuration. But I don't quite understand. Suggester is not _intended_ to return documents. It returns, well, suggestions. It's up to you to do something with them, i.e. substitute them into a new query (against whatever

Re: Solr query not returning all results

2017-10-17 Thread Erick Erickson
bq: Is this expected behavior where it returns only a subset of the documents it has found? No. But there is _so_ much you're leaving out here that it's totally impossible to say much. bq: I've indexed a lot of documents (*.docx & *.vsd). how? Tika? ExtractingRequestHandler? Some custom code?

Solr query not returning all results

2017-10-17 Thread Phillip Wu
Hi, I've indexed a lot of documents (*.docx & *.vsd). When I run a query from the website it returns only a small proportion of the data in the index: { "responseHeader":{ "status":0, "QTime":66, "params":{ "q":"NS Finance 9.2", "fl":"id,date", "start":"0", "_":"1508193512223"}},

Re: E-Commerce Search: tf-idf, tie-break and boolean model

2017-10-17 Thread Walter Underwood
That page from Stanford is not about e-commerce search. Westlaw is professional librarian search. I agree with Emir’s advice. Start with edismax. Use a small value for the tie-breaker. It is one of the least important configuration values. I use the default from the sample configs: 0.1
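To make the role of the tie-breaker concrete, here is a minimal Python sketch of how a disjunction-max style score combines per-field scores: the best field score, plus `tie` times the sum of the remaining ones. The field scores are made-up numbers and `dismax_score` is a hypothetical helper for illustration, not a Solr API.

```python
def dismax_score(field_scores, tie=0.1):
    """Combine per-field scores the way a disjunction-max query does:
    the best field score, plus tie * (sum of the remaining field scores)."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Hypothetical per-field scores for one document (e.g. title, description, brand).
scores = [2.0, 1.0, 0.5]
print(dismax_score(scores, tie=0.0))  # only the best field counts
print(dismax_score(scores, tie=0.1))  # other fields contribute a little
print(dismax_score(scores, tie=1.0))  # plain sum of all field scores
```

With a small value such as 0.1, the best-matching field dominates and the other fields mostly serve to separate documents that would otherwise tie.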

Re: Need help with Slow Query Logging

2017-10-17 Thread Walter Underwood
I would not do this in Solr. Post process the log file to split them out. That allows you to change the definition of “slow” later, reprocess older files, etc. Do log analysis with log analysis tools. Don’t try to push that too far up the chain into the production server. wunder Walter
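A minimal sketch of the post-processing idea, assuming request log lines that carry a `QTime=<milliseconds>` token. The sample lines are made up, and `slow_requests` is a hypothetical helper, not part of Solr. Because the threshold is just an argument, the definition of "slow" can be changed later and older files reprocessed.

```python
import re

# Matches the QTime=<millis> token found in Solr request log lines.
QTIME_RE = re.compile(r"\bQTime=(\d+)")


def slow_requests(lines, threshold_ms):
    """Yield (qtime_ms, line) for each log line whose QTime is >= threshold_ms."""
    for line in lines:
        match = QTIME_RE.search(line)
        if match and int(match.group(1)) >= threshold_ms:
            yield int(match.group(1)), line.rstrip("\n")


# Made-up sample lines in the general shape of Solr request logging.
sample_log = [
    "INFO  [core1] webapp=/solr path=/select params={q=foo} hits=10 status=0 QTime=66",
    "INFO  [core1] webapp=/solr path=/select params={q=bar} hits=3 status=0 QTime=1540",
    "INFO  [core1] webapp=/solr path=/update params={} status=0 QTime=12",
]

for qtime, line in slow_requests(sample_log, threshold_ms=1000):
    print(qtime, line)
```

In practice the `lines` argument could be an open log file, since the function only iterates over it.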

Re: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1

2017-10-17 Thread Nawab Zada Asad Iqbal
Randy, that is one issue; I don't know if it fixes everything for you or not. However, Lucene doesn't put a limit on the number of incoming requests, and after https://issues.apache.org/jira/browse/LUCENE-6659, Solr has no way (that I know of, at least) to limit threads. So if you have a ton of parallel

Re: [EXTERNAL] Re: OOM during indexing with 24G heap - Solr 6.5.1

2017-10-17 Thread Randy Fradin
I've been trying to understand DocumentsWriterFlushControl.java to figure this one out. I don't really have a firm grasp of it but I'm starting to suspect that blocked flushes in aggregate can take up to (ramBufferSizeMB * maximum # of concurrent update requests * # of cores) of heap space and
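The suspected upper bound above is a simple product, so it can be worked through numerically. This is only a sketch of the poster's hypothesis, not a confirmed description of Lucene's behavior; the input values are invented for illustration.

```python
def worst_case_flush_heap_mb(ram_buffer_size_mb, max_concurrent_updates, num_cores):
    """Suspected upper bound from the message above:
    ramBufferSizeMB * max concurrent update requests * number of cores."""
    return ram_buffer_size_mb * max_concurrent_updates * num_cores


# Invented example values, not taken from the thread.
estimate_mb = worst_case_flush_heap_mb(
    ram_buffer_size_mb=512, max_concurrent_updates=8, num_cores=4
)
print(f"{estimate_mb} MB (~{estimate_mb / 1024:.1f} GB)")
```

If the hypothesis holds, even moderate settings multiply into a figure comparable to a 24G heap, which is why the number of concurrent update requests matters as much as the buffer size.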

Quick quester about suggester component

2017-10-17 Thread James Keeney
I've set up the suggester and want to act on the full document when the user selects one of the suggestions. Ideally it would be nice to be able to tell the suggester to return more than just the field that the suggestion index is built from. If that can't be done, then should I do the following:

IllegalAccessException 7+

2017-10-17 Thread Brandon Lee
Hi, 7.0+ solr-analytics code added: public void prepare(ResponseBuilder rb) throws IOException { ... rb._isOlapAnalytics = false; ... references to ResponseBuilder._isOlapAnalytics -- a package var in solr-core Since solr-analytics lives in 'dist/' while solr-core lives in

Re: [ANNOUNCE] Apache Solr 7.1.0 released

2017-10-17 Thread Noble Paul
Thanks Shalin On Wed, Oct 18, 2017 at 2:06 AM, Susheel Kumar wrote: > Thank you, Yonik. Able to download directly. > > On Tue, Oct 17, 2017 at 11:29 AM, Yonik Seeley wrote: > >> It pointed to 7.1.0 for me perhaps a browser cache issue? >> Anyway,

Re: Expected mime type application/octet-stream but got text/html

2017-10-17 Thread Jason Gerlowski
At a glance, I'd guess that your SolrClient object isn't set up correctly, probably because it has the wrong "baseURL" specified. Solr has a "/solr//update" URL, but the error above makes it look like your application is reaching out to "/solr/update" which isn't a valid endpoint. If your
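As an illustration of the point about the base URL, the update endpoint needs a core or collection name between "/solr" and "/update". A minimal sketch, where `update_url`, the host and the core name `mycore` are all hypothetical:

```python
def update_url(base_url, core):
    """Build the update endpoint for a given core (or collection).
    Refuses an empty core name, which would yield the invalid /solr/update."""
    if not core:
        raise ValueError("a core or collection name is required")
    return f"{base_url.rstrip('/')}/{core}/update"


# Hypothetical host and core name.
print(update_url("http://localhost:8983/solr", "mycore"))
```

A client configured with only "http://localhost:8983/solr" and no core would end up at "/solr/update", which matches the symptom described above.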

Re: Howto verify that only docValues are returned

2017-10-17 Thread Shawn Heisey
On 10/17/2017 2:09 AM, Julian Ohrt wrote: The Solr 6.6 documentation states: In cases where the query is returning only docValues fields performance may improve since returning stored fields requires disk reads and decompression whereas returning docValues fields in the fl list only requires

Re: solr 7.0: What causes the segment to flush

2017-10-17 Thread Nawab Zada Asad Iqbal
I take back yesterday's comment. I assumed that the file being written was a segment; however, after letting Solr run overnight, I see that the segment is flushed at the expected size: 1945MB (so the file I observed was still open for writing). Now, I have two other questions: 1. Is

Re: [ANNOUNCE] Apache Solr 7.1.0 released

2017-10-17 Thread Susheel Kumar
Thank you, Yonik. Able to download directly. On Tue, Oct 17, 2017 at 11:29 AM, Yonik Seeley wrote: > It pointed to 7.1.0 for me perhaps a browser cache issue? > Anyway, you can go directly as well: > http://www.apache.org/dyn/closer.lua/lucene/solr/7.1.0 > > -Yonik > > >

Re: [ANNOUNCE] Apache Solr 7.1.0 released

2017-10-17 Thread Yonik Seeley
It pointed to 7.1.0 for me perhaps a browser cache issue? Anyway, you can go directly as well: http://www.apache.org/dyn/closer.lua/lucene/solr/7.1.0 -Yonik On Tue, Oct 17, 2017 at 11:25 AM, Susheel Kumar wrote: > Thanks, Shalin. > > But the download mirror still has

Re: [ANNOUNCE] Apache Solr 7.1.0 released

2017-10-17 Thread Susheel Kumar
Thanks, Shalin. But the download mirror still has 7.0.1 not 7.1.0. http://www.apache.org/dyn/closer.lua/lucene/solr/7.0.1 On Tue, Oct 17, 2017 at 5:28 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > 17 October 2017, Apache Solr™ 7.1.0 available > > The Lucene PMC is pleased to

Re: E-Commerce Search: tf-idf, tie-break and boolean model

2017-10-17 Thread Erick Erickson
BTW, Lucene/Solr has never implemented a boolean model, see: https://lucidworks.com/2011/12/28/why-not-and-or-and-not/ So if you need pure boolean you can pretty much get it if you parenthesize. Best, Erick On Tue, Oct 17, 2017 at 2:06 AM, Junte Zhang wrote: > My

Re: hi load mitigation

2017-10-17 Thread j.s.
hi thx for writing On 10/17/17 11:07, Erick Erickson wrote: More details would help. What is 8? 2? this is from the output of the 'top' command on this host, specifically the 'load average:' field. although when i checked just now it was < 1, so that is where i want it to be. the load is

Re: solr 7.0: What causes the segment to flush

2017-10-17 Thread Erick Erickson
Segments will also be created whenever, say, time-based commits trip, i.e. if you have an autoCommit setting to N seconds. And if the client (SolrJ) submits queries with commitWithin (or, even worse sends commits). BTW, I'd set my rambuffer size to 5g as that's the default max size in

Re: hi load mitigation

2017-10-17 Thread Erick Erickson
More details would help. What is 8? 2? Load is query or index load? How many users? Increase in indexing rate? Have you looked at: https://wiki.apache.org/solr/SolrPerformanceProblems and the links on that page? Best, Erick On Tue, Oct 17, 2017 at 6:10 AM, j.s. wrote: >

Re: Howto verify that only docValues are returned

2017-10-17 Thread Erick Erickson
See: SOLR-8344 and the JIRAs linked for a pretty extensive discussion. Note, you can force some of this in some other versions by specifying useDocValuesAsStored (since 5.5). How you'd verify I'm not quite sure. That kind of information isn't put in the logs. On Tue, Oct 17, 2017 at 1:09 AM,

Howto verify that only docValues are returned

2017-10-17 Thread Julian Ohrt
The Solr 6.6 documentation states: In cases where the query is returning only docValues fields performance may improve since returning stored fields requires disk reads and decompression whereas returning docValues fields in the fl list only requires memory access. I want to use this

Re: spell-check does not return collations when using search query with filter

2017-10-17 Thread Arnold Bronley
I tried spellcheck.q=polt and q=tag:polt. I get collations, but they are only for polt and not tag:polt. Because of that, the hits that I get back are for frequency of plot and not frequency of tag:plot { "responseHeader": { "status": 0, "QTime": 20, "params": {

Re: Parallel SQL: GROUP BY throws exception

2017-10-17 Thread Kevin Risden
Calcite might support this in 0.14. I know group by support was improved lately. It might be as simple as upgrading the dependency? A test case showing the NPE would be helpful. We are using MySQL dialect under the hood with Calcite. Kevin Risden On Tue, Oct 17, 2017 at 8:09 AM, Joel Bernstein

Re: Concern on solr commit

2017-10-17 Thread Yonik Seeley
Related: maxWarmingSearchers behavior was fixed (block for another commit to succeed first rather than fail) in Solr 6.4 and later. https://issues.apache.org/jira/browse/SOLR-9712 Also, if any of your "realtime" search requests only involve retrieving certain documents by ID, then you can use

Re: Concern on solr commit

2017-10-17 Thread Leo Prince
Hi, Thank you Emir, Erick and Shawn for your inputs. I am currently using SolrCloud and planning to try out the commitWithin parameter to reduce hard commits as per your advice. Though, I just wanted to double check whether commitWithin has any negative impacts in a SolrCloud environment like lag to
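For reference, commitWithin can be passed as a request parameter (in milliseconds) on an update request instead of sending an explicit commit. A minimal sketch that only builds the URL; the helper name, host and collection name are hypothetical:

```python
from urllib.parse import urlencode


def update_url_with_commit_within(core_url, commit_within_ms):
    """Append commitWithin (milliseconds) to a core's update endpoint."""
    query = urlencode({"commitWithin": int(commit_within_ms)})
    return f"{core_url.rstrip('/')}/update?{query}"


# Hypothetical collection URL; ask Solr to make the documents visible within 10 seconds.
print(update_url_with_commit_within("http://localhost:8983/solr/mycollection", 10000))
```

This leaves the timing of the actual commit to Solr, so many small updates can share one commit rather than each triggering its own.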

Re: Accent insensitive search for greek characters

2017-10-17 Thread Alexandre Rafalovitch
There is also ICUTransform which is insanely powerful and can be configured. I did something for Thai test at https://github.com/arafalov/solr-thai-test/blob/master/collection1/conf/schema.xml Regards, Alex On Oct 13, 2017 3:28 AM, "Chitra" wrote: > Hi, > >I

hi load mitigation

2017-10-17 Thread j.s.
hi i run a stand alone solr instance in which usage has suddenly spiked a bit. the load was at 8, but by adding another CPU i brought it down to 2. much better but not where i'd like it to be. i guess i'm writing to see if anyone has any suggestions about where to look to improve this. the

Re: Parallel SQL: GROUP BY throws exception

2017-10-17 Thread Joel Bernstein
This would be a good jira to create at ( https://issues.apache.org/jira/projects/SOLR) Interesting that the query works in MySQL. I'm assuming MySQL automatically adds the group by field to the field list. We can look at doing this as well. Joel Bernstein http://joelsolr.blogspot.com/ On Tue,

Re: Using pint field as uniqueKey

2017-10-17 Thread Amrit Sarkar
https://issues.apache.org/jira/browse/SOLR-10829: IndexSchema should enforce that uniqueKey field must not be points based The description tells the real reason. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn:

Re: Influencing representing document in grouped search.

2017-10-17 Thread alessandro.benedetti
Can results collapsing[1] be of use to you? If so, you can use that feature and explore its flexibility in selecting the group head: 1) min | max for a numeric field 2) min | max for a function query 3) sort [1]
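The three ways of picking the group head listed above map onto local parameters of the collapse filter query. A minimal sketch that builds the `fq` string; `collapse_fq` is a hypothetical helper, `genre` comes from the original question, and `priority` is an invented field name.

```python
def collapse_fq(field, min_=None, max_=None, sort=None):
    """Build a {!collapse} filter query. At most one of min_, max_ or sort
    may be given; it decides which document represents each group."""
    chosen = [(k, v) for k, v in (("min", min_), ("max", max_), ("sort", sort))
              if v is not None]
    if len(chosen) > 1:
        raise ValueError("use at most one of min_, max_, sort")
    parts = [f"field={field}"]
    for key, value in chosen:
        # Quote values containing spaces, e.g. a multi-clause sort.
        parts.append(f"{key}='{value}'" if " " in value else f"{key}={value}")
    return "{!collapse " + " ".join(parts) + "}"


print(collapse_fq("genre"))                                # head chosen by score
print(collapse_fq("genre", max_="priority"))               # head = highest priority
print(collapse_fq("genre", sort="priority desc, id asc"))  # head = first by sort
```

The resulting string would be sent as an `fq` parameter alongside the main query.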

Re: Using pint field as uniqueKey

2017-10-17 Thread alessandro.benedetti
In addition to what Amrit correctly stated, if you need to search on your id, especially with range queries, I recommend using a copy field and leaving the id field almost as default. Cheers - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. -

AW: Howto verify that update is "in-place"

2017-10-17 Thread James
I found a solution which works for me: Add a document with very little tokenized text and write down QTime (for me: 5ms) Add another document with very much text (I used about 1MB of Lorem Ipsum sample text) and write down QTime (for me: 70ms). Perform an update operation on document 2 which

Influencing representing document in grouped search.

2017-10-17 Thread Modassar Ather
Hi, Can a grouped search result be influenced in such a way that the representing document of a particular group is chosen based on some other field? Normally it is the score which defines the document as a group representing document or the group.sort parameter value plays the role. E.g. For a

AW: Howto verify that update is "in-place"

2017-10-17 Thread Julian Ohrt
Hi Emir and Amrit, @Emir: Nice idea but after changing any document in any way and after committing the changes, all Doc counters (Num, Max, Deleted) are still the same; the only thing that changes is the Version (increases by steps of 2). @Amrit: Are you saying that the _version_ field should not

Re: Parallel SQL: GROUP BY throws exception

2017-10-17 Thread Dmitry Gerasimov
Joel, Thanks for the tip. That worked. I was confused since this query works just fine in MySQL. It would of course be very helpful if SOLR was responding with a proper error. What’s the process here? Where do I post this request? Dmitry > -- Forwarded message -- > From: Joel

Re: solr 7.0: What causes the segment to flush

2017-10-17 Thread Amrit Sarkar
> > In 7.0, i am finding that the file is written to disk very early on > and it is being updated every second or so. Had something changed in 7.0 > which is causing it? I tried something similar with solr 6.5 and i was > able to get almost a GB size files on disk. Interesting observation,

Re: Howto verify that update is "in-place"

2017-10-17 Thread Amrit Sarkar
James, @Amrit: Are you saying that the _version_ field should not change when > performing an atomic update operation? It should change. a new version will be allotted to the document. I am not that sure about in-place updates, probably a test run will verify that. Amrit Sarkar Search Engineer

AW: Howto verify that update is "in-place"

2017-10-17 Thread James
Hi Emir and Amrit, thanks for your responses! @Emir: Nice idea but after changing any document in any way and after committing the changes, all Doc counters (Num, Max, Deleted) are still the same; the only thing that changes is the Version (increases by steps of 2). @Amrit: Are you saying that the

Re: Unbalanced CPU no SolrCloud

2017-10-17 Thread Emir Arnautović
The fact that the load remains even after indexing is stopped suggests that it might be related to merges, but the puzzling part is why only a single node is affected. Your index size is not trivial - 500GB/8 shards = ~60GB/shard, but that should be the case on the other node as well. Maybe you could use some tool to

Re: Using pint field as uniqueKey

2017-10-17 Thread Amrit Sarkar
By looking into the code, if (uniqueKeyField.getType().isPointField()) { String msg = UNIQUE_KEY + " field ("+uniqueKeyFieldName+ ") can not be configured to use a Points based FieldType: " + uniqueKeyField.getType().getTypeName(); log.error(msg); throw new

Re: Howto verify that update is "in-place"

2017-10-17 Thread Amrit Sarkar
Hi James, each update you are doing via an atomic operation contains the "id" / "uniqueKey", so comparing the "_version_" field value for one of them would be fine for a batch. For the rest, Emir has listed them out. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter

[ANNOUNCE] Apache Solr 7.1.0 released

2017-10-17 Thread Shalin Shekhar Mangar
17 October 2017, Apache Solr™ 7.1.0 available The Lucene PMC is pleased to announce the release of Apache Solr 7.1.0 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting,

Re: Howto verify that update is "in-place"

2017-10-17 Thread Emir Arnautović
Hi James, I did not try, but checking max and num doc might give you info if update was in-place or atomic - atomic is reindexing of existing doc so the old doc will be deleted. In-place update should just update doc values of existing doc so number of deleted docs should not change. HTH, Emir
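The check suggested above can be sketched as a comparison of two index-statistics snapshots taken before and after the update. This only encodes the untested idea from the message; the snapshots are plain dicts with assumed keys `numDocs`, `maxDoc` and `deletedDocs`, and the numbers are invented.

```python
def looks_in_place(before, after):
    """Return True if no document appears to have been deleted and re-added,
    i.e. maxDoc and deletedDocs are unchanged between the two snapshots."""
    return (before["maxDoc"] == after["maxDoc"]
            and before["deletedDocs"] == after["deletedDocs"])


# Invented snapshots: an atomic update re-indexes the document, so the old
# copy would be marked deleted and maxDoc would grow by one.
before = {"numDocs": 100, "maxDoc": 100, "deletedDocs": 0}
after_atomic = {"numDocs": 100, "maxDoc": 101, "deletedDocs": 1}
after_in_place = {"numDocs": 100, "maxDoc": 100, "deletedDocs": 0}

print(looks_in_place(before, after_atomic))
print(looks_in_place(before, after_in_place))
```

Note that other replies in this thread report the counters staying the same after every kind of update, so this signal may not be reliable on its own.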

Using pint field as uniqueKey

2017-10-17 Thread Michael Kondratiev
I'm trying to set up a uniqueKey (which is an integer) like this: id But when I upload the configuration into Solr I see the following error: uniqueKey field (id) can not be configured to use a Points based FieldType: pint If I set type=“string” everything seems to be ok.

RE: E-Commerce Search: tf-idf, tie-break and boolean model

2017-10-17 Thread Junte Zhang
My take on e-commerce search. Similarity matching using a vector space based model, or probabilistic or Boolean ranking, is not as important as in web search or other domains with full-text search. The reason is the content. Usually very short texts, highly structured, and often not

Re: spell-check does not return collations when using search query with filter

2017-10-17 Thread alessandro.benedetti
But you used: "spellcheck.q": "tag:polt", instead of: "spellcheck.q": "polt", Regards - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Stats component with function

2017-10-17 Thread Renuka Srishti
Hello All Here is the link for the reference. I want to use sub() by passing the min and max value of the price, can we perform this type of action, with the query given in the link. Thanks Renuka

pint as uniquekey

2017-10-17 Thread Michael Kondratiev
Hello! I'm trying to set up a uniqueKey (which is an integer) like this: id But when I upload the configuration into Solr I see the following error: uniqueKey field (id) can not be configured to use a Points based FieldType: pint If I set type=“string” everything seems to be ok.

Re: E-Commerce Search: tf-idf, tie-break and boolean model

2017-10-17 Thread Charlie Hull
For our e-commerce customers we've been recommending a test-based relevance tuning strategy: here's a series of blogs written for us by someone who ran search for the world's largest electronic component distributor:

Howto verify that update is "in-place"

2017-10-17 Thread James
I am using Solr 6.6 and have carefully read the documentation about atomic and in-place updates. I am pretty sure that everything is set up as it should be. But how can I make certain that a simple update command actually performs an in-place update without internally re-indexing all other fields?

solr 7.0: What causes the segment to flush

2017-10-17 Thread Nawab Zada Asad Iqbal
Hi, I have tuned (or tried to tune) my settings to only flush the segment when it has reached its maximum size. At the moment, I am using my application with only a couple of threads (I have limited it to one thread for analyzing this scenario) and my ramBufferSizeMB=2 (i.e. ~20GB). With this,

Expected mime type application/octet-stream but got text/html

2017-10-17 Thread Shoaib
I have been following the tutorial at the link below to implement Spring Data Solr: http://www.baeldung.com/spring-data-solr Attached are my config file, model and repository for Spring Data Solr. When I make any query or save my model I receive the exception below. My Solr is working fine when I