Re: can we use Streaming Expressions for different collection

2016-01-05 Thread Mugeesh Husain
Thanks @Joel Bernstein. Actually I am using SolrCloud with 3 nodes/servers and have created 3 cores on servers 1, 2, 3 respectively. I need to implement a join operation on these cores, but joins are not supported on SolrCloud, so I am thinking the Streaming API could solve my problem.

Re: how to search miilions of record in solr query

2016-01-05 Thread Mugeesh Husain
Still I am stuck on how to solve my problem: searching millions of IDs with minimum response time. @Upayavira, please elaborate. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-search-miilions-of-record-in-solr-query-tp4248360p4248597.html

Re: how to search miilions of record in solr query

2016-01-05 Thread Mugeesh Husain
@Ere Maijala >> question is: WHY do you need to search for millions of IDs? Let me explain: I have a list of 1 million IDs that I will search in Solr, something like IP:8083/select?q=ID:(1,4,7,...up to 1 million)&rows=10&start=0. Then it will display 10 results; for pagination the next search will

Re: solr 5.2.0 need to build high query response

2016-01-05 Thread Novin Novin
Thanks David. It is quite good to use for NRT. Apologies, I didn't mention that facet search is really slow. I found what could be the reason: I am using faceted spatial search, which is getting slow. To know more about Solr hard and soft commits, have a look at this blog

Re: MapReduceIndexerTool Indexing

2016-01-05 Thread Erick Erickson
MRIT is not designed for that scenario, so you simply can't. What people usually do is have a process whereby, after the initial bulk load, there is some way their system-of-record "knows" what new docs have been added since and indexes only those. Flume is sometimes used if you have access.

Re: Memory Usage increases by a lot during and after optimization .

2016-01-05 Thread Zheng Lin Edwin Yeo
Hi Shawn, Thanks for your reply. I have uploaded the screenshot here https://www.dropbox.com/s/l5itfbaus1c9793/Memmory%20Usage.png?dl=0 Basically, Java(TM) Platform SE Library, which Solr is running on, is only using about 22GB currently. However, the memory usage at the top says it is using

Re: Many patterns against many sentences, storing all results

2016-01-05 Thread Will Moy
Thank you both, that's really helpful. Luwak and Percolator look like good places to dig deeper. Best wishes Will *Will Moy* Director 020 3397 5140 *Full Fact* fullfact.org Twitter • Facebook • LinkedIn

Re: enable disable filter query caching based on statistics

2016-01-05 Thread Matteo Grolla
Thanks Erick and Binoy. This is a case I stumbled upon: with queries like q=*:*&fq={!cache=false}n_rea:xxx&fq={!cache=false}provincia:,fq={!cache=false}type: where the n_rea filter is highly selective, I was able to get a >3x performance improvement by disabling the cache. I think it's because the

Re: SOLR replicas performance

2016-01-05 Thread Erick Erickson
What version of Solr? Prior to 5.2 the replicas were doing lots of unnecessary work/being blocked, see: https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/ Best, Erick On Tue, Jan 5, 2016 at 6:09 AM, Matteo Grolla wrote: > Hi Luca, >

Re: solr 5.2.0 need to build high query response

2016-01-05 Thread Erick Erickson
It sounds like you're not doing proper autowarming, which you'd need to do either with hard or soft commits that open new searchers. see: https://wiki.apache.org/solr/SolrCaching#Cache_Warming_and_Autowarming In particular, you should have a newSearcher event that facets on the fields you expect
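A warming listener of this shape in solrconfig.xml is a minimal sketch of what Erick describes; the facet field name "category" is a placeholder, not taken from the thread:

```xml
<!-- Sketch: warm each new searcher with the facets you expect to use. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
      <str name="rows">0</str>
    </lst>
  </arr>
</listener>
```

A matching firstSearcher listener covers the cold-start case after a restart, when there is no old searcher to autowarm from.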

Re: Memory Usage increases by a lot during and after optimization .

2016-01-05 Thread Zheng Lin Edwin Yeo
Hi Toke, I read the server's memory usage from the Task manager under Windows, Regards, Edwin On 4 January 2016 at 17:17, Toke Eskildsen wrote: > On Mon, 2016-01-04 at 10:05 +0800, Zheng Lin Edwin Yeo wrote: > > A) Before I start the optimization, the server's

Re: Many patterns against many sentences, storing all results

2016-01-05 Thread Jack Krupansky
It doesn't sound like a very good match with Solr - or any other search engine or any relational database or data store for that matter. Sure, maybe you can get something to work with extraordinary effort, but it is unlikely that you will ever be happy with the results. You should probably just

Re: solr 5.2.0 need to build high query response

2016-01-05 Thread Novin Novin
If I'm correct, you are talking about this, or maybe here too: static firstSearcher warming in solrconfig.xml. Thanks, Novin On Tue, 5 Jan

Re: how to search miilions of record in solr query

2016-01-05 Thread Erick Erickson
So still use Ere's suggestion. There's no reason at all to search all million every time. If start=0, just search the first N (say 1,000). Keep doing that until you run out of docs, then fetch more. Or fire off the first query and then, when you know there is going to be pagination, fire off the

Re: How to use DocValues with TextField

2016-01-05 Thread Erick Erickson
Assuming (and it wasn't clear from your problem statement) that you need to search tokens in your field, this approach should be fine. I think Markus' comment was assuming that you did _not_ need to search the field. If you do, a copyField seems best. Do be aware, though, that this will make for
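A schema sketch of the copyField approach Erick mentions; field and type names are placeholders, and note that copyField copies the original input value, not the tokenized output:

```xml
<!-- Searchable analyzed field plus a docValues copy for /export. -->
<field name="tags"    type="text_semicolon" indexed="true" stored="true"/>
<field name="tags_dv" type="string" indexed="false" stored="false"
       docValues="true"/>
<copyField source="tags" dest="tags_dv"/>
```

Queries go against the analyzed "tags" field, while /export reads the raw value from "tags_dv".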

Re: Data migration from one collection to the other collection

2016-01-05 Thread Erick Erickson
What changes? You simply have "hot" and "cold" collections. When it comes time to index data you: 1> create a collection 2> index to it. 3> use the Collections API to point your "active" collection to this new one 4> do whatever you want with the old one. The setup is, of course, that your hot

Re: enable disable filter query caching based on statistics

2016-01-05 Thread Erick Erickson
Matteo: Let's see if I understand your problem. Essentially you want Solr to analyze the filter queries and decide through some algorithm which ones to cache. I have a hard time thinking of any general way to do this; certainly there's nothing in Solr that does this automatically. As Binoy

RE: Many patterns against many sentences, storing all results

2016-01-05 Thread Allison, Timothy B.
Might want to look into: https://github.com/flaxsearch/luwak or https://github.com/OpenSextant/SolrTextTagger -Original Message- From: Will Moy [mailto:w...@fullfact.org] Sent: Tuesday, January 05, 2016 11:02 AM To: solr-user@lucene.apache.org Subject: Many patterns against many

Re: SOLR replicas performance

2016-01-05 Thread Matteo Grolla
Hi Luca, not sure if I understood well. Your question is "Why are index times on a SolrCloud collection with 2 replicas higher than on SolrCloud with 1 replica", right? Well, with 2 replicas all docs have to be separately indexed in 2 places, and Solr has to confirm that both indexing went

Data migration from one collection to the other collection

2016-01-05 Thread vidya
Hi, I would like to maintain two cores for history data and current data, where HDFS is my datasource. My requirement is that data input should be given to only one collection and previous data should be moved to the history collection. 1) Creating two cores and migrating data from current to history

enable disable filter query caching based on statistics

2016-01-05 Thread Matteo Grolla
Hi, after looking at the presentation on CloudSearch from Lucene Revolution 2014, https://www.youtube.com/watch?v=RI1x0d-yO8A&list=PLU6n9Voqu_1FM8nmVwiWWDRtsEjlPqhgP&index=49 at min 17:08, I realized I'd love to be able to remove the burden of disabling filter query caching from developers. The problem:

How to use DocValues with TextField

2016-01-05 Thread Alok Bhandari
Hello, I have a field which is defined as a TextField with a PatternTokenizer which splits on ";". Now for one use case I need to use the /export handler to export this field. As the /export handler needs the field to support docValues, if I try to mark that field as docValues="true" it says

RE: How to use DocValues with TextField

2016-01-05 Thread Markus Jelsma
Hello - indeed, this is not going to work. But since you are using the token filter as a preprocessor, you could easily use an update request processor to do the preprocessing work for you. Check out the documentation; I think you can use the RegexReplaceProcessor.
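An update request processor chain along these lines is one way to sketch Markus's suggestion; the chain name, field name, and pattern/replacement are placeholders:

```xml
<updateRequestProcessorChain name="preprocess">
  <!-- Rewrite the raw field value at index time, before analysis. -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">tags</str>
    <str name="pattern">;</str>
    <str name="replacement"> </str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain is selected per request handler via update.chain, e.g. on /update in solrconfig.xml.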

RE: Unable to extract images content (OCR) from PDF files using Solr

2016-01-05 Thread Allison, Timothy B.
I concur with Erick and Upayavira that it is best to keep Tika in a separate JVM...well, ideally a separate box or rack or even data center [0][1]. :) But seriously, if you're using DIH/SolrCell, you have to configure Tika to parse documents recursively. This was made possible in

Many patterns against many sentences, storing all results

2016-01-05 Thread Will Moy
Hello, please may I have your advice as to whether Solr is a good tool for this job? We have (per year) up to 50,000,000 sentences and about 5,000 search patterns (i.e. queries). Our task is to identify all matches between any sentence and any search pattern. That list of detections must be

Re: enable disable filter query caching based on statistics

2016-01-05 Thread Binoy Dalal
If I understand your problem correctly, then you don't want the most frequently used fqs removed, and you do not want your filter cache to grow very large. Well, there is already a solution for both of these: in the solrconfig.xml file you can configure the filterCache settings to suit your needs.
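The setting Binoy is presumably referring to is the filterCache entry in solrconfig.xml; the sizes below are illustrative, not recommendations:

```xml
<!-- Bounded LRU filter cache: old entries are evicted once "size"
     is reached; "autowarmCount" entries are regenerated on commit. -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="128"/>
```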

Re: enable disable filter query caching based on statistics

2016-01-05 Thread Matteo Grolla
Hi Binoy, I know these settings but the problem I'm trying to solve is when these settings aren't enough. 2016-01-05 16:30 GMT+01:00 Binoy Dalal : > If I understand your problem correctly, then you don't want the most > frequently used fqs removed and you do not

Re: enable disable filter query caching based on statistics

2016-01-05 Thread Binoy Dalal
What is your exact requirement then? I ask, because these settings can solve the problems you've mentioned without the need to add any additional functionality. On Tue, Jan 5, 2016 at 9:04 PM Matteo Grolla wrote: > Hi Binoy, > I know these settings but the problem

Re: how to search miilions of record in solr query

2016-01-05 Thread Ere Maijala
Well, if you already know that you need to display only the first 20 records, why not only search for them? Or if you don't know whether they already exist, search for, say, a hundred, then thousand and so on until you have enough. Nevertheless, what's really needed for a good answer or ideas

Re: enable disable filter query caching based on statistics

2016-01-05 Thread Matteo Grolla
Hi Erick, the test was done on thousands of queries of that kind and millions of docs. I went from <1500 qpm to ~6000 qpm on modest virtualized hardware (CPU bound, and CPU was scarce). After that the customer was happy and time ran out, so I didn't go further, but cost is definitely something I'd try. When I

Re: solr 5.2.0 need to build high query response

2016-01-05 Thread Erick Erickson
Yep. Do note what's happening here. You're executing a query that potentially takes 10 seconds to execute (based on your earlier post). But you may be opening a new searcher every 2 seconds. You may start to see "too many on deck searchers" in your log. If you do, do _not_ try to "fix" this by

Re: enable disable filter query caching based on statistics

2016-01-05 Thread Erick Erickson
fq={!cache=false}n_rea:xxx&fq={!cache=false}provincia:,fq={!cache=false}type: You have a comma in front of the last fq clause, typo? Well, the whole point of caching filter queries is so that the _second_ time you use one, very little work has to be done. That comes at a cost, of course, for
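For reference, the post-filter variant Erick alludes to adds a cost of 100 or more alongside cache=false; only query types that implement the PostFilter interface (e.g. {!frange}, {!collapse}) actually run as post filters, so this is a sketch using the termfreq function rather than a plain term query:

```
fq={!frange l=1 cache=false cost=150}termfreq(n_rea,'xxx')
```

With cost >= 100 the filter is applied only to documents that already matched the main query and cheaper filters, instead of being computed over the whole corpus.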

Re: enable disable filter query caching based on statistics

2016-01-05 Thread Binoy Dalal
@Erick I might be wrong here so please correct me if I am. In the particular case that Matteo has given, applying the filters as post filters won't make any difference, since the query is going to return all docs anyway. In such a case, won't applying fqs normally be the same as applying them as post

Re: Memory Usage increases by a lot during and after optimization .

2016-01-05 Thread Shawn Heisey
On 1/5/2016 9:59 AM, Zheng Lin Edwin Yeo wrote: > I have uploaded the screenshot here > https://www.dropbox.com/s/l5itfbaus1c9793/Memmory%20Usage.png?dl=0 > > Basically, Java(TM) Platform SE Library, which Solr is running on, is only > using about 22GB currently. However, the memory usage at the

Re: how to search miilions of record in solr query

2016-01-05 Thread Mugeesh Husain
Thanks for your reply @Ere Maijala. One of my eCommerce-based clients has a requirement to search some records based on IDs, like IP:8083/select?q=ID:(1,4,7,...up to 1 million), displaying only 10 to 20 records. If I use the above procedure it takes too much time, or if I am going to use

Re: can we use Streaming Expressions for different collection

2016-01-05 Thread Mugeesh Husain
Thanks Joel Bernstein, could you share any links please? -- View this message in context: http://lucene.472066.n3.nabble.com/can-we-use-Streaming-Expressions-for-different-collection-tp4248461p4248794.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Query behavior difference.

2016-01-05 Thread Modassar Ather
Thanks for your response Ahmet. Best, Modassar On Mon, Jan 4, 2016 at 5:07 PM, Ahmet Arslan wrote: > Hi, > > I think wildcard queries fl:networ* are re-written into Constant Score > Query. > fl=*,score should returns same score for all documents that are retrieved. >

RE: How to use DocValues with TextField

2016-01-05 Thread Alok Bhandari
Thanks Markus.

Re: how to search miilions of record in solr query

2016-01-05 Thread Mugeesh Husain
@Erick Erickson thanks for the reply. Actually they gave me only this task: search 1 million IDs with good performance; results should appear within 50-100 ms. Yeah, I will fire off the full query (up to millions) in the background, but what is the efficient way of doing it in terms of

Re: How to use DocValues with TextField

2016-01-05 Thread Alok Bhandari
Thanks Erick. Yes, I was not clear in my question, but I want it to be searchable as a TextField.

Re: how to search miilions of record in solr query

2016-01-05 Thread Erick Erickson
Well, you're serving the first set of results very quickly because you're only looking for, say, the first 1,000. Thereafter you assemble the rest of the result set in the background (and I'd use the export function) to have your app have the next N ready for immediate response to the user. But

Re: Memory Usage increases by a lot during and after optimization .

2016-01-05 Thread Zheng Lin Edwin Yeo
Hi Shawn, Here is the new screenshot of the Memory tab of the Resource Monitor. https://www.dropbox.com/s/w4bnrb66r16lpx1/Resource%20Monitor.png?dl=0 Yes, I found that the value under the "Working Set" column is much higher than the others. Also, the value which I was previously looking at under

Performance of stats=true&stats.field={!cardinality=1.0}fl

2016-01-05 Thread Modassar Ather
Hi, *q=fl1:net*&fl=fl&rows=50&stats=true&stats.field={!cardinality=1.0}fl* is returning a cardinality of around 15 million. It is taking around 4 minutes. Similar response times are seen with different queries which yield high cardinality. Kindly note that cardinality=1.0 is the desired goal. Here in the above example the

Re: how to search miilions of record in solr query

2016-01-05 Thread Ere Maijala
You might get better answers if you'd describe your use-case. If, for instance, you know all the IDs and you just need to be able to display a hundred records among those millions quickly, it would make sense to search for only a chunk of 100 IDs at a time. If you need to support more search
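Ere's "chunk of 100 IDs at a time" idea can be sketched as follows. The field name "ID", the batch size, and the use of the {!terms} query parser are assumptions for illustration; adjust to your schema:

```python
# Instead of one huge q=ID:(1 OR 4 OR ... one million values),
# batch the IDs and issue one request per batch.
from urllib.parse import urlencode

def id_chunk_queries(ids, field="ID", batch_size=100, rows=10):
    """Yield one Solr /select query string per batch of IDs."""
    ids = list(ids)
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        # {!terms f=field}v1,v2,... matches any of the listed values
        # without parsing an enormous boolean OR query.
        q = "{!terms f=%s}%s" % (field, ",".join(str(i) for i in chunk))
        yield urlencode({"q": q, "rows": rows, "start": 0})

# 1,000 IDs in batches of 100 -> 10 query strings to send one at a time.
queries = list(id_chunk_queries(range(1, 1001)))
```

The first batch can be served immediately while later batches are fetched in the background, matching Erick's suggestion earlier in the thread.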

Re: can we use Streaming Expressions for different collection

2016-01-05 Thread Joel Bernstein
There are a number of map/reduce join implementations available in Trunk. These are map/reduce joins where the entire result sets are shuffled to worker nodes. All of this code is in the org.apache.solr.client.solrj.io.stream package if you'd like to review. Joel Bernstein

Re: Solr 6 Distributed Join

2016-01-05 Thread Akiel Ahmed
Hi Joel, Sorry there was an error between my chair and keyboard; there isn't a bug - the right hand stream was not ordered by the joined-on field. So, the following query does what I expected: http://localhost:8983/solr/gettingstarted/stream?stream=innerJoin(search(gettingstarted

Re: Solr 6 Distributed Join

2016-01-05 Thread Dennis Gove
Akiel, https://issues.apache.org/jira/browse/SOLR-7554 added checks on the sort with streams, where required. If a particular stream requires that incoming streams be ordered in a compatible way then that check will be performed during creation of the stream and an error will be thrown if that
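A minimal innerJoin expression of the kind being discussed, with both sides sorted on the join key as SOLR-7554 requires; the collection and field names are made up for illustration:

```
innerJoin(
  search(people, q="*:*", fl="personId,name",    sort="personId asc"),
  search(pets,   q="*:*", fl="personId,petName", sort="personId asc"),
  on="personId"
)
```

If either incoming stream were sorted on a different field, the sort check would reject the expression at creation time instead of silently producing wrong joins.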

Re: enable disable filter query caching based on statistics

2016-01-05 Thread Erick Erickson
Binoy: bq: In such a case won't applying fqs normally be the same as applying them as post filters Certainly not, at least AFAIK... By definition, regular FQs are calculated over the entire corpus (not, NOT just the docs that satisfy the query). Then that entire bitset is stored in the

Re: Data migration from one collection to the other collection

2016-01-05 Thread Walter Underwood
You could send the documents to both and filter out the recent ones in the history collection. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jan 5, 2016, at 5:46 AM, vidya wrote: > > Hi > > I would like to maintain two

Re: Field Size per document in Solr

2016-01-05 Thread KNitin
I want to get the field size (in KB or MB) as it is stored on disk. That approach might not give that info.

Re: Field Size per document in Solr

2016-01-05 Thread Upayavira
The field is not stored in a discrete place, rather it is mixed up with all other field/document data. Therefore, I would suggest that attempting to discern the disk space consumed by a single field would be a futile endeavour. Upayavira On Tue, Jan 5, 2016, at 12:04 PM, KNitin wrote: > I want

solr 5.2.0 need to build high query response

2016-01-05 Thread Novin Novin
Hi guys, I'm having trouble figuring out what the ideal Solr config would be for my case: I'm doing a hard commit every minute for a very small number of users, because I have to show those docs in search results quickly when a user saves changes. It is causing the response to take around 2 secs even

Re: solr 5.2.0 need to build high query response

2016-01-05 Thread davidphilip cherian
You should use Solr soft commit for this use case. Setting softCommit to a few seconds and autoCommit to a minute with openSearcher=false should do the work. Reference link: https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching To know more about
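In solrconfig.xml, that suggestion corresponds to something like the following sketch (the maxTime values are illustrative):

```xml
<autoCommit>
  <maxTime>60000</maxTime>            <!-- hard commit every minute -->
  <openSearcher>false</openSearcher>  <!-- durability only, no new searcher -->
</autoCommit>
<autoSoftCommit>
  <maxTime>2000</maxTime>             <!-- soft commit makes docs visible -->
</autoSoftCommit>
```

Hard commits flush to disk without opening a searcher, while the cheaper soft commits control how quickly saved documents become searchable.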