Re: can we use Streaming Expressions for different collections

2016-01-04 Thread Joel Bernstein
Can you describe your use case in more detail? In general, Streaming Expressions can be used to combine data streams (searches) from different collections. There is a limited set of Streaming Expressions, described at https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions.
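For illustration, a minimal sketch of a streaming expression that merges searches from two collections (the collection names, fields and sort below are assumptions, not taken from the thread):

  merge(
    search(collection1, q="*:*", fl="id,score", sort="id asc"),
    search(collection2, q="*:*", fl="id,score", sort="id asc"),
    on="id asc")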

Re: Running Lucene/Solr on Hadoop

2016-01-04 Thread Tim Williams
Apache Blur (Incubating) has several approaches (Hive, Spark, M/R) that could probably help with this, ranging from very experimental to stable. If you're interested, you can ask over on blur-u...@incubator.apache.org ... Thanks, --tim On Fri, Dec 25, 2015 at 4:28 AM, Dino Chopins

Re: shard lost - solr5.3

2016-01-04 Thread GOURAUD Emmanuel
Hi there, replying to myself: I have set the replica property "preferredLeader" on this shard, shut down all replicas for this shard and started only the "preferred" one. This forced an election and saved my "ops" night and my New Year party!! Cheers, Emmanuel De: "GOURAUD Emmanuel"

how to search millions of records in a Solr query

2016-01-04 Thread Mugeesh Husain
Hi, I have a requirement to search ID field values like Id:(2,3,6,7 ... up to millions of values). Which query parser should I use so that the results are returned within 50 ms? Please suggest which query parser I should use for the above search.

Query behavior difference.

2016-01-04 Thread Modassar Ather
Hi, kindly help me understand how relevance ranking will differ in the following searches: query: fl:network, query: fl:networ*. What I am observing is that the results returned are different, in that the top documents returned for q=fl:network are not present in the top results

Does soft commit re-open searchers on disk?

2016-01-04 Thread Gili Nachum
Hello, When a new document is added, it becomes visible after a soft commit, during which it is written to a Lucene RAMDirectory (in heap). Then after a hard commit, the RAMDirectory is removed from memory and the docs are written to the index on disk. What happens if I hard commit (write to

Re: how to search millions of records in a Solr query

2016-01-04 Thread Upayavira
This is not a use-case to which Lucene lends itself. However, if you must, I would try the terms query parser, which I believe is used like this: {!terms f=id}2,3,6,7 Upayavira On Mon, Jan 4, 2016, at 10:41 AM, Mugeesh Husain wrote: > hi, > > I have a requirement to search ID field values like
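For reference, a full request using that parser might look roughly like this (the collection name is a placeholder, and the {!terms} local params would need URL-encoding in practice):

  http://localhost:8983/solr/mycollection/select?q={!terms f=id}2,3,6,7&fl=id,score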

Re: Query behavior difference.

2016-01-04 Thread Ahmet Arslan
Hi, I think wildcard queries like fl:networ* are re-written into a Constant Score Query. fl=*,score should return the same score for all documents that are retrieved. Ahmet On Monday, January 4, 2016 12:22 PM, Modassar Ather wrote: Hi, Kindly help me understand how will
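One way to see this, reusing the field name from the question, is to compare the scores returned by the two forms (paths and parameters are illustrative):

  /select?q=fl:network&fl=*,score     <- normal relevance scores
  /select?q=fl:networ*&fl=*,score     <- rewritten to a constant-score query, so all matches share the same score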

Re: Custom auth plugin not loaded in SolrCloud

2016-01-04 Thread tine-2
Hi, is there any news on this? Was anyone able to get it to work? Cheers, tine

Re: Memory Usage increases by a lot during and after optimization.

2016-01-04 Thread Toke Eskildsen
On Mon, 2016-01-04 at 10:05 +0800, Zheng Lin Edwin Yeo wrote: > A) Before I start the optimization, the server's memory usage > is consistent at around 16GB, when Solr starts up and we did some searching. How do you read this number? > However, when I click on the optimization button, the memory

Hard commits, soft commits and transaction logs

2016-01-04 Thread Clemens Wyss DEV
[Happy New Year to all] Is everything mentioned/recommended in https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ still valid for Solr 5.x? - Clemens
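For context, the commit settings that blog post discusses are configured in solrconfig.xml roughly as follows (the values here are purely illustrative):

  <!-- hard commit: flush segments to disk and truncate the transaction log, without opening a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: open a new searcher so recently added documents become visible -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>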

Re: Memory Usage increases by a lot during and after optimization.

2016-01-04 Thread Shawn Heisey
On 1/3/2016 7:05 PM, Zheng Lin Edwin Yeo wrote: > A) Before I start the optimization, the server's memory usage > is consistent at around 16GB, when Solr starts up and we did some searching. > However, when I click on the optimization button, the memory usage > increases gradually, until it reaches

Solr suggest, auto complete & spellcheck

2016-01-04 Thread Steven White
Hi, I'm trying to understand what the differences are between Solr suggest, autocomplete & spellcheck. Isn't each a function of the UI? If not, can you provide me with links that show end-to-end examples of setting up Solr to get all three features? I'm on Solr 5.2. Thanks Steve

Re: Facet shows deleted values...

2016-01-04 Thread Don Bosco Durai
Tomás, thanks for the suggestion. facet.mincount will solve my issue. Erick, I am using SolrCloud with solrconfig.xml configured with autoCommit. And I also read somewhere that explicit commit is not recommended in SolrCloud mode. Regarding auto warm, my server has been running for a

Re: how to search millions of records in a Solr query

2016-01-04 Thread Upayavira
Yes, because only a small portion of that 250ms is spent in the query parser. Most of it, I would suggest, is spent retrieving and merging posting lists. In an inverted index (which Lucene is), you store the list of documents matching a term against that term - that is your postings list. When

Field Size per document in Solr

2016-01-04 Thread KNitin
Hi, I want to get the size of individual fields per document (or per index) in SolrCloud. Is there a way to do this using an existing Solr or Lucene API? *Use case*: I have a few dynamic fields which may or may not be populated every day depending on certain conditions. I also do faceting and some

Re: Field Size per document in Solr

2016-01-04 Thread Upayavira
Solr does store the term positions, but you won't find it easy to extract them, as they are stored against terms not fields. Your best bet is to index field lengths into Solr alongside the field values. You could use an UpdateProcessor to do this if you want to do it in Solr. Upayavira On
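A rough sketch of that approach, assuming a StatelessScriptUpdateProcessorFactory, a source field named body and a target field named body_length_i (all names invented for illustration):

  <updateRequestProcessorChain name="add-field-length">
    <!-- runs fieldlength.js against every incoming document -->
    <processor class="solr.StatelessScriptUpdateProcessorFactory">
      <str name="script">fieldlength.js</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

where fieldlength.js could be as small as:

  function processAdd(cmd) {
    // store the character length of "body" alongside the value itself
    var doc = cmd.solrDoc;
    var v = doc.getFieldValue("body");
    if (v != null) doc.setField("body_length_i", v.toString().length);
  }

Depending on the Solr version, the script may also need no-op stubs for the other update events.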

Re: Solr suggest, auto complete & spellcheck

2016-01-04 Thread Erick Erickson
Here's a writeup on suggester: https://lucidworks.com/blog/2015/03/04/solr-suggester/ The biggest difference is that spellcheck returns individual _terms_ whereas suggesters can return entire fields. Neither is "a function of the UI" any more than searching is a function of the UI. In both
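For a concrete starting point, a suggester set up along the lines of that writeup looks roughly like this in solrconfig.xml (field and type names are placeholders):

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">title</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
    </lst>
  </searchComponent>
  <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.dictionary">mySuggester</str>
      <str name="suggest.count">10</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

Spellcheck, by contrast, is configured separately through solr.SpellCheckComponent against an indexed field.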

Re: Facet shows deleted values...

2016-01-04 Thread Shawn Heisey
On 1/4/2016 4:11 PM, Don Bosco Durai wrote: > Erick, I am using SolrCloud with solrconfig.xml configured with autoCommit. > And I also read somewhere that explicit commit is not recommended in > SolrCloud mode. Regarding auto warm, my server has been running for a > while. Since 4.0,

Re: Facet shows deleted values...

2016-01-04 Thread Erick Erickson
bq: And I also read somewhere that explicit commit is not recommended in SolrCloud mode Not quite, it's just easy to have too many commits happen too frequently from multiple indexing clients. It's also rare that the benefits of the clients issuing commits outweigh the chance of getting it

Re: apply document filter to solr index

2016-01-04 Thread Alexandre Rafalovitch
Well, you have a crawling and extraction pipeline. You can probably inject a classification algorithm somewhere in there, possibly NLP trained on a manual seed. Or just a list of typical words as a start. This is kind of a pre-Solr stage, though. Regards, Alex On 4 Jan 2016 7:37 pm,

[Manual Sharding] Solr distrib search causes thread exhaustion

2016-01-04 Thread Alessandro Benedetti
Hi guys, this is the scenario we are studying: Solr 4.10.2, 16 shards, a Solr instance aggregating the results running a distrib query with shards=... (all the shards). Currently we are not using shards.tolerant=true, so we throw an exception on error. We are in a situation where a shard is

Re: apply document filter to solr index

2016-01-04 Thread Binoy Dalal
There is no way that you can do that in Solr. You'll have to write something at the app level, where you're crawling your docs, or write a custom update handler that will preprocess the crawled docs and throw out the irrelevant ones. One way you can do that is to look at the doc title and the URL

Re: Does soft commit re-open searchers on disk?

2016-01-04 Thread Daniel Collins
If you have already done a soft commit and that opened a new searcher, then the document will be visible from that point on. The results returned by that searcher cannot be changed by the hard commit (whatever that is doing under the hood, the segment that has that document in must still be

MapReduceIndexerTool Indexing

2016-01-04 Thread vidya
Hi, I have used MapReduceIndexerTool to index data in my HDFS into Solr in order to search it. I want to know whether it re-indexes the entire data when some new data is added to that path and the tool is run on it again. Thanks in advance

apply document filter to solr index

2016-01-04 Thread liviuchristian
Hi everyone, I'm working on a search engine based on Solr which indexes documents from a large variety of websites. The engine is focused on cooking recipes. However, one problem is that these websites provide not only content related to cooking recipes but also content related to fashion,

Re: how to search millions of records in a Solr query

2016-01-04 Thread Mugeesh Husain
>>This is not a use-case to which Lucene lends itself. However, if you >>must, I would try the terms query parser, which I believe is used like >>this: {!terms f=id}2,3,6,7 I did try the terms query parser as above, but the problem is performance: I am getting results in 250ms but I am looking for a

Re: Does soft commit re-open searchers on disk?

2016-01-04 Thread Emir Arnautovic
Hi Gili, Visibility is related to the searcher - if you reopen the searcher, it will be visible. If a hard commit happens without reopening the searcher, documents will not be visible until the next soft commit happens. You can find more details about commits on

Querying with action parameter included in URL

2016-01-04 Thread vidya
Hi, I am pretty new to Solr and while going through the tutorials I came across URLs for querying like "http://localhost:8983/solr/admin/configs?action=CREATE&name=booksConfig&baseConfigSet=genericTemplate". I wanted to know how to implement the same by making changes in schema.xml or solrconfig.xml. Where

Re: MapReduceIndexerTool Indexing

2016-01-04 Thread vidya
Hi, I would like to index only new data, not already indexed data (delta indexing). How can I achieve it using MRIT? Thanks in advance

Re: Querying with action parameter included in URL

2016-01-04 Thread Binoy Dalal
I think that all this will do is create a config with the name booksConfig based on a template. This and other calls like it are Solr admin API calls that you make through HTTP requests. You don't need to make any changes to your schema or solrconfig files in order to execute such
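Written out, the call from the question is a Config Sets API request (available from Solr 5.4 on) along these lines:

  curl "http://localhost:8983/solr/admin/configs?action=CREATE&name=booksConfig&baseConfigSet=genericTemplate"

It only creates a named configset; collections that should use it reference it when they are created.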

Re: Querying with action parameter included in URL

2016-01-04 Thread davidphilip cherian
Hi Vidya, I think you are confusing Solr search queries/requests with Solr's other RESTful APIs that perform CRUD operations on collections. Samples of search queries with the standard query parser are listed here: https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser Solr

Re: Multiple solr instances on one server

2016-01-04 Thread Jack Krupansky
See the Solr Reference Guide: " -s Sets the solr.solr.home system property; Solr will create core directories under this directory. This allows you to run multiple Solr instances on the same host while reusing the same server directory set using the -d parameter. If set, the specified directory
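Putting that together, two instances on one host sharing the same install but with separate home directories could be started like this (ports, paths and heap sizes are only illustrative):

  bin/solr start -p 8983 -s /var/solr/node1 -m 4g
  bin/solr start -p 8984 -s /var/solr/node2 -m 4g

The -m flag caps each JVM's heap, which is what makes the smaller-heap-per-instance approach discussed later in this thread workable.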

Multiple solr instances on one server

2016-01-04 Thread philippa griggs
Hello, (Solr 5.2.1) I want to run multiple Solr instances on one server; does anyone know which is better: allowing each Solr instance to use its own internal Jetty, or installing Jetty on the server? Many thanks Philippa

Re: Multiple solr instances on one server

2016-01-04 Thread philippa griggs
Hello, Thanks for your reply. Do you know if there are many disadvantages to running multiple Solr instances, each with its own internal Jetty? I'm trying to work out if this would work or if I would need to install Jetty myself on the machine and use that instead. I'm not sure how many

Query regarding AnalyzingInfixLookupFactory

2016-01-04 Thread radhika tayal
Hi, I am trying to use AnalyzingInfixLookupFactory for auto-suggest, but I am facing one issue related to duplicate results. Below is the exact problem I am facing: a lot of the fields I am capturing (multivalued) contain data that is repeated (e.g. "new york" exists in the title fields of many

Re: Multiple solr instances on one server

2016-01-04 Thread Mugeesh Husain
You could start Solr on multiple ports like below: bin/solr start -p 8983 (one instance), bin/solr start -p 8984 (second instance), and so on; it depends on you.

Re: Multiple solr instances on one server

2016-01-04 Thread Mugeesh Husain
You could use the inbuilt (internal) Jetty in production; it depends on the requirement. If you want to use another container, Tomcat would be the best. Please elaborate your requirement: why do you want to use multiple instances on a single server?

Re: [Manual Sharding] Solr distrib search causes thread exhaustion

2016-01-04 Thread Alessandro Benedetti
Yes Erick, our Jetty is configured with 10,000 threads. Actually the puzzle got more complicated as we realised that connTimeout is set to 0 by default. But we definitely get an error from one of the shards, and the aggregator throws the exception because it is not tolerant. The weird thing is that the

Re: how to search millions of records in a Solr query

2016-01-04 Thread Erick Erickson
Best of luck with that ;). 250ms isn't bad at all for "searching millions of IDs". Frankly, I'm not at all sure where I'd even start. With millions of search terms, I'd have to profile the application to see where it was spending the time before even starting. Best, Erick On Mon, Jan 4, 2016 at

Re: MapReduceIndexerTool Indexing

2016-01-04 Thread Erick Erickson
Yes it does. MRIT is intended for initial bulk loads. It takes whatever it's pointed at and indexes it. Additionally, it does not update documents. If the same document (by ID) is indexed twice, you'll wind up with two copies in your results. Best, Erick On Mon, Jan 4, 2016 at 5:00 AM, vidya

can we use Streaming Expressions for different collections

2016-01-04 Thread Mugeesh Husain
I am checking this article -> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions Can we implement a merge operation for different collections or different nodes in SolrCloud?

Re: [Manual Sharding] Solr distrib search causes thread exhaustion

2016-01-04 Thread Erick Erickson
How many threads are you allocating for the servlet container? 10,000 is the "usual" number. Best, Erick On Mon, Jan 4, 2016 at 5:21 AM, Alessandro Benedetti wrote: > Hi guys, > this is the scenario we are studying : > > Solr 4.10.2 > 16 shards, a solr instance
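For reference, in the stock Jetty shipped with Solr 4.x that cap lives in example/etc/jetty.xml, roughly as follows (exact layout varies by version):

  <Set name="ThreadPool">
    <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
      <Set name="minThreads">10</Set>
      <Set name="maxThreads">10000</Set>
    </New>
  </Set>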

Re: shard lost - solr5.3

2016-01-04 Thread Erick Erickson
There's no reason to shut down your node. You should be able to issue a REBALANCELEADERS command, see: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders on a currently-running cluster and all your preferred leaders (assuming the nodes are up) should
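That command is an ordinary Collections API call, for example (the collection name is a placeholder):

  http://localhost:8983/solr/admin/collections?action=REBALANCELEADERS&collection=mycollection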

Re: Hard commits, soft commits and transaction logs

2016-01-04 Thread Erick Erickson
As far as I know. If you see anything different, let me know and we'll see if we can update it. Best, Erick On Mon, Jan 4, 2016 at 1:34 AM, Clemens Wyss DEV wrote: > [Happy New Year to all] > > Is all herein >

Re: Multiple solr instances on one server

2016-01-04 Thread philippa griggs
We store a huge amount of data across 10 shards and are getting to a point where we keep having to up the heap to stop Solr from crashing. We are trying to keep the heap size down, and plan to host multiple Solr instances on each server, each with a much smaller heap size.

Re: Multiple solr instances on one server

2016-01-04 Thread Erick Erickson
Right, that's the most common reason to run multiple JVMs. You must be running multiple replicas on each box though to make that viable. By running say 2 JVMS, you're essentially going from hosting, say, 4 replicas in one JVM to 2 replicas in each of 2 JVMs. You'll incur some overhead due to the