Indicating missing query terms in response

2020-11-08 Thread adfel70
As a Solr query result set may contain documents that do not include all search terms, we were wondering if it is possible to get an indication of which terms were missing as part of the response. For example, if our index has the following indexed doc: { "title": "hello" } (assuming

Re: Potential authorization bug when making HTTP requests

2019-05-15 Thread adfel70
Opened SOLR-13472 -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Potential authorization bug when making HTTP requests

2019-05-04 Thread adfel70
Hi Jan, Thanks for the reply. I am not sure it is exactly the same issue, also we are testing with Solr 7.7.1 and issue still occurs. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Potential authorization bug when making HTTP requests

2019-05-02 Thread adfel70
Authorization bug (?) when making HTTP requests. We are experiencing a problem when making HTTP requests to a cluster with the authorization plugin enabled. Permissions are configured in security.json as follows: { ... authentication_settings ... "authorization":{

Solr failed to start after configuring Kerberos authentication

2018-05-24 Thread adfel70
Hi, We are trying to configure Kerberos auth for Solr 6.5.1. We went over the steps as described in Solr's ref guide, but after restart we are getting the following error: org.apache.zookeeper.client.ZookeeperSaslClient; An error: (java.security.PrivilegedActionException:

Do streaming expressions support range facets?

2017-04-03 Thread adfel70
Specifically date ranges? I would like to perform some kind of OLAP cube on the data in solr, and looking at streaming expressions for this. -- View this message in context: http://lucene.472066.n3.nabble.com/Do-streaming-expressions-support-range-facets-tp4328233.html Sent from the Solr -

Streaming expressions - Any plans to add one to many fetches to the fetch decorator?

2017-03-27 Thread adfel70
Any ideas how to workaround this with the current streaming capabilities? -- View this message in context: http://lucene.472066.n3.nabble.com/Streaming-expressions-Any-plans-to-add-one-to-many-fetches-to-the-fetch-decorator-tp4326989.html Sent from the Solr - User mailing list archive at

Streaming expressions and result transformers

2017-03-26 Thread adfel70
Hi, do streaming expressions support doc transformers? To be more specific, I have a nested docs data model. I want to use streaming expressions and get the results with ChildDocTransformerFactory. Is it possible?

Re: Simple sql query with where clause doesn't work

2017-03-12 Thread adfel70
Seems like this only happens when the value is not a number. curl --data-urlencode 'stmt=select fieldA from collection where field='123'' http://host:port/solr/collection/sql?aggregationMode=facet works, while this one doesn't work: curl --data-urlencode 'stmt=select fieldA from collection where
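One possible reading of the difference between the two commands, offered here as an assumption and not as the thread's confirmed diagnosis: the inner single quotes close and reopen the shell's outer quoting, so curl never sees them. A numeric value such as 123 still parses as a literal, while a string value arrives unquoted. The Python sketch below uses shlex to mimic POSIX shell word splitting, then builds a form body in which the quotes survive. fieldA, fieldB and value are the placeholders from the post; the endpoint URL is left out on purpose.

```python
import shlex
from urllib.parse import urlencode


def last_shell_argument(command_line: str) -> str:
    """Return the final argument a POSIX shell would hand to the program."""
    return shlex.split(command_line)[-1]


# The command as typed in the post (URL omitted): the inner quotes are consumed
# by the shell, so the value reaches curl without any quoting.
as_typed = "curl --data-urlencode 'stmt=select fieldA from collection where fieldB='value''"
seen_by_curl = last_shell_argument(as_typed)

# Building the request body programmatically keeps the SQL string literal intact.
stmt = "select fieldA from collection where fieldB = 'value'"
form_body = urlencode({"stmt": stmt})

if __name__ == "__main__":
    print(seen_by_curl)
    print(form_body)
```

In a shell, the same effect can be had by wrapping the whole stmt argument in double quotes so the single quotes inside it are passed through literally.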

Simple sql query with where clause doesn't work

2017-03-12 Thread adfel70
Hi, I'm trying to play with the /sql feature, working with Solr 6.4.2. Running curl --data-urlencode 'stmt=select fieldA from collection' http://host:port/solr/collection/sql?aggregationMode=facet works fine. Running curl --data-urlencode 'stmt=select fieldA from collection where fieldB='value''

Re: reindexing a solr collection of nested documents

2016-11-29 Thread adfel70
Anyone has a clue? -- View this message in context: http://lucene.472066.n3.nabble.com/reindexing-a-solr-collection-of-nested-documents-tp4307586p4307976.html Sent from the Solr - User mailing list archive at Nabble.com.

reindexing a solr collection of nested documents

2016-11-27 Thread adfel70
Hi, I have a Solr collection of nested documents. I would like to reindex this collection to a new collection, without running the original process that created this collection. If this was not a collection of nested documents, I would use the /export handler to export all the documents and

Executing Collector's Collect method on more than one thread

2016-01-31 Thread adfel70
I am using RankQuery to implement my applicative scorer that returns a score based on the value of a specific field (let's call it 'score_field') that is stored for every document. The RankQuery creates a collector, and for every collected docId I retrieve the value of score_field, calculate the

Re: Read time out exception - exactly 10 minutes after starting committing

2016-01-27 Thread adfel70
I don't have any custom ShardHandler Regarding the cache, I reduced it to zero, and checking performance now Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Read-time-out-exception-exactly-10-minutes-after-starting-committing-tp4252287p4253568.html Sent from the

Re: Read time out exception - exactly 10 minutes after starting committing

2016-01-23 Thread adfel70
Thanks Shawn, 1. I am getting the "read time out" from the Solr Server. Not from my client, but from the server client when it tries to reach other instances while committing. 2. I reduced the filter cache autowarmCount to 512, and seems to fix the problem. It now takes only several seconds to

Read time out exception - exactly 10 minutes after starting committing

2016-01-21 Thread adfel70
I am running soft commit on 100 solr docs (the index itself has 3 Billion docs). After EXACTLY 10 minutes (for example, start committing on 15:52:55.932, exception on 16:02:55.976) I am getting several exceptions of this sort: org.apache.solr.client.solrj.SolrServerException: Timeout occured while

Re: CloudSolrCloud - Commit returns but not all data is visible (occasionally)

2015-11-24 Thread adfel70
After several days running some use cases with 2 configurations, I can tell you that the "PERFORMANCE WARNING: Overlapping onDeckSearchers" continues only on the maxWarmingSearchers=2 and none on the maxWarmingSearchers=5 config. Unfortunately, the root problem still occurs! I have reduced the

Re: CloudSolrCloud - Commit returns but not all data is visible (occasionally)

2015-11-17 Thread adfel70
Thanks Eric, I'll try to play with the autowarm config. But I have a more direct question - why does the commit return without waiting till the searchers are fully refreshed? Could it be that the parameter waitSearcher=true doesn't really work? or maybe I don't understand something here...

CloudSolrCloud - Commit returns but not all data is visible (occasionally)

2015-11-16 Thread adfel70
Hi, I am using Solr 5.2.1 with the solrj client 5.2.1. (I know CloudSolrServer is deprecated) I am running the command: cloudSolrServer.commit(false, true, true). The parameters are: waitFlush (false), waitSearcher (true), softCommit (true). The problem is that the client returns as if it already
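For readers reproducing this over HTTP instead of SolrJ, the commit in the post can be sketched as update-handler request parameters. This is a hedged illustration, not the poster's code: commit, waitSearcher and softCommit are standard Solr update parameters, while the helper name and the idea of sending them to a collection's /update endpoint are assumptions made for the example. waitFlush is left out because the post passes false for it.

```python
from urllib.parse import urlencode


def soft_commit_query_string(wait_searcher: bool = True) -> str:
    """Build the query string for a soft commit that waits for the new searcher."""
    params = {
        "commit": "true",
        "waitSearcher": "true" if wait_searcher else "false",
        "softCommit": "true",
    }
    return urlencode(params)


if __name__ == "__main__":
    # Would be appended to a hypothetical http://host:port/solr/collection/update? URL.
    print(soft_commit_query_string())
```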

Re: Solr facets implementation question

2015-09-17 Thread adfel70
Toke Eskildsen wrote: > adfel70 wrote: >> I am trying to understand why faceting on a field with lots of unique values has a great impact on query performance. > Faceting in Solr is performed in different ways. String faceting different

Solr facets implementation question

2015-09-08 Thread adfel70
I am trying to understand why faceting on a field with lots of unique values has a great impact on query performance. Since Googling for Solr facet algorithm did not yield anything, I looked how facets are implemented in Lucene. I found out that there are 2 methods - taxonomy-based and

Re: serious data loss bug in correlation with too much data after closed

2015-08-13 Thread adfel70
Update: modifying jetty.xml as described here: http://lucene.472066.n3.nabble.com/Too-much-data-after-closed-for-HttpChannelOverHttp-td4170459.html solved the problem of these warnings and the data loss. Future searches of this problem should take into account that this warning may imply of

Re: serious data loss bug in correlation with too much data after closed

2015-08-09 Thread adfel70
By now I'm pretty much sure that this is either a bug in solr or in http-client. I again reproduced the problem: 1. during massive indexing we see some WARNINGS from HttpParser: badMessage: java.lang.IllegalStateException: too much data after closed for HttpChannelOverHttp checking in httpcore

Re: serious data loss bug in correlation with too much data after closed

2015-08-06 Thread adfel70
, Aug 4, 2015, at 03:06 PM, adfel70 wrote: Hello, I'm using solr 5.2.1 I'm running indexing of a collection with 20 shards. around 1.7 billion docs should be indexed. the indexer is a mapreduce job that runs on yarn, running 60 concurrent containers. I index with bulks of 1000 docs and write

Re: serious data loss bug in correlation with too much data after closed

2015-08-06 Thread adfel70
Heisey-2 wrote On 8/4/2015 8:06 AM, adfel70 wrote: I saw this post: http://lucene.472066.n3.nabble.com/Too-much-data-after-closed-for-HttpChannelOverHttp-td4170459.html I tried reducing the bulk size from 1000 to 200 as the post suggests (didn't go to running each doc in a separate .add call

serious data loss bug in correlation with too much data after closed

2015-08-04 Thread adfel70
Hello, I'm using solr 5.2.1 I'm running indexing of a collection with 20 shards. around 1.7 billion docs should be indexed. the indexer is a mapreduce job that runs on yarn, running 60 concurrent containers. I index with bulks of 1000 docs and write logs for each bulk that was indexed. each such

Re: mapreduce job using solrj 5

2015-06-17 Thread adfel70
We cannot downgrade httpclient in solrj5 because it's using new features and we don't want to start altering solr code. Anyway, we thought about upgrading httpclient in hadoop, but as Erick said it sounds like more work than just putting the jar in the data nodes. About that flag, we tried it, hadoop even has

mapreduce job using solrj 5

2015-06-16 Thread adfel70
Hi, We recently started testing solr 5. Our indexer creates a mapreduce job that uses solrj5 to index documents to our SolrCloud. Until now, we used solr 4.10.3 with solrj 4.8.0. Our hadoop dist is cloudera 5. The problem is, solrj5 is using httpclient-4.3.1 while hadoop is installed with

Re: Adding applicative cache to SolrSearcher

2015-06-11 Thread adfel70
Works great, thanks guys! Missed the leafReader because I looked at IndexSearcher instead of SolrIndexSearcher... -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-applicative-cache-to-SolrSearcher-tp4211012p4211183.html Sent from the Solr - User mailing list archive

DocValues memory consumption thoughts

2015-06-11 Thread adfel70
I am using DocValues and I am wondering how to configure the Java heap size of Solr's processes: do DocValues use system cache (off-heap memory) or heap memory? Should I take DocValues into consideration when I calculate heap parameters (xmx, xmn, xms...)?

Adding applicative cache to SolrSearcher

2015-06-10 Thread adfel70
I am using RankQuery to implement my applicative scorer that returns a score based on the value of a specific field (let's call it 'score_field') that is stored for every document. The RankQuery creates a collector, and for every collected docId I retrieve the value of score_field, calculate the

Re: How to tell when Collector finishes collect loop?

2015-06-10 Thread adfel70
I need to execute close() because the scorer is being opened in a context of a query and caches some data in that scope - of the specific query. The way to clear this cache, which is only relevant for that query, is to call close(). I think this API is not so good, but I assume that the scorer's

How to tell when Collector finishes collect loop?

2015-06-03 Thread adfel70
Hi guys, need your help (again): I have a search handler which needs to override solr's scoring. I chose to implement it with the RankQuery API, so when getTopDocsCollector() gets called it instantiates my TopDocsCollector instance, and every docId gets its own score: public class MyScorerrankQuet

Re: Native library of plugin is loaded for every core

2015-05-28 Thread adfel70
Works as expected :) Thanks guys! -- View this message in context: http://lucene.472066.n3.nabble.com/Native-library-of-plugin-is-loaded-for-every-core-tp4207996p4208372.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Native library of plugin is loaded for every core

2015-05-27 Thread adfel70
Hi Alan, thanks for the reply. I am not sure what you meant. Currently it is loaded from solrconfig.xml: <lib dir="/path_to_plug_dir/" regex="*.jar" /> Is there any other way?

Native library of plugin is loaded for every core

2015-05-27 Thread adfel70
Hi guys, need your help: I added custom plugins to Solr to support my applicative needs (one index handler and 2 search components); all of them access a native library using JNI. The native library wrapper class loads the library using the regular pattern: public class YWrapper{

Re: getting frequent CorruptIndexException and inconsistent data though core is active

2015-05-07 Thread adfel70
Anyone has any inputs on this? -- View this message in context: http://lucene.472066.n3.nabble.com/getting-frequent-CorruptIndexException-and-inconsistent-data-though-core-is-active-tp4204129p4204347.html Sent from the Solr - User mailing list archive at Nabble.com.

I was asked to wait on state recovering for shard.... but I still do not see the request state

2015-05-07 Thread adfel70
Hi I have a cluster of 16 shards, 3 replicas. I keep getting situations where a whole shard breaks. the leader is at down state and says: I was asked to wait on state recovering for shard but i still do not see the requested state. I see state: recovering live:true leader from ZK:http://...

Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread adfel70
1.7.75 2. My previous post was about another use case, but nevertheless I have configured docvalues in the faceted fields. Toke Eskildsen wrote On Wed, 2015-05-06 at 00:58 -0700, adfel70 wrote: each shard has around 200 million docs. size of each shard is 250GB. this runs on 12 machines. each

severe problems with soft and hard commits in a large index

2015-05-06 Thread adfel70
Hello I have a cluster of 16 shards, 3 replicas. the cluster indexed nested documents. it currently has 3 billion documents overall (parent and children). each shard has around 200 million docs. size of each shard is 250GB. this runs on 12 machines. each machine has 4 SSD disks and 4 solr

Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread adfel70
Xms16gb Xmn28gb any input on this? How many documents per shard are recommended? Note that I use nested documents. total collection size is 3 billion docs, number of parent docs is 600 million. the rest are children. Shawn Heisey-2 wrote On 5/6/2015 1:58 AM, adfel70 wrote: I have a cluster

Re: severe problems with soft and hard commits in a large index

2015-05-06 Thread adfel70
, 2015 at 10:58 AM, adfel70 <adfel70@> wrote: Hello I have a cluster of 16 shards, 3 replicas. the cluster indexed nested documents. it currently has 3 billion documents overall (parent and children). each shard has around 200 million docs. size of each shard is 250GB. this runs

getting frequent CorruptIndexException and inconsistent data though core is active

2015-05-06 Thread adfel70
Hi I'm getting org.apache.lucene.index.CorruptIndexException liveDocs.count()=2000699 info.docCount()=2047904 info.getDelCount()=47207 (filename=_ney_1g.del). This just happened for the 4th time in 2 weeks. each time this happens in another core, usually when a replica tries to recover, then it

Re: CLUSTERSTATE timeout

2015-04-14 Thread adfel70
I'm having the same issue with 4.10.3. I'm performing various tasks on the clusterstate API and getting random timeouts throughout the day. -- View this message in context: http://lucene.472066.n3.nabble.com/CLUSTERSTATE-timeout-tp4199367p4199501.html Sent from the Solr - User mailing list archive

Re: Unexplained leader initiated recovery after updates

2015-02-06 Thread adfel70
any inputs on this? i'm facing the same problem.. -- View this message in context: http://lucene.472066.n3.nabble.com/Unexplained-leader-initiated-recovery-after-updates-tp4178496p4184336.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: CLUSTERSTATUS timeout

2014-12-17 Thread adfel70
Hi Jonathan, We are having the exact same problem with Solr 4.8.0. Did you manage to resolve this one? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/CLUSTERSTATUS-timeout-tp4173224p4174741.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Getting the position of a word via Solr API

2014-12-02 Thread adfel70
Small update, I have managed to make the Term Vector work and I am getting all the words of the text field. The problem is that it doesn't work with several words combined; I can't find the offset where the needed expression starts... Any ideas anyone? Thanks!

Getting the position of a word via Solr API

2014-12-01 Thread adfel70
Hi, I am trying to retrieve from Solr API the position of a word from a text field that was indexed but not stored. I am storing the text field in an external repository and trying to do the Solr built-in snippet function by myself, outside Solr. Basically, all I need is to get from Solr the

Is it possible to facet on date fields and aggregate by day/month/year?

2014-11-16 Thread adfel70
Hi, If my data includes: doc1: date_f: 2014-05-01T00:00:00Z doc2: date_f: 2014-05-02T00:00:00Z doc3: date_f: 2014-06-01T00:00:00Z doc4: date_f: 2014-07-01T00:00:00Z then can I facet on month(date_f) and get 05(2) 06(1) 07(1), or facet on year(date_f) and get 2014(4)? Is it supported?
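A hedged sketch of one way to get per-month counts: Solr's range faceting (facet.range with a +1MONTH gap) buckets a date field by calendar month. This is an illustration, not the answer given in the thread. The start and end bounds are assumptions chosen to cover the sample data, and range buckets are keyed by year and month, so they are not the same thing as a month(date_f) function that would merge May of different years. The local Counter only reproduces the expected counts for the four dates in the post.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import urlencode

# The four date_f values from the post.
SAMPLE_DATES = [
    "2014-05-01T00:00:00Z",
    "2014-05-02T00:00:00Z",
    "2014-06-01T00:00:00Z",
    "2014-07-01T00:00:00Z",
]


def month_buckets(values):
    """Count documents per calendar month, the way a +1MONTH range facet would."""
    parsed = (datetime.strptime(v, "%Y-%m-%dT%H:%M:%SZ") for v in values)
    return Counter(d.strftime("%Y-%m") for d in parsed)


def month_facet_query_string(field, start, end):
    """Build request parameters for a monthly range facet on a date field."""
    return urlencode({
        "q": "*:*",
        "rows": "0",
        "facet": "true",
        "facet.range": field,
        "facet.range.start": start,   # assumed lower bound
        "facet.range.end": end,       # assumed upper bound
        "facet.range.gap": "+1MONTH",
    })


if __name__ == "__main__":
    print(dict(month_buckets(SAMPLE_DATES)))
    print(month_facet_query_string("date_f", "2014-01-01T00:00:00Z", "2015-01-01T00:00:00Z"))
```

Switching the gap to +1YEAR or +1DAY gives the yearly and daily variants of the same request.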

out of memory when trying to sort by id in a 1.5 billion index

2014-11-07 Thread adfel70
hi I have 11 machines in my cluster. each machine 128GB memory, 2 solr jvm's with 12gb heap each. cluster has 7 shards, 3 replicas. 1.5 billion docs total. most user queries are pretty simple for now, sorting by date fields and another field that has around 1000 unique values. I have a usecase for

Facets on Nested documents

2014-07-07 Thread adfel70
Hi, I indexed different types (different fields) of child docs for every parent. I want to facet on a field in one type of child doc and after that do another facet on a different type of child doc. It doesn't work. Any idea how I can do something like that? Thanks. -- View this message

Re: OOM during indexing nested docs

2014-06-25 Thread adfel70
I made two tests, one with MaxRamBuffer=128 and the second with MaxRamBuffer=256. In both i got OOM. I also made two tests on autocommit: one with commit every 5 min, and the second with commit every 100,000 docs. (disabled softcommit) In both i got OOM. merge policy - Tiered (max segment size

OOM during indexing nested docs

2014-06-24 Thread adfel70
Hi, I am getting OOM during indexing 400 million docs (nested 7-20 children). The memory usage gets higher while indexing until it gets to 24g. also after OOM and stop indexing, the memory stays on 24g, *seems like a leak.* *Solr Collection Info: * solr 4.8 , 6 shards, 1 replicas per shard,

Re: Replica as a leader

2014-05-18 Thread adfel70
*one of the most important requirements in my system is not to lose docs and not to retrieve part of the data at query time.* I expect the replica to wait until the real leader will start or at least to sync the real leader with the docs indexed in the replica after starting and syncing the

Replica as a leader

2014-05-15 Thread adfel70
/Solr Collection Info:/ Solr 4.8 , 4 shards, 3 replicas per shard, 30-40 million docs per shard. /Process:/ 1. Indexing 100-200 docs per second. 2. Doing Pkill -9 java to 2 replicas (not the leader) in shard 3 (while indexing). 3. Indexing for 10-20 minutes and doing hard commit. 4. Doing Pkill

solr 4.8 Leader Problem

2014-05-12 Thread adfel70
*Solr Collection Info:* Solr 4.8 , 4 shards, 3 replicas per shard, 30-40 million docs per shard. Process: 1. Indexing 100-200 docs per second. 2. Doing Pkill -9 java to 2 replicas (not the leader) in shard 3 (while indexing). 3. Indexing for 10-20 minutes and doing hard commit. 4. Doing Pkill

Replica as a leader

2014-05-11 Thread adfel70
Solr Collection Info: solr 4.8 , 4 shards, 3 replicas per shard, 30-40 million docs per shard. Process: 1. Indexing 100-200 docs per second. 2. Doing Pkill -9 java to 2 replicas (not the leader) in shard 3 (while indexing). 3. Indexing for 10-20 minutes and doing hard commit. 4. Doing Pkill -9

Suspicious Object.wait in UnInvertedField.getUnInvertedField

2014-04-02 Thread adfel70
While debugging a problem where 400 threads were waiting for a single lock we traced the issue to the getUnInvertedField method. public static UnInvertedField getUnInvertedField(String field, SolrIndexSearcher searcher) throws IOException { SolrCache<String,UnInvertedField> cache =

Re: bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-18 Thread adfel70
AM, adfel70 wrote: we currently have around 200gb in a server. I'm aware of the RAM issue, but it somehow doesn't seem related. I would expect search latency problems, not strange eofexceptions. regarding the http.timeout - I didn't change anything concerning this. Do I need to explicitly

Re: bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-17 Thread adfel70
than the solr out-of-the-box comes with? I'm also monitoring garbage collector metrics and I don't see anything unusual.. Shawn Heisey-4 wrote On 3/16/2014 10:34 AM, adfel70 wrote: I have a 12-node solr 4.6.1 cluster. each node has 2 solr processes, running on 8gb heap jvms. each node has

bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-16 Thread adfel70
Hi I have a 12-node solr 4.6.1 cluster. each node has 2 solr processes, running on 8gb heap jvms. each node has total of 64gb memory. My current collection (7 shards, 3 replicas) has around 500 million docs. I'm performing bulk indexing into the collection. I set softCommit to 10 minutes and

need help in understanding solr cloud stats data

2014-02-02 Thread adfel70
I'm sending all solr stats data to graphite. I have some questions: 1. query_handler/select requestTime - if I'm looking at some metric, let's say 75thPcRequestTime - I see that each core in a single collection has different values. Is the value for each core the time that specific core spent

monitoring solr logs

2013-12-30 Thread adfel70
hi i'm trying to figure out which solr and zookeeper logs i should monitor and collect. All the logs will be written to a file but I want to collect some of them with logstash in order to be able to analyze them efficiently. any inputs on logs of which classes i should collect? thanks. --

RE: monitoring solr logs

2013-12-30 Thread adfel70
levels to send to the RabbitMQ appender. Cheers, Timothy Potter Sr. Software Engineer, LucidWorks www.lucidworks.com From: adfel70 <adfel70@> Sent: Monday, December 30, 2013 8:15 AM To: solr-user@.apache Subject: monitoring solr logs

RE: monitoring solr logs

2013-12-30 Thread adfel70
. Software Engineer, LucidWorks www.lucidworks.com From: adfel70 <adfel70@> Sent: Monday, December 30, 2013 9:34 AM To: solr-user@.apache Subject: RE: monitoring solr logs Actually I was considering using logstash4solr, but it didn't

problem with facets - out of memory exception

2013-12-19 Thread adfel70
Hi I have a cluster of 14 nodes (7 shards, 2 replicas). each node with 6gb jvm. solr 4.3.0 i have 400 million docs in the cluster, each node around 60gb of index. I index new docs each night, around a million a night. As the index started to grow, i started having problems of OutOfMemory when

solr cloud - deleting and adding the same doc

2013-12-17 Thread adfel70
Hi in SolrCloud, if I send 2 different requests to solr - one with delete action of doc with id X and another with add action of doc with the same id - is it guaranteed that the delete action will occur before the add action? Is it guaranteed that after all actions are done, the index will have

Re: solr cloud - deleting and adding the same doc

2013-12-17 Thread adfel70
the same client, this is a guarantee. - Mark On Dec 17, 2013, at 9:54 AM, adfel70 <adfel70@> wrote: Hi in SolrCloud, if I send 2 different requests to solr - one with delete action of doc with id X and another with add action of doc with the same id - is it guaranteed

Upgrading Solr cluster without downtime

2013-12-01 Thread adfel70
I was wondering if there is a way to upgrade Solr version without downtime. Theoretically it seems possible when every shard in the cluster has at least 2 replicas - but Jetty does not refresh the web container until we delete solr-webapp folder's content. Can someone please share from his

Re: solr as a service for multiple projects in the same environment

2013-11-30 Thread adfel70
The risk is that if you mess up a cluster by mistake while doing maintenance on one of the systems, you can affect the other system. It's a pretty amorphous risk. Aside from having multiple systems share the same hardware resources, I don't see any other real risk. Do your collections share the same

Re: synchronization between replicas

2013-11-27 Thread adfel70
I'm sorry, I forgot to write the problem. adfel70 wrote 1. take one of the replicas of shard1 down(it doesn't matter which one) 2. continue indexing documents(that's important for this scenario) 3. take down the second replica of shard1(now the shard is down and we cannot index anymore) 4

solr as a service for multiple projects in the same environment

2013-11-27 Thread adfel70
Hi I have various solr related projects in a single environment. These project are not related one to another. I'm thinking of building a solr architecture so that all the projects will use different solr collections in the same cluster, as opposed to having a solr cluster for each project. 1.

Re: Setting solr.data.dir for SolrCloud instance

2013-11-26 Thread adfel70
is executing from. For instance, I can start Solr like java -Dsolr.solr.home=/Users/Erick/testdir/solr -jar start.jar and have my war in a completely different place. Best, Erick On Tue, Nov 26, 2013 at 1:08 AM, adfel70 <adfel70@> wrote: Thanks for the reply, Erick. Actually, I

Re: synchronization between replicas

2013-11-26 Thread adfel70
anyone? -- View this message in context: http://lucene.472066.n3.nabble.com/syncronization-between-replicas-tp4103046p4103455.html Sent from the Solr - User mailing list archive at Nabble.com.

synchronization between replicas

2013-11-25 Thread adfel70
Hi, We are currently running tests on solr to find as many problems as possible in our solr environment so we can be ready for these kinds of problems in production. Anyway, we found an edge case and have a few questions about it. We have one collection with two shards, each shard with replication factor 2. we are

Setting solr.data.dir for SolrCloud instance

2013-11-25 Thread adfel70
I found something strange while trying to create more than one collection in SolrCloud: I am running every instance with -Dsolr.data.dir=/data If I look at Core Admin section, I can see that I have one core and its dataDir is set to this fixed location. Problem is, if I create a new collection,

Re: Setting solr.data.dir for SolrCloud instance

2013-11-25 Thread adfel70
Thanks for the reply, Erick. Actually, I did not think this through. I just thought it would be a good idea to separate the data from the application code. I guess I'll leave it without setting the datadir parameter and add a symlink.

Commit behaviour in SolrCloud

2013-11-24 Thread adfel70
Hi everyone, I am wondering how commit operation works in SolrCloud: Say I have 2 parallel indexing processes. What if one process sends big update request (an add command with a lot of docs), and the other one just happens to send a commit command while the update request is being processed. Is

Re: Commit behaviour in SolrCloud

2013-11-24 Thread adfel70
Hi Mark, Thanks for the answer. One more question though: You say that if I get a success from the update, it’s in the system, commit or not. But when exactly do I get this feedback - Is it one feedback per the whole request, or per one add inside the request? I will give an example to clarify my

Re: Commit behaviour in SolrCloud

2013-11-24 Thread adfel70
been indexed when the soft commit happens. - Mark On Nov 25, 2013, at 1:03 AM, adfel70 <adfel70@> wrote: Hi Mark, Thanks for the answer. One more question though: You say that if I get a success from the update, it’s in the system, commit or not. But when exactly do I get

Question regarding possibility of data loss

2013-11-19 Thread adfel70
Hi, we plan to establish an ensemble of solr with zookeeper. We are going to have 6 solr servers with 2 instances on each server; we'll also have 6 shards with replication factor 2, and in addition we'll have 3 zookeepers. Our concern is that we will send documents to index and solr won't index them but

Re: solrcloud shards backup/restoration

2013-11-07 Thread adfel70
did you solve this eventually? Aditya Sakhuja wrote How does one recover from an index corruption ? That's what I am trying to eventually tackle here. Thanks Aditya On Thursday, September 19, 2013, Aditya Sakhuja wrote: Hi, Sorry for the late followup on this. Let me put in more

Re: Soft commit and flush

2013-10-07 Thread adfel70
I understand the bottom line that soft commits are about visibility, hard commits are about durability. I am just trying to gain a better understanding of what happens under the hood... 2 more related questions you made me think of: 1. Is the NRTCachingDirectoryFactory relevant for both types of

Re: Soft commit and flush

2013-10-07 Thread adfel70
Sorry, by OOE I meant Out of memory exception... -- View this message in context: http://lucene.472066.n3.nabble.com/Soft-commit-and-flush-tp4091726p4093902.html Sent from the Solr - User mailing list archive at Nabble.com.

solr cpu usage

2013-10-01 Thread adfel70
hi We're building a spec for a machine to purchase. We're going to buy 10 machines. we aren't sure yet how many processes we will run per machine. the question is - should we buy a faster cpu with fewer cores or a slower cpu with more cores? in any case we will have 2 cpus in each machine. should we

Re: Maximum solr processes per machine

2013-09-30 Thread adfel70
Bram Van Dam wrote On 09/29/2013 04:03 PM, adfel70 wrote: If you're doing real time on a 5TB index then you'll probably want to throw your money at the fastest storage you can afford (SSDs vs spinning rust made a huge difference in our benchmarks) and the fastest CPUs you can get your

Maximum solr processes per machine

2013-09-29 Thread adfel70
Hi, I'm thinking of solr cluster architecture before purchasing machines. My total index size is around 5TB. I want to have replication factor of 3. total 15TB. I've understood that I should have 50-100% of the index size as ram, for OS cache. Lets say we're talking about around 10TB of memory.
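The sizing arithmetic in this post can be written out as a small Python sketch. It only restates the numbers given (5TB index, replication factor 3, the 50-100% of index size rule of thumb the poster cites for OS cache); the function name and defaults are made up for the illustration, and the rule of thumb itself is the poster's assumption, not a verified guideline.

```python
def os_cache_ram_range_tb(index_tb, replication_factor, low=0.5, high=1.0):
    """Return (total replicated index size, low RAM estimate, high RAM estimate) in TB."""
    total_tb = index_tb * replication_factor
    return total_tb, total_tb * low, total_tb * high


if __name__ == "__main__":
    total, ram_low, ram_high = os_cache_ram_range_tb(5, 3)
    print(f"replicated index: {total} TB, OS cache RAM: {ram_low} to {ram_high} TB")
```

With the post's figures this gives 15TB of replicated index and a range of 7.5TB to 15TB of RAM, which brackets the "around 10TB" the poster mentions.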

Re: Maximum solr processes per machine

2013-09-29 Thread adfel70
for disk access. FWIW, Erick On Sun, Sep 29, 2013 at 9:21 AM, adfel70 <adfel70@> wrote: Hi, I'm thinking of solr cluster architecture before purchasing machines. My total index size is around 5TB. I want to have replication factor of 3. total 15TB. I've understood that I should

Soft commit and flush

2013-09-24 Thread adfel70
I am struggling to get a deep understanding of soft commit. I have read Erick's post http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ which helped me a lot with when and why we should call each type of commit. But still, I can't understand what

using tika inside SOLR vs using nutch

2013-09-10 Thread adfel70
Hi What are the pros and cons of both use cases? 1. use nutch to crawl file system + parse files + perform other data manipulation and eventually index to solr. 2. use solr dataimporthandlers and plugins in order to perform this task. Note that I have tens of millions of docs which I need to

Question about SOLR-5017 - Allow sharding based on the value of a field

2013-08-28 Thread adfel70
Hi I'm looking into allowing query joins in solr cloud. This has the limitation of having to index all the documents that are joinable together to the same shard. I'm wondering if SOLR-5017 https://issues.apache.org/jira/browse/SOLR-5017 would give me the ability to do so without implementing

What do you use for solr's logging analysis?

2013-08-11 Thread adfel70
Hi I'm looking for a tool that could help me perform solr logging analysis. I use SolrCloud on multiple servers, so the tool should be able to collect logs from multiple servers. Any tool you use and can recommend? Thanks

solr qtime suddenly increased in production env

2013-08-05 Thread adfel70
I have a solr cluster of 7 shards, replicationFactor 2, running on 7 physical machines. Machine spec: cpu: 16 memory: 32gb storage is on local disks Each machine runs 2 solr processes, each process with 6gb memory to jvm. The cluster currently has 330 million documents, each process around 30gb

Re: solr qtime suddenly increased in production env

2013-08-05 Thread adfel70
Heisey-4 wrote On 8/5/2013 10:17 AM, adfel70 wrote: I have a solr cluster of 7 shards, replicationFactor 2, running on 7 physical machines. Machine spec: cpu: 16 memory: 32gb storage is on local disks Each machine runs 2 solr processes, each process with 6gb memory to jvm. The cluster

Need advice on performing 300 queries per second on solr index

2013-07-16 Thread adfel70
Hi I need to create a solr cluster that contains geospatial information and provides the ability to perform a few hundred queries per second, each query should retrieve around 100k results. The data is around 100k documents, around 300gb total. I started with a 2 shard cluster (replicationFactor

Is it possible to facet on existence of a field?

2013-07-08 Thread adfel70
I have a field that's only indexed in some of the documents. Can I create a boolean facet on this field by its existence? for instance: yes(124) no(479) Note that the fields' value is not facetable because all its values are unique most of the time. I just want to facet on the question whether
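A hedged sketch of one common approach to this kind of yes/no facet: two facet.query parameters, one matching documents where the field has any value (field:[* TO *]) and one matching those where it does not (-field:[* TO *]). This is not necessarily what the thread settled on. The field name my_field and the helper names are hypothetical; the local counter just shows what the two buckets mean on in-memory documents.

```python
from urllib.parse import urlencode


def existence_facet_params(field):
    """Request parameters asking Solr for 'has a value' and 'has no value' counts."""
    return [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.query", f"{field}:[* TO *]"),    # documents where the field exists
        ("facet.query", f"-{field}:[* TO *]"),   # documents where it is missing
    ]


def count_existence(docs, field):
    """Local equivalent of the two facet queries, for in-memory dict documents."""
    present = sum(1 for doc in docs if doc.get(field) is not None)
    return {"yes": present, "no": len(docs) - present}


if __name__ == "__main__":
    print(urlencode(existence_facet_params("my_field")))
    print(count_existence([{"my_field": "a"}, {"other": 1}, {"my_field": "b"}], "my_field"))
```

Because only existence is tested, the high cardinality of the field's values does not matter for this kind of facet.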

Every collection.reload makes zookeeper think shards are down

2013-07-08 Thread adfel70
Hi each time I reload a collection via collections API, zookeeper thinks that all the shards in the collection are down. It marks them as down and I can't send requests. Why thinks? because if I manually edit clusterstate.json file and set 'state' value to 'active', they come back up and

lang.fallback doesn't work when using lang.fallbackFields

2013-07-07 Thread adfel70
Hi I'm trying to index a set of documents with solr's language detection component. I set langid.fallbackFields=user_lan, langid.whitelist=en,it and langid.fallback=en. In some documents user_lan has 'sk', solr falls back to 'sk', which is not in the

Why shouldn't lang-id component work at query-time?

2013-07-07 Thread adfel70
Hi, I'm trying to integrate solr's lang-id component in my solr environment. In my scenario, I have documents in many different languages. I want to index them in the same solr collection, to different fields and apply language-specific analyzers on each field by its language. So far lang-id
