Re: Solr Configuration (Caching & RAM) for performance Tuning

2016-03-02 Thread Erick Erickson
Committing after every doc is an anti-pattern. All the in-memory structures are being thrown away after each update/insert. Why do you think you need to do this? The usual pattern is to just let your autocommit parameters in solrconfig.xml do this for you. Ditto with specifying commitWithin on

Re: Currency field doubts

2016-03-02 Thread Jan Høydahl
Hi, In SolrCloud you would want to upload your new currency.xml to ZK and then call the collections API for a reload. Alternatively you could write your own ExchangeRate Provider for Google implementing the interface ExchangeRateProvider. The downside here is that each Solr node then will fetch
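
As a rough sketch of the reload step mentioned above: the Collections API exposes a RELOAD action under /admin/collections. The snippet only builds the request URL. The base URL and the collection name "products" are placeholder assumptions, not values from the thread, and uploading currency.xml to ZooKeeper is a separate step that is not shown.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class CollectionReload {
    // Builds the Collections API RELOAD URL for one collection.
    // solrBaseUrl is expected to end in "/solr" (assumption for this sketch).
    static URI reloadUri(String solrBaseUrl, String collection) {
        String name = URLEncoder.encode(collection, StandardCharsets.UTF_8);
        return URI.create(solrBaseUrl + "/admin/collections?action=RELOAD&name=" + name);
    }

    public static void main(String[] args) {
        // Hypothetical host and collection name, for illustration only.
        System.out.println(reloadUri("http://localhost:8983/solr", "products"));
    }
}
```

Sending a GET request to the resulting URL asks every replica of the collection to reload its configuration, which is what picks up the new exchange rates.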

Re: Solr Configuration (Caching & RAM) for performance Tuning

2016-03-02 Thread Binoy Dalal
Can you share the cache stats from the admin panel? Also how much load are you talking about here? (Queries/second) How many documents do you have? Are you fetching any large stored fields? On Thu, 3 Mar 2016, 12:31 Maulin Rathod, wrote: > Adding extra information. > > Our

Re: BlockJoinQuery parser and ArrayIndexOutOfBoundException

2016-03-02 Thread Mikhail Khludnev
On Thu, Mar 3, 2016 at 7:18 AM, Sathyakumar Seshachalam < sathyakumar_seshacha...@trimble.com> wrote: > In my case, yes there are standalone docs (without any parents) and then > there is blocks with parents and its children in the same index. > As far as I know you can't mix them. Can you try

RE: Solr Configuration (Caching & RAM) for performance Tuning

2016-03-02 Thread Maulin Rathod
Adding extra information. Our index size is around 120 GB (2 shards + 2 replicas). We have 400 GB RAM on our Windows server. Solr is assigned 50 GB RAM, so a huge amount of free RAM (>300 GB) is available for the OS. We have a very simple query which returns only 5 Solr documents. Under load

RE: Solr Configuration (Caching & RAM) for performance Tuning

2016-03-02 Thread Maulin Rathod
We do a soft commit when we insert/update a document. //Insert Document UpdateResponse resp = cloudServer.add(doc, 1000); if (resp.getStatus() == 0) { success = true; } //Update Document UpdateRequest req = new UpdateRequest(); req.setCommitWithin(1000); req.add(docs); UpdateResponse resp

Re: SolrCloud - Strategy for recovering cluster states

2016-03-02 Thread danny teichthal
According to what you describe, I really don't see the need for core discovery in Solr Cloud. It will only be used to eagerly load a core on startup. If I understand correctly, when ZK = truth, this eager loading can/should be done by consulting ZooKeeper instead of local disk. I agree that it is

Re: Solr Configuration (Caching & RAM) for performance Tuning

2016-03-02 Thread Binoy Dalal
1) Experiment with the autowarming settings in solrconfig.xml. Since you're indexing so frequently, consider setting the count to a low number, so that not a lot of time is spent warming the caches. Alternatively, if you're not very concerned about initial query response times being small, you

Solr Configuration (Caching & RAM) for performance Tuning

2016-03-02 Thread Maulin Rathod
Hi, We are using Solr 5.2 (on Windows Server 2012/JDK 1.8) for document content indexing/querying. We found that querying slows down intermittently under load. In our analysis we found two issues. 1) Solr is not using caching effectively. Whenever a new document is indexed, it opens a new

XX:ParGCCardsPerStrideChunk

2016-03-02 Thread William Bell
Has anyone tried -XX:ParGCCardsPerStrideChunk with Solr? There have been reports of improved GC times. -- Bill Bell

Currency field doubts

2016-03-02 Thread Pranaya Behera
Hi, For currency, as suggested in the wiki and guide, the field type is currency and the defaults would take USD and will take the exchange rates from the currency.xml file located in the conf dir. We have a script that talks to Google APIs for the current currency exchange and symlinked

Re: BlockJoinQuery parser and ArrayIndexOutOfBoundException

2016-03-02 Thread Sathyakumar Seshachalam
Hi, I will try that approach: deleting and force merging before adding the blocks. In my case, yes, there are standalone docs (without any parents) and then there are blocks with parents and their children in the same index. Note however that docs in the blocks are unique in that the children, there

SolrEntityProcessor works with Solr Cloud

2016-03-02 Thread Neeraj Bhatt
Hello All, I am trying to import data from one Solr Cloud into another using SolrEntityProcessor. My schema got changed and I need to reindex. 1. Does SolrEntityProcessor work with Solr Cloud to get data from Solr Cloud? It looks like it will not work, as the SolrEntityProcessor code is creating

Re: facet on two multi-valued fields

2016-03-02 Thread Jan Høydahl
It makes no sense to facet on a “text_general” analyzed field. Can you give a concrete example with a few dummy docs and show some queries (do you query the tagDescription field?) and the wanted facet output? There may be several ways to solve the task, depending on the exact use case. One

Re: Solr (v5.3.1) doesn't delete orphaned child documents

2016-03-02 Thread Mikhail Khludnev
when it indexes a document block it has to assign not a but "_root_" field, but deleteById() is unaware of it. On Wed, Mar 2, 2016 at 8:16 PM, Naeem Tahir wrote: > Hi, > > I noticed some strange behavior when deleting orphaned child > documents in Solr 5.3.1. I

Re: SolrCloud - Strategy for recovering cluster states

2016-03-02 Thread Jeff Wartes
Well, with the understanding that someone who isn’t involved in the process is describing something that isn’t built yet... I could imagine changes like: - Core discovery ignores cores that aren’t present in the ZK cluster state - New cores are automatically created to bring a node in line

Solr (v5.3.1) doesn't delete orphaned child documents

2016-03-02 Thread Naeem Tahir
Hi, I noticed some strange behavior when deleting orphaned child documents in Solr 5.3.1. I am indexing nested documents in a parent/child hierarchy. When I delete a child document whose parent was already deleted, the child document still shows up in search. I am using

Re: FW: Difference Between Tokenizer and filter

2016-03-02 Thread Shawn Heisey
On 3/2/2016 9:55 AM, G, Rajesh wrote: > Thanks for your email Koji. Can you please explain what is the role of > tokenizer and filter so I can understand why I should not have two tokenizer > in index and I should have at least one tokenizer in query? You can't have two tokenizers. It's not

RE: FW: Difference Between Tokenizer and filter

2016-03-02 Thread G, Rajesh
Thanks for your email, Koji. Can you please explain the roles of the tokenizer and the filter, so I can understand why I should not have two tokenizers in index and why I should have at least one tokenizer in query? My understanding is tokenizer is used to say how the content should be indexed

facet on two multi-valued fields

2016-03-02 Thread Andreas Hubold
Hi, my schema looks like this multiValued="true"/> stored="false" multiValued="true"/> I'd like to get the tagIds of documents with a certain tagDescription (and text). However tagIds contains multiple ids in the same order as tagDescription and simple faceting would return all. Is there

Re: BlockJoinQuery parser and ArrayIndexOutOfBoundException

2016-03-02 Thread Mikhail Khludnev
Hello, It's really hard to find the exact reason why this happens. There is a brute-force approach: sweep out all deleted documents, i.e. force merge until there are no deleted docs. Can it happen that standalone docs and parent blocks are mixed in the index? On Wed, Mar 2, 2016 at 2:04 PM, Sathyakumar

Re: Non-contigous terms in SuggestComponent

2016-03-02 Thread Alfonso Muñoz-Pomer Fuentes
Hi Edwin. That was what I suspected, but I wanted to confirm. If we go down this route I’ll do some testing and post the results. We’re using 5.1 in production, but I’m testing with 5.4.1. The index has 40,891,287 documents and is 3.01 GB, so it’s not big at all. Many thanks, Alfonso On

Re: FW: Difference Between Tokenizer and filter

2016-03-02 Thread Koji Sekiguchi
Hi, <analyzer>...</analyzer> must have one and only one <tokenizer> and it can have zero or more <filter>s. From the point of view of the rules, your <analyzer type="index">...</analyzer> is not correct because it has more than one <tokenizer>, and <analyzer type="query">...</analyzer> is not correct as well because it has no <tokenizer>. Koji On 2016/03/02 20:25, G, Rajesh wrote: Hi Team, Can you please clarify the

Query solrcloud questions

2016-03-02 Thread michael solomon
Hi, I installed 3 instances of SolrCloud 5.4.1. I'm building a small search engine for websites and I store their info as nested documents (one document for the website's general information, with the pages inside the website as its children). So when I'm querying this collection I'm using a BlockJoin

Re: Commit after every document - alternate approach

2016-03-02 Thread Varun Thacker
Hi Sangeetha, Well, I don't think you need to commit after every document add. You can rely on Solr's transaction log feature. If you are using SolrCloud it's mandatory to have a transaction log, so every document gets written to the tlog. Now say a node crashes even if documents were not

BlockJoinQuery parser and ArrayIndexOutOfBoundException

2016-03-02 Thread Sathyakumar Seshachalam
I am running into this issue: https://issues.apache.org/jira/browse/SOLR-7606, but I am not following all of the description in that ticket. What I am not able to understand is when parent/child orthogonality is broken, and what a child document without a parent means. I

RE: outlook email file pst extraction problem

2016-03-02 Thread Allison, Timothy B.
This is probably more of a Tika question now... It sounds like Tika is not extracting dates from the .eml files that you are generating? To confirm, you are able to extract dates with libpst...it is just that Tika is not able to process the dates that you are sending it in your .eml files?

Re: [ISSUE] After restoring data to a Solrcloud instance

2016-03-02 Thread Varun Thacker
Could you post the full output of the CheckIndex command on the restored snapshot? Also what happens if you delete the snapshot indexes and attempt to restore again? Does it get corrupted again or is it a one off scenario? On Wed, Mar 2, 2016 at 3:44 PM, Janit Anjaria (Tech-IT) <

FW: Difference Between Tokenizer and filter

2016-03-02 Thread G, Rajesh
Hi Team, Can you please clarify the below. My understanding is tokenizer is used to say how the content should be indexed physically in file system. Filters are used to query result. The below lines are from my setup. But I have seen eg that include filters inside and tokenizer in that

Re: [ISSUE] After restoring data to a Solrcloud instance

2016-03-02 Thread Janit Anjaria (Tech-IT)
Hi Varun, we actually ran the test on our restored data snapshot and it threw an error saying "Broken segment". How is it possible that the same test succeeds on the snapshot, but not on the restored snapshot? Can you please shed some light on this, so we can proceed and fix this issue.

Re: understand scoring

2016-03-02 Thread michael solomon
Hi Emir, This morning I deleted those documents and now added them again to re-run the query, and now the result is what I expect (0_0) and I can't reproduce the problem... this is weird :\ On Wed, Mar 2, 2016 at 11:38 AM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: > Hi Michael, > Can

Re: Commit after every document - alternate approach

2016-03-02 Thread Emir Arnautovic
Hi Sangeetha, What is sure is that it is not going to work: with 200-300K docs/hour, there will be >50 commits/second, meaning there is <20 ms for each doc+commit. What you can do is let Solr handle commits and maybe use real-time get to verify a doc is in Solr, or do some periodic sanity checks. Are
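
The estimate above can be checked with a few lines of arithmetic. This is only a sketch of the calculation, assuming one commit per document and a perfectly even arrival rate; the class and method names are made up for illustration.

```java
final class CommitBudget {
    private static final double SECONDS_PER_HOUR = 3600.0;

    // Documents (and, with commit-per-doc, commits) arriving per second.
    static double docsPerSecond(long docsPerHour) {
        return docsPerHour / SECONDS_PER_HOUR;
    }

    // Time available to index and commit a single document, in milliseconds.
    static double millisPerDoc(long docsPerHour) {
        return 1000.0 / docsPerSecond(docsPerHour);
    }

    public static void main(String[] args) {
        // The two ends of the 200K-300K docs/hour range quoted in the thread.
        for (long rate : new long[] {200_000L, 300_000L}) {
            System.out.printf("%d docs/hour -> %.1f docs/s, %.1f ms per doc+commit%n",
                    rate, docsPerSecond(rate), millisPerDoc(rate));
        }
    }
}
```

Even the low end of the range works out to more than 50 commits per second, which leaves less than 20 ms for each document plus its commit.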

Re: understand scoring

2016-03-02 Thread Emir Arnautovic
Hi Michael, Can you please run the query with debug and share the title field configuration? Thanks, Emir On 02.03.2016 09:14, michael solomon wrote: Thanks you, @Doug

Re: FW: Difference Between Tokenizer and filter

2016-03-02 Thread Emir Arnautovic
Hi Rajesh, The processing flow is the same for both indexing and querying. What is compared at the end are the resulting tokens. In general the flow is: text -> char filter -> filtered text -> tokenizer -> tokens -> filter1 -> tokens ... -> filterN -> tokens. You can read more about the analysis chain in Solr
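
The flow described above (char filters, then exactly one tokenizer, then zero or more token filters) can be imitated in plain Java. This is a toy model, not Solr's or Lucene's analysis API: the tag-stripping regex, the whitespace tokenizer, and the stop-word list are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

final class AnalysisChainSketch {
    // text -> char filters -> filtered text -> tokenizer -> tokens -> filters -> tokens
    static List<String> analyze(String text,
                                List<UnaryOperator<String>> charFilters,
                                Function<String, List<String>> tokenizer,
                                List<UnaryOperator<List<String>>> tokenFilters) {
        String filtered = text;
        for (UnaryOperator<String> charFilter : charFilters) {
            filtered = charFilter.apply(filtered);
        }
        List<String> tokens = tokenizer.apply(filtered); // exactly one tokenizer
        for (UnaryOperator<List<String>> tokenFilter : tokenFilters) {
            tokens = tokenFilter.apply(tokens);
        }
        return tokens;
    }

    // A toy chain: strip markup, split on whitespace, lowercase, drop stop words.
    static List<String> demo(String text) {
        Set<String> stopWords = Set.of("the", "a", "an");
        UnaryOperator<String> stripTags = s -> s.replaceAll("<[^>]*>", " ");
        Function<String, List<String>> whitespaceTokenizer = s -> {
            String trimmed = s.trim();
            return trimmed.isEmpty() ? List.<String>of() : Arrays.asList(trimmed.split("\\s+"));
        };
        UnaryOperator<List<String>> lowerCase = tokens -> tokens.stream()
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
        UnaryOperator<List<String>> dropStopWords = tokens -> tokens.stream()
                .filter(t -> !stopWords.contains(t))
                .collect(Collectors.toList());
        return analyze(text, List.of(stripTags), whitespaceTokenizer,
                List.of(lowerCase, dropStopWords));
    }

    public static void main(String[] args) {
        System.out.println(demo("<b>The Quick</b> Fox")); // prints [quick, fox]
    }
}
```

Running the same chain over the indexed text and over the query text is what makes the resulting tokens comparable at search time.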

FW: Difference Between Tokenizer and filter

2016-03-02 Thread G, Rajesh
Hi Team, Can you please clarify the below. My understanding is tokenizer is used to say how the content should be indexed physically in file system. Filters are used to query result. The below lines are from my setup. But I have seen eg that include filters inside and tokenizer in that

Re: Indexing books, chapters and pages

2016-03-02 Thread Zaccheo Bagnati
If any of you cares about their Stack Overflow reputation and has time to answer, I also opened a question there: http://stackoverflow.com/questions/35722672/solr-schema-to-model-books-chapters-and-pages. Thanks again to everybody On Wed, 2 Mar 2016 at 09:42, Zaccheo Bagnati

Re: Indexing books, chapters and pages

2016-03-02 Thread Zaccheo Bagnati
Thanks Alexandre, your solution seems very good: I'll surely try it and let you know. I like the idea of mixing block joins and grouping! On Wed, 2 Mar 2016 at 04:46, Alexandre Rafalovitch < arafa...@gmail.com> wrote: > Here is an - untested - possible approach. I might be missing

Re: Indexing books, chapters and pages

2016-03-02 Thread Zaccheo Bagnati
Thanks Jack, the chapter is definitely the optimal unit to search in and your solution seems quite a good approach. The downside is that, depending on how much text we choose to share between two adjacent pages, we will experience some errors. For example, it will always be possible to find a

Standard highlighting doesn't work for Block Join

2016-03-02 Thread michael solomon
This was my first post to the mailing list, so I'm not sure whether you received it; I'm sending it again. Hi, I have Solr 5.4.1 and I'm trying to use the Block Join Query Parser to search in children and return the parent. I want to apply highlighting to the children, but it returns empty. My q parameter: "q={!parent

Re: Indexing books, chapters and pages

2016-03-02 Thread Zaccheo Bagnati
Thanks Emir, a similar solution had already come to my mind too: searching on chapters, highlighting the result, and retrieving matching pages by parsing the highlighted result... surely not a very efficient approach, but it could work... however, I think I'll try different approaches before this On

Re: understand scoring

2016-03-02 Thread michael solomon
Thank you, @Doug Turnbull. I tried http://splainer.io but it doesn't work for my query (no explain for the docs). Here is the picture again: https://drive.google.com/file/d/0B-7dnH4rlntJc2ZWdmxMS3RDMGc/view?usp=sharing On Tue, Mar 1, 2016 at 10:06 PM, Doug Turnbull <

Re: SolrCloud - Strategy for recovering cluster states

2016-03-02 Thread danny teichthal
Thanks Jeff, I understand your philosophy and it sounds correct. Since we had many problems with ZooKeeper when switching to Solr Cloud, we couldn't treat it as a source of truth and had to rely on a more stable source. The issue is that when we get such a ZooKeeper event, it brought our

Re: Indexing books, chapters and pages

2016-03-02 Thread Zaccheo Bagnati
Thanks Walter, the payload idea is something that I'd never heard of... it seems interesting but quite complex to implement. I think we'll have to write a custom filter to add page numbers, and it's not clear to me how to retrieve payloads in the query result. However, I'll try to go deeper on

Commit after every document - alternate approach

2016-03-02 Thread sangeetha.subraman...@gtnexus.com
Hi All, I am trying to understand how we should issue commits to Solr while indexing documents. Around 200K to 300K documents per hour, with an average size of 10 KB each, will be coming into Solr. Java code fetches the documents from MQ and streams them to Solr. The problem is the