Re: Writing config directly to zookeeper

2018-04-17 Thread Shawn Heisey
On 4/17/2018 8:54 PM, Aristedes Maniatis wrote: Is there any difference between using the tools supplied with Solr to write configuration to Zookeeper or just writing directly to our Zookeeper cluster? We have tooling that makes it much easier to write directly to ZK rather than having to

Re: solr 6.6.3 intermittent group faceting errors(Lucene54DocValuesProducer)

2018-04-17 Thread Shawn Heisey
On 4/17/2018 8:44 PM, Erick Erickson wrote: The other possibility is that you have LuceneMatchVersion set to 5-something in solrconfig.xml. It's my understanding that luceneMatchVersion does NOT affect index format in any way, that about the only things that pay attention to this value are a

Re: solr 6.6.3 intermittent group faceting errors(Lucene54DocValuesProducer)

2018-04-17 Thread Shawn Heisey
On 4/17/2018 12:17 PM, Jay Potharaju wrote: > After digging into the error a bit more ..I see that the error messages > contain a call to lucenecodec54. I am using version solr 6.6.3. Any ideas > why is lucene54 being referred here?? The 6.6 version uses index file formats that were last updated

Writing config directly to zookeeper

2018-04-17 Thread Aristedes Maniatis
Is there any difference between using the tools supplied with Solr to write configuration to Zookeeper or just writing directly to our Zookeeper cluster? We have tooling that makes it much easier to write directly to ZK rather than having to use yet another tool to do it. Thanks Ari

Re: solr 6.6.3 intermittent group faceting errors(Lucene54DocValuesProducer)

2018-04-17 Thread Erick Erickson
Those codecs only change their number when their behavior changes, IIUC. So lucenecodec54 may be there for Lucene50StoredFieldsFormat still exists in master/8.0. IOW, this is normal. The other possibility is that you have LuceneMatchVersion set to 5-something in solrconfig.xml. Best, Erick On

Re: Specialized Solr Application

2018-04-17 Thread Erick Erickson
Terry: Tika has a horrible problem to deal with and it's approaching a miracle that it does so well ;) Let's take a PDF file. Which vendor's version? From what _decade_? Did that vendor adhere to the spec? Every spec has gray areas so even good-faith efforts can result in some version/vendor

Requires subscription

2018-04-17 Thread Toru Nagai

Re: Solr OpenNLP named entity extraction

2018-04-17 Thread Steve Rowe
Hi Alexey, First, thanks for moving the conversation to the mailing list. Discussion of usage problems should take place here rather than in JIRA. I locally set up Solr 7.3 similarly to you and was able to get things to work. Problems with your setup: 1. Your update chain is missing the Log

Re: Specialized Solr Application

2018-04-17 Thread Terry Steichen
Hi Timothy, As I understand it, Tika is integrated with Solr. All my indexed documents declare that they've been parsed by tika. For the eml files it's: org.apache.tika.parser.mail.RFC822Parser. Word docs show they were parsed by org.apache.tika.parser.microsoft.ooxml.OOXMLParser. PDF files

Re: custom response writer which extends RawResponseWriter fails when shards > 1

2018-04-17 Thread Mikhail Khludnev
In distributed search the response writer is used twice (https://lucene.apache.org/solr/guide/7_1/distributed-requests.html): once on the slave node, which is where the response writer yields "json" content, and that upsets the aggregator node, which expects only javabin. I can hardly comment on rrw, it's probably used for

Re: CdcrReplicator Forwarder not working on some shards

2018-04-17 Thread Susheel Kumar
Hi Amrit, The cdcr?action=ERRORS is returning consecutiveErrors=1 on the shards which are not forwarding updates. Does that give any clue? 1 1 0 bad_request On Tue, Apr 17, 2018 at 1:22 PM, Amrit Sarkar wrote: > Susheel, > > At the time of core reload, logs

Infostream question

2018-04-17 Thread Yunee Lee
Hi, The current Solr server is 5.2. I want to enable infoStream, so I updated solrconfig.xml and reloaded the config, but it doesn’t create any logs. Do I need to configure anything else? Thanks. <infoStream>true</infoStream>

Re: Schemaless mode question

2018-04-17 Thread Kojo
Shawn, I first deleted the collection from the admin interface. It didn't work. When I deleted it directly on the command line it worked: /opt/solr-6.6.2/bin/solr delete -c Thanks for the advice on using schemaless on production. I understand the potential problems, so I will first create schema

Re: Weird transaction log behavior with CDCR

2018-04-17 Thread Amrit Sarkar
Chris, Try to index a few dummy documents and analyse whether the tlogs are getting cleared or not. Ideally, on restart it clears everything and keeps at most 2 tlogs per data folder. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks

Re: Weird transaction log behavior with CDCR

2018-04-17 Thread Chris Troullis
Hi Amrit, thanks for the reply. I shut down all of the nodes on the source cluster after the buffer was disabled, and there was no change to the tlogs. On Tue, Apr 17, 2018 at 12:20 PM, Amrit Sarkar wrote: > Chris, > > After disabling the buffer on source, kind shut

Re: solr 6.6.3 intermittent group faceting errors(Lucene54DocValuesProducer)

2018-04-17 Thread Jay Potharaju
After digging into the error a bit more, I see that the error messages contain a call to lucenecodec54. I am using Solr version 6.6.3. Any ideas why lucene54 is being referred to here? Thanks at org.apache.solr.request.SimpleFacets.lambda$getFacetFieldCounts$0(SimpleFacets.java:809)

Re: Solr OpenNLP named entity extraction

2018-04-17 Thread David Hastings
Did you send a commit after you sent the document? On Tue, Apr 17, 2018 at 8:23 AM, Alexey Ponomarenko wrote: > Hi once more I am trying to implement named entities extraction using this > manual > https://lucene.apache.org/solr/7_3_0//solr-analysis- >

Re: Error configuring Spell Checker

2018-04-17 Thread Gene LeFave
James, That was it! Many, many thanks! Gene On Tue, Apr 17, 2018 at 8:57 AM, Dyer, James wrote: > (moving to solr-user@lucene.apache.org) > > Gene, > > I can reproduce your problem if I misspell the "spellcheck.dictionary" > parameter in my query. But I see

Solr OpenNLP named entity extraction

2018-04-17 Thread Alexey Ponomarenko
Hi once more. I am trying to implement named entity extraction using this manual: https://lucene.apache.org/solr/7_3_0//solr-analysis-extras/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.html I have modified solrconfig.xml like this:

Re: CdcrReplicator Forwarder not working on some shards

2018-04-17 Thread Amrit Sarkar
Susheel, At the time of core reload, the logs must be complaining or at least pointing in some direction. Each shard leader is responsible for spawning a threadpool for the cdcr replicator to get the data over. Amrit Sarkar

Re: Learning to Rank (LTR) with grouping

2018-04-17 Thread Alessandro Benedetti
Thanks for the response Shawn ! In relation to this : "I feel fairly sure that most of them are unwilling to document their skills. If information like that is documented, it might saddle a committer with an obligation to work on issues affecting those areas when they may not have the free

Re: Weird transaction log behavior with CDCR

2018-04-17 Thread Amrit Sarkar
Chris, After disabling the buffer on source, kindly shut down all the nodes of the source cluster first and then start them again. The tlogs will be removed accordingly. BTW, CDCR doesn't abide by 100 numRecordsToKeep or 10 numTlogs. Amrit Sarkar

Re: custom response writer which extends RawResponseWriter fails when shards > 1

2018-04-17 Thread Lee Carroll
Ok. My expectation was that the response writer would not be used until the final serialization of the result. If my response writer breaks the response writer contract exactly the way rawResponseWriter does, and just outputs a field value, how does that work? Does rawResponseWriter support cloud mode?

CdcrReplicator Forwarder not working on some shards

2018-04-17 Thread Susheel Kumar
Hi, Has anyone run into this issue, where a few shard leaders are forwarding updates to their counterpart leaders in the target cluster while some of the shard leaders are not forwarding updates? On Solr 6.6, in the logs of 4 of the shards I see the entries below, and their counterparts in the target are getting

Re: Weird transaction log behavior with CDCR

2018-04-17 Thread Susheel Kumar
DISABLEBUFFER on source cluster would solve this problem. On Tue, Apr 17, 2018 at 9:29 AM, Chris Troullis wrote: > Hi, > > We are attempting to use CDCR with solr 7.2.1 and are experiencing odd > behavior with transaction logs. My understanding is that by default, solr >

Re: solr 6.6.3 intermittent group faceting errors

2018-04-17 Thread Jay Potharaju
Hi Has anyone seen issues with group faceting on multivalued fields in solr 6x? Can any of the committers comment? Thanks Jay > On Apr 16, 2018, at 1:44 PM, Jay Potharaju wrote: > > I deleted my collection and rebuilt it to check if there are any issues with >

Re: Performance & CPU Usage of 6.2.1 vs 6.5.1 & above

2018-04-17 Thread Deepak Goel
Please post the exact results. Many a time high CPU utilisation may be a boon, as it improves query response times. On Tue, 17 Apr 2018, 13:55 mganeshs, wrote: > Regarding query times, we couldn't see big improvements. Both are more or > less same. > > Our main worry is

Re: custom response writer which extends RawResponseWriter fails when shards > 1

2018-04-17 Thread Mikhail Khludnev
That's what should happen. Expected mime type application/octet-stream but got application/json. The distributed search coordinator expects to merge slave responses in javabin format, but the slave's wt indicated json. As far as I know, only javabin can be used for distributed search underneath.

Re: Schemaless mode question

2018-04-17 Thread Shawn Heisey
On 4/17/2018 8:15 AM, Kojo wrote: > I am trying schemaless mode and it seems to works very nice, and there is > no overhead to write a custom schema for each type of collection that we > need to index. > However we are facing a strange problem. Once we have created a collection > and indexed data

Re: Schemaless mode question

2018-04-17 Thread Kojo
I have just deleted it using the command line and it worked as expected! 2018-04-17 11:15 GMT-03:00 Kojo : > Hi all, > > I am trying schemaless mode and it seems to works very nice, and there is > no overhead to write a custom schema for each type of collection that we > need to

Schemaless mode question

2018-04-17 Thread Kojo
Hi all, I am trying schemaless mode and it seems to work very nicely, and there is no overhead of writing a custom schema for each type of collection that we need to index. However, we are facing a strange problem. Once we have created a collection and indexed data on that collection, if we need to

RE: Error configuring Spell Checker

2018-04-17 Thread Dyer, James
(moving to solr-user@lucene.apache.org) Gene, I can reproduce your problem if I misspell the "spellcheck.dictionary" parameter in my query. But I see your query has "direct" which matches the "name" element of one of your spellcheckers. I think the actual problem in your case might be that

RE: Specialized Solr Application

2018-04-17 Thread Allison, Timothy B.
+1 to Charlie's guidance. And... >60,000 documents, mostly pdfs and emails. > However, there's a premium on precision (and recall) in searches. Please, oh, please, no matter what you're using for content/text extraction and/or OCR, run tika-eval[1] on the output to ensure that you are

Weird transaction log behavior with CDCR

2018-04-17 Thread Chris Troullis
Hi, We are attempting to use CDCR with solr 7.2.1 and are experiencing odd behavior with transaction logs. My understanding is that by default, solr will keep a maximum of 10 tlog files or 100 records in the tlogs. I assume that with CDCR, the records will not be removed from the tlogs until it

Re: custom response writer which extends RawResponseWriter fails when shards > 1

2018-04-17 Thread Lee Carroll
Sure. With 1 shard and 1 replica this request works fine: 1. Request URL: http://localhost:8983/solr/images/image?q=id:1 2. Request Method: GET 3. Status Code: 200 OK. The logs are clean. With 2 shards and 2 replicas the same request fails, and in the logs: INFO - 2018-04-17 13:20:32.052;

Re: Data import batch mode for delta

2018-04-17 Thread Shawn Heisey
On 4/16/2018 7:32 PM, gadelkareem wrote: I cannot complain cuz it actually worked well for me so far but.. I still do not understand if Solr already paginates the results from the full import, why not do the same for the delta. It is almost the same query: `select id from t where t.lastmod >

Re: Learning to Rank (LTR) with grouping

2018-04-17 Thread Shawn Heisey
On 4/17/2018 5:35 AM, Alessandro Benedetti wrote: Apache Lucene/Solr is a big project, is there anywhere in the official Apache Lucene/Solr website where each committer list the modules of interest/expertise ? No, there is no repository like that.  Each committer knows what their own

Re: Learning to Rank (LTR) with grouping

2018-04-17 Thread Alessandro Benedetti
Hi Erick, I have a curiosity/suggestion regarding how to speed up pending (or forgotten) Jiras: is there a way to find out the most suitable committer(s) for the task and tag them? Apache Lucene/Solr is a big project; is there anywhere in the official Apache Lucene/Solr website where each

Re: Sorting using "packed" fields?

2018-04-17 Thread Alessandro Benedetti
Hi Christopher, if you model your documents with a nested document approach ( like the one you mentioned) you should be able to achieve your requirement following this interesting blog [1] : *" ToParentBlockJoinQuery supports several score calculation modes. For example, a score for a parent

Re: Performance & CPU Usage of 6.2.1 vs 6.5.1 & above

2018-04-17 Thread mganeshs
Regarding query times, we couldn't see big improvements. Both are more or less the same. Our main worry is why CPU usage is so high in 6.5.1 and above. What's going wrong? Is anyone else facing this sort of issue? If yes, how can we bring down the CPU usage? Is there any settings which we

Solr sort on latest upcoming timestamp value on multivalued field

2018-04-17 Thread sayantan94
I have a multivalued field for session timings (where I store timestamps) on each group document, e.g. session_timings: [1526882026, 1513882026, 1533882026 ]. My sorting logic is that the groups should be listed in order of their upcoming session time. For example, Group A has three session_timings =
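
The ordering described in this question can be sketched outside Solr as follows. This is only a client-side illustration of the intended logic, not a Solr sort function and not an answer taken from the thread. The function names, the reference time `now`, the groups B and C, and the choice to list groups with no upcoming session last are all assumptions made for the example; only Group A's timestamps come from the message.

```python
from typing import Dict, List, Optional


def next_upcoming(timings: List[int], now: int) -> Optional[int]:
    """Return the earliest timestamp that is >= now, or None if all are past."""
    future = [t for t in timings if t >= now]
    return min(future) if future else None


def sort_groups_by_upcoming(groups: Dict[str, List[int]], now: int) -> List[str]:
    """Order group names by their next upcoming session time.

    Groups with no upcoming session are placed last (an assumption;
    the original message does not say how they should be handled).
    """
    def key(name: str):
        nxt = next_upcoming(groups[name], now)
        # (has_no_upcoming, next_time, name): False sorts before True.
        return (nxt is None, nxt if nxt is not None else 0, name)

    return sorted(groups, key=key)


if __name__ == "__main__":
    now = 1525000000  # hypothetical "current time" for the example
    groups = {
        "B": [1520000000, 1530000000],              # hypothetical group
        "C": [1500000000],                          # hypothetical, all sessions past
        "A": [1526882026, 1513882026, 1533882026],  # values from the message
    }
    print(sort_groups_by_upcoming(groups, now))
```

With this key, Group A comes first because its next session (1526882026) is the nearest one after `now`, even though it also has an older timestamp in the list.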

Re: Specialized Solr Application

2018-04-17 Thread Charlie Hull
On 16/04/2018 19:48, Terry Steichen wrote: I have from time-to-time posted questions to this list (and received very prompt and helpful responses).  But it seems that many of you are operating in a very different space from me.  The problems (and lessons-learned) which I encounter are often very