Re: Solr 6.4. Can't index MS Visio vsdx files

2017-04-11 Thread Gytis Mikuciunas
When will 1.15 be released? Maybe you have a beta version that I could test :) SAX sounds interesting, and from what I found on Google it could solve my issues. On Tue, Apr 11, 2017 at 10:48 PM, Allison, Timothy B. wrote: > It depends. We've been trying to make

NonRepeatableRequestException Error during indexing after setting up Basic Authentication

2017-04-11 Thread Zheng Lin Edwin Yeo
Hi, I'm getting an error with indexing using SolrJ after setting up the Basic Authentication with the following code. Credentials defaultcreds = new UsernamePasswordCredentials("id", "password"); appendAuthentication(defaultcreds, "BASIC", solr); private static void

Re: Long GC pauses while reading Solr docs using Cursor approach

2017-04-11 Thread Walter Underwood
JVM version? We’re running v8 update 121 with the G1 collector and it is working really well. We also have an 8GB heap. Graph your heap usage. You’ll see a sawtooth shape, where it grows, then there is a major GC. The maximum of the base of the sawtooth is the working set of heap that your
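A rough sketch of the sawtooth reading described above, assuming heap usage is sampled coarsely enough that every drop between samples is a major GC. The class name, method name, and sample values are hypothetical and only illustrate the idea.

```java
class HeapSawtooth {
    /**
     * Returns the highest "base" of the sawtooth: the largest value that
     * heap usage falls to right after a drop. Returns NaN if no drop is seen.
     * Assumption: each drop in the samples corresponds to a major GC.
     */
    static double workingSet(double[] heapGb) {
        double maxBase = Double.NaN;
        for (int i = 1; i < heapGb.length; i++) {
            boolean dropped = heapGb[i] < heapGb[i - 1];
            // Only count the bottom of a drop, not a sample in the middle of one.
            boolean bottom = (i + 1 == heapGb.length) || heapGb[i + 1] >= heapGb[i];
            if (dropped && bottom && (Double.isNaN(maxBase) || heapGb[i] > maxBase)) {
                maxBase = heapGb[i];
            }
        }
        return maxBase;
    }

    public static void main(String[] args) {
        // Hypothetical samples in GB: grow, collect down to 3, grow, collect down to 4.
        double[] samples = {2.0, 4.0, 6.0, 3.0, 5.0, 7.0, 4.0, 6.0};
        System.out.println(workingSet(samples)); // prints 4.0
    }
}
```

A real measurement would come from GC logs rather than sampled totals, since sampled totals cannot tell a minor collection from a major one.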

Re: Long GC pauses while reading Solr docs using Cursor approach

2017-04-11 Thread Shawn Heisey
On 4/11/2017 2:56 PM, Chetas Joshi wrote: > I am using Solr (5.5.0) on HDFS. SolrCloud of 80 nodes. Solr collection > with number of shards = 80 and replication Factor=2 > > Solr JVM heap size = 20 GB > solr.hdfs.blockcache.enabled = true > solr.hdfs.blockcache.direct.memory.allocation = true >

Re: Deleting a field in schema.xml, reindex needed?

2017-04-11 Thread Shawn Heisey
On 4/11/2017 2:19 PM, Scruggs, Matt wrote: > I’m updating our schema.xml file with 1 change: deleting a field. > > Do I need to re-index all of my documents in Solr, or can I simply reload my > collection config by calling: > >

Re: Expiry of Basic Authentication Plugin

2017-04-11 Thread Zheng Lin Edwin Yeo
Hi Jordi, Thanks for the advice. Regards, Edwin On 11 April 2017 at 18:27, Jordi Domingo Borràs wrote: > Browsers retain basic auth information. You have to close it or clean > browsing history. You can also change the user password at server side. > > Best > > On

Re: Deleting a field in schema.xml, reindex needed?

2017-04-11 Thread Walter Underwood
When I have done this, it is in multiple steps. 1. Change the indexing so that no data is going to that field. 2. Reindex, so the field is empty. 3. Remove the field from the schema. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 11, 2017, at

RE: Deleting a field in schema.xml, reindex needed?

2017-04-11 Thread Markus Jelsma
Hi - We did this on one occasion and Solr started complaining in the logs about a field that is present but not defined. We thought the problem would go away within 30 days - the time within which every document is reindexed or deleted - but it did not, for some reason. Forcing a merge did not solve

Deleting a field in schema.xml, reindex needed?

2017-04-11 Thread Scruggs, Matt
I’m updating our schema.xml file with 1 change: deleting a field. Do I need to re-index all of my documents in Solr, or can I simply reload my collection config by calling: http://mysolrhost:8000/solr/admin/collections?action=RELOAD&name=mycollection Thanks, Matt
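A minimal sketch of building the reload call asked about above. It assumes the Collections API RELOAD action takes the collection in its `name` parameter; the host, port, and collection name are the ones from the message, and the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ReloadUrl {
    /** Builds a Collections API RELOAD URL for the given Solr base URL and collection. */
    static String build(String solrBaseUrl, String collection) {
        // Encode the collection name so unusual characters cannot break the query string.
        String name = URLEncoder.encode(collection, StandardCharsets.UTF_8);
        return solrBaseUrl + "/admin/collections?action=RELOAD&name=" + name;
    }

    public static void main(String[] args) {
        System.out.println(build("http://mysolrhost:8000/solr", "mycollection"));
    }
}
```

This only constructs the URL. Whether a reload alone is enough after deleting a field is the question the replies in this thread address.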

RE: CommonGrams

2017-04-11 Thread Markus Jelsma
Hi - i cannot think of any real drawback right away. But you probably can expect a slightly different ordered MLT response. It should not be a problem if you select enough terms for MLT lookup. Regards, Markus -Original message- > From:David Hastings

Re: SolrJ appears to have problems with Docker Toolbox

2017-04-11 Thread Shawn Heisey
On 4/8/2017 6:42 PM, Mike Thomsen wrote: > I'm running two nodes of SolrCloud in Docker on Windows using Docker > Toolbox. The problem I am having is that Docker Toolbox runs inside of a > VM and so it has an internal network inside the VM that is not accessible > to the Docker Toolbox VM's host

Re: Dynamic schema memory consumption

2017-04-11 Thread Dorian Hoxha
Here is a small snippet that I copy-pasted from Shawn Heisey (who is a core contributor I think, he's good): > One thing to note: SolrCloud begins to have performance issues when the > number of collections in the cloud reaches the low hundreds. It's not > going to scale very well with a

Long GC pauses while reading Solr docs using Cursor approach

2017-04-11 Thread Chetas Joshi
Hello, I am using Solr (5.5.0) on HDFS. SolrCloud of 80 nodes. Solr collection with number of shards = 80 and replication Factor=2 Solr JVM heap size = 20 GB solr.hdfs.blockcache.enabled = true solr.hdfs.blockcache.direct.memory.allocation = true MaxDirectMemorySize = 25 GB I am querying a solr

Re: simple matches not catching at query time

2017-04-11 Thread Mikhail Khludnev
John, Here I mean a query, which matches a doc, which is expected to be matched by the problem query. https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-TheexplainOtherParameter On Tue, Apr 11, 2017 at 11:32 PM, John Blythe wrote:

Re: simple matches not catching at query time

2017-04-11 Thread John Blythe
first off, i don't think i have a full handle on the import of what is outputted by the debugger. that said, if "...PhraseQuery(manufacturer_split_syn:\"vendor vendor\")" is matching against `vendor_coolmed | coolmed | vendor`, then 'vendor' should match. the query analyzer is keywordtokenizer,

CommonGrams

2017-04-11 Thread David Hastings
Hi, was wondering if there are any known drawbacks to using the CommonGram factory, in regards to such features as the "more like this"

Re: simple matches not catching at query time

2017-04-11 Thread Mikhail Khludnev
John, How do you expect to match any of "parsed_filter_queries":[" MultiPhraseQuery(manufacturer_syn_both:\"(vendor_vendor_us vendor) vendor\")", "PhraseQuery(manufacturer_split_syn:\"vendor vendor\")" against vendor_coolmed | coolmed | vendor ? I just can't see any chance to match them. One

RE: Solr 6.4. Can't index MS Visio vsdx files

2017-04-11 Thread Allison, Timothy B.
It depends. We've been trying to make parsers more, erm, flexible, but there are some problems from which we cannot recover. Tl;dr there isn't a short answer. :( My sense is that DIH/ExtractingDocumentHandler is intended to get people up and running with Solr easily but it is not really a

Re: Dynamic schema memory consumption

2017-04-11 Thread Dorian Hoxha
> > And this overhead depends on what? I mean, if I create an empty collection > will it take up much heap size just for "being there" ? Yes. You can search on elastic-search/solr/lucene mailing lists and see that it's true. But nobody has `empty` collections, so yours will have a schema and

Re: simple matches not catching at query time

2017-04-11 Thread John Blythe
hi, erick. appreciate the feedback. 1> i'm sending the terms to solr enquoted 2> i'd thought that at one point and reran the indexing. i _had_ had two of the fields not indexed, but this represented one pass (same analyzer) from two diff source fields while 2 or 3 of the other 4 fields _were_

RE: Solr 6.4. Can't index MS Visio vsdx files

2017-04-11 Thread Gytis Mikuciunas
Thanks for your responses. Is there any way to ignore parsing errors and continue indexing? Right now Solr/Tika stops parsing the whole document if it hits any exception. On Apr 11, 2017 19:51, "Allison, Timothy B." wrote: > You might want to drop a note to the dev

Re: Dynamic schema memory consumption

2017-04-11 Thread jpereira
The way the data is spread across the cluster is not really uniform. Most of the shards are well below 50GB; I would say about 15% of the total shards have more than 50GB. Dorian Hoxha wrote > Each shard is a lucene index which has a lot of overhead. And this overhead depends on what? I mean,

Re: Grouped Result sort issue

2017-04-11 Thread Erick Erickson
Skimming, I don't think this is inconsistent. First I assume that you're OK with the second example, it's this one that seems odd to you: sort=score asc group.sort=score desc You're telling Solr to return the highest scoring doc in each group. However, you're asking to order the _groups_ in ascending

Re: simple matches not catching at query time

2017-04-11 Thread Erick Erickson
&debug=query is your friend. There are several issues that often trip people up: 1> The analysis tab pre-supposes that what you put in the boxes gets all the way to the field in question. Trivial example: I put (without quotes) "erick erickson" in the "name" field in the analysis page and see that it

RE: Solr 6.4. Can't index MS Visio vsdx files

2017-04-11 Thread Allison, Timothy B.
You might want to drop a note to the dev or user's list on Apache POI. I'm not extremely familiar with the vsd(x) portion of our code base. The first item ("PolylineTo") may be caused by a mismatch btwn your doc and the ooxml spec. The second item appears to be an unsupported feature. The

Re: SolrJ appears to have problems with Docker Toolbox

2017-04-11 Thread Vincenzo D'Amore
Ok :) But if you have time have a look at my project https://github.com/freedev/solrcloud-zookeeper-docker The project builds a couple of docker instances (solr - zookeeper) or a cluster with 6 nodes. Then you just have to put the IP addresses of your VM in your hosts file and you can play

Invoking a SearchHandler inside Solr Plugin

2017-04-11 Thread Max Bridgewater
I am looking for best practices when a search component in one handler, needs to invoke another handler, say /basic. So far, I got this working prototype: public void process(ResponseBuilder rb) throws IOException { SolrQueryResponse response = new SolrQueryResponse();

Re: Dynamic schema memory consumption

2017-04-11 Thread Dorian Hoxha
What I'm suggesting is that you should aim for max(50GB) per shard of data. How much is it currently ? Each shard is a lucene index which has a lot of overhead. If you can, try to have 20x-50x-100x fewer shards than you currently do and you'll see lower heap requirement. I don't know about

Re: Dynamic schema memory consumption

2017-04-11 Thread jpereira
Dorian Hoxha wrote > Isn't 18K lucene-indexes (1 for each shard, not counting the replicas) a > little too much for 3TB of data ? > Something like 0.167GB for each shard ? > Isn't that too much overhead (i've mostly worked with es but still lucene > underneath) ? I don't have only 3TB , I have

Re: SolrJ appears to have problems with Docker Toolbox

2017-04-11 Thread Mike Thomsen
Thanks. I think I'll take a look at that. I decided to just build a big vagrant-managed desktop VM to let me run Ubuntu on my company machine, so I expect that this pain point may be largely gone soon. On Mon, Apr 10, 2017 at 12:31 PM, Vincenzo D'Amore wrote: > Hi Mike > >

simple matches not catching at query time

2017-04-11 Thread John Blythe
hi everyone. i recently wrote in ('analysis matching, query not') but never heard back so wanted to follow up. i'm at my wit's end currently. i have several fields that are showing matches in the analysis tab. when i dumb down the string sent over to query it still gives me issues in some field

Re: Grouped Result sort issue

2017-04-11 Thread Eric Cartman
I modified and cleaned the previous query. As you can see the first query sorting is a bit odd. Using parameters sort=score asc group.sort=score desc http://localhost:8983/solr/mcontent.ph_post/select?==*,score=partnerId=1=false=true=score desc=true=on=text:cars=5000=score

Re: Solr/ Velocity dont show full field value

2017-04-11 Thread Erik Hatcher
#field() is defined in _macros.vm as this monstrosity: # TODO: make this parameterized fully, no context sensitivity #macro(field $f) #if($response.response.highlighting.get($docId).get($f).get(0)) #set($pad = "") #foreach($v in $response.response.highlighting.get($docId).get($f))

Re: Grouped Result sort issue

2017-04-11 Thread Erick Erickson
the group.sort spec is specified twice in the URL group.sort=score desc& group.sort=score desc Is there a chance that during testing you only changed _one_ of them so you had group.sort=score desc& group.sort=score asc ? I think the last one should win.. Shot in the dark. Best, Erick On Tue,
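A small sketch for spotting the situation described above, where one parameter appears more than once in a request. The query string here is hypothetical (not the one from the thread), no URL-decoding is done, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateParams {
    /** Returns every parameter that occurs more than once, with its values in request order. */
    static Map<String, List<String>> repeated(String query) {
        Map<String, List<String>> all = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            all.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        // Keep only the parameters that were given two or more times.
        all.values().removeIf(values -> values.size() < 2);
        return all;
    }

    public static void main(String[] args) {
        String query = "q=text:cars&group=true&group.sort=score desc&sort=score asc&group.sort=score asc";
        System.out.println(repeated(query)); // prints {group.sort=[score desc, score asc]}
    }
}
```

Which of the repeated values Solr actually applies is left open here, as it is in the message.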

Re: Filtering results by minimum relevancy score

2017-04-11 Thread Dorian Hoxha
Can't the filter be used in cases when you're paginating in sharded-scenario ? So if you do limit=10, offset=10, each shard will return 20 docs ? While if you do limit=10, _score<=last_page.min_score, then each shard will return 10 docs ? (they will still score all docs, but merging will be
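The paging arithmetic in the message above can be sketched as follows. The shard count of 4 is a hypothetical value, and the class and method names are made up for illustration; the per-shard numbers follow the message's own example (limit=10, offset=10).

```java
class ShardPaging {
    /** Docs each shard must return for offset-based paging: everything up to the end of the page. */
    static int rowsPerShardWithOffset(int offset, int limit) {
        return offset + limit;
    }

    /** Docs each shard must return when the previous page is excluded by a score filter instead. */
    static int rowsPerShardWithScoreFilter(int limit) {
        return limit;
    }

    public static void main(String[] args) {
        int shards = 4; // hypothetical cluster size
        int offset = 10;
        int limit = 10;
        System.out.println(rowsPerShardWithOffset(offset, limit));          // prints 20
        System.out.println(rowsPerShardWithScoreFilter(limit));             // prints 10
        System.out.println(shards * rowsPerShardWithOffset(offset, limit)); // prints 80, docs to merge
        System.out.println(shards * rowsPerShardWithScoreFilter(limit));    // prints 40, docs to merge
    }
}
```

The gap grows with page depth: the offset-based cost per shard rises with every page, while the filter-based cost stays at the page size.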

Solr/ Velocity dont show full field value

2017-04-11 Thread Hamso
Hey guys, I have a problem. In Velocity: *Beschreibung:*#field('LONG_TEXT') With this, the Solr field "LONG_TEXT" doesn't show everything, only the first ~90-110 characters. But if I use "$doc.getFieldValue('LONG_TEXT')" in the Velocity file, then it shows me everything that's inside the field

Re: Solr 6.4. Can't index MS Visio vsdx files

2017-04-11 Thread Gytis Mikuciunas
Hi, history: 1. we're using single core Solr 6.4 instance on windows server (windows server 2012 R2 standard), 2. Java v8, (build 1.8.0_121-b13). 3. as a workaround for earlier issues with visio files, we have in solr-6.4.0\contrib\extraction\lib: 3.1. ooxml-schemas-1.3.jar instead of

Re: Filtering results by minimum relevancy score

2017-04-11 Thread alessandro.benedetti
Can i ask what is the final requirement here ? What are you trying to do ? - just display less results ? you can easily do at search client time, cutting after a certain amount - make search faster returning less results ? This is not going to work, as you need to score all of them as Erick

Re: Solr Index size keeps fluctuating, becomes ~4x normal size.

2017-04-11 Thread Toke Eskildsen
On Mon, 2017-04-10 at 13:27 +0530, Himanshu Sachdeva wrote: > Thanks for your time and quick response. As you said, I changed our > logging level from SEVERE to INFO and indeed found the performance > warning *Overlapping onDeckSearchers=2* in the logs. If you only see it occasionally, it is

Re: Expiry of Basic Authentication Plugin

2017-04-11 Thread Jordi Domingo Borràs
Browsers retain basic auth information. You have to close it or clean browsing history. You can also change the user password at server side. Best On Tue, Apr 11, 2017 at 7:18 AM, Zheng Lin Edwin Yeo wrote: > Anyone has any idea if the authentication will expired

Re: Grouped Result sort issue

2017-04-11 Thread alessandro.benedetti
To be fair the second result seems consistent with the Solr grouping logic : *First Query results (Suspicious)* 1) group.sort= score desc -> select the group head as you have 1 doc per group( the head will be the top scoring doc per group) 2) sort=score asc -> sort the groups by the score of the

Re: Dynamic schema memory consumption

2017-04-11 Thread Dorian Hoxha
Also you should change the heap 32GB->30GB so you're guaranteed to get pointer compression. I think you should have no need to increase it more than this, since most things have moved to out-of-heap stuff, like docValues etc. On Tue, Apr 11, 2017 at 12:07 PM, Dorian Hoxha

Re: Dynamic schema memory consumption

2017-04-11 Thread Dorian Hoxha
Isn't 18K lucene-indexes (1 for each shard, not counting the replicas) a little too much for 3TB of data ? Something like 0.167GB for each shard ? Isn't that too much overhead (i've mostly worked with es but still lucene underneath) ? Can't you use 1/100 the current number of collections ? On
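The estimate in the message above can be checked with a few lines, taking its 3TB and 18K figures at face value (another reply in this thread says the real total is larger) and using 1TB = 1000GB. The 50GB target comes from the suggestion elsewhere in the thread; the class and method names are hypothetical.

```java
import java.util.Locale;

class ShardSize {
    /** Average data per shard in GB, using 1 TB = 1000 GB. */
    static double gbPerShard(double totalTb, int shards) {
        return totalTb * 1000.0 / shards;
    }

    /** Number of shards needed if each shard holds at most targetGb of data. */
    static int shardsAtTarget(double totalTb, double targetGb) {
        return (int) Math.ceil(totalTb * 1000.0 / targetGb);
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT, "%.3f", gbPerShard(3.0, 18000))); // prints 0.167
        System.out.println(shardsAtTarget(3.0, 50.0)); // prints 60
    }
}
```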

Unable to index UIMA field into Solr

2017-04-11 Thread aruninfo100
Hi All, I am trying to integrate UIMA with Solr. I was able to do the same. But some of the UIMA fields are not getting indexed into Solr whereas other *fields like pos,ChukType are getting indexed*. I am using openNLP-UIMA together for text analysis. When I tried to index the UIMA field for