Re: Facet full-text

2018-03-06 Thread Shawn Heisey
On 3/6/2018 10:16 AM, Moncif Aidi wrote: I am using Solr to power faceting features for our application. I know that SOLR can do free text search but what is the best practice for faceting on common terms inside SOLR text fields? Based on everything below, there might be a little bit of

Re: Facet full-text

2018-03-06 Thread Emir Arnautović
Hi, Faceting on a text field requires the field cache, which can consume a large amount of heap and make Solr unstable. It is recommended to enable doc values on any field that you plan to facet on, but you cannot enable doc values on a text field. It is recommended to do preprocessing of text
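
The preprocessing advice above is cut off in this listing, so what follows is only one possible reading of it: a minimal Python sketch that pulls known terms out of the free text before indexing and stores them in a separate string field that can have doc values. The vocabulary `FACET_TERMS` and the field names `description_txt` and `features_ss` are assumptions for illustration and do not come from the thread.

```python
import re

# Hypothetical vocabulary of terms worth faceting on (not from the thread).
FACET_TERMS = ["swimming pool", "garage", "garden", "balcony"]


def extract_facet_terms(description, vocabulary=FACET_TERMS):
    """Return the vocabulary terms that occur in the free-text description."""
    text = description.lower()
    found = []
    for term in vocabulary:
        # Whole-word match, so "garden" does not match "gardener".
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            found.append(term)
    return found


def build_document(doc_id, description):
    """Build a document dict with the raw text plus a facetable term list."""
    return {
        "id": doc_id,
        # Assumed full-text field, used for searching only.
        "description_txt": description,
        # Assumed multi-valued string field, used for faceting.
        "features_ss": extract_facet_terms(description),
    }
```

Faceting would then target the string field rather than the analyzed text field, which is the point of the doc values recommendation above.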

Re: Atomic updates using solr-php-client

2018-03-06 Thread Rick Leir
Sami, why not do the simple case first, with complete document updates? When you have that working, you can decide if you want atomic updates too. Cheers -- Rick On March 6, 2018 2:26:50 AM EST, Sami al Subhi wrote: >Thank you for replying, > >Yes that is the one. Unfortunately

Re: Need a Query syntax for fetching results

2018-03-06 Thread Rick Leir
Hi Raj Maybe this would be what you need. "Keyword Tokenizer This tokenizer treats the entire text field as a single token." There used to be an example showing the use of this in schema.xml, but I am away from my computer so it is hard to check. And everything Emir says is spot-on. Then you
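
To make the quoted description concrete, here is a toy Python sketch (not Solr code) contrasting word-level tokenization with keyword-style tokenization, where the whole field value stays a single token. The function names are made up for illustration.

```python
def whitespace_tokenize(value):
    """Split the value into one token per whitespace-separated word."""
    return value.split()


def keyword_tokenize(value):
    """Emit the entire value as a single token, as the quote describes."""
    return [value]


if __name__ == "__main__":
    name = "ABC test limited"
    print(whitespace_tokenize(name))
    print(keyword_tokenize(name))
```

With word-level tokens, a document can match on any single word of its name; with one token per value, only the complete value is available for matching.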

Re: Alias field names when searching (not for results)

2018-03-06 Thread Rick Leir
Christopher The first thing that came to mind is that you are planning not to have an app in front of Solr. Without a web app, you will need to trust whoever can get access to Solr. Maybe you are on an intranet. Thanks -- Rick On March 6, 2018 2:42:26 AM EST, "Emir Arnautović"

Facet full-text

2018-03-06 Thread Moncif Aidi
Hello, I am using Solr to power faceting features for our application. I know that SOLR can do free text search but what is the best practice for faceting on common terms inside SOLR text fields? For example, we have a large blob of text (a description of a property) which contains useful text

Re: Solr Read-Only?

2018-03-06 Thread Christopher Schultz
Terry, On 3/6/18 4:55 PM, Terry Steichen wrote: > Chris, > > Thanks for your suggestion. Restarting solr after an in-memory > corruption is, of course, trivial (compared to rebuilding the > indexes). > > Are there any solr directories that MUST

Re: Solr Read-Only?

2018-03-06 Thread Terry Steichen
Chris, Thanks for your suggestion.  Restarting solr after an in-memory corruption is, of course, trivial (compared to rebuilding the indexes). Are there any solr directories that MUST be read/write (even with a pre-built index)?  Would it suffice (for my purposes) to make only the data/index

Re: Solr Read-Only?

2018-03-06 Thread Christopher Schultz
Terry, On 3/6/18 4:08 PM, Terry Steichen wrote: > Is it possible to run solr in a read-only directory? > > I'm running it just fine on a ubuntu server which is accessible > only through SSH tunneling. At the platform level, this is fine: > only

Solr Read-Only?

2018-03-06 Thread Terry Steichen
Is it possible to run solr in a read-only directory? I'm running it just fine on a ubuntu server which is accessible only through SSH tunneling.  At the platform level, this is fine: only authorized users can access it (via a browser on their machine accessing a forwarded port).  The problem is

Re: Solr SynonymGraphFilterFactory error on import

2018-03-06 Thread Webster Homer
You probably want to call solr.FlattenGraphFilterFactory after the call to WordDelimiterGraphFilterFactory. I put it at the end. Also, there is an issue calling more than one graph filter in an analysis chain, so you may need to remove one of them. I think that there is a Jira about that

CDCR Invalid Number on deletes

2018-03-06 Thread Chris Troullis
Hi all, We recently upgraded to Solr 7.2.0 as we saw that there were some CDCR bug fixes and features added that would finally let us be able to make use of it (bi-directional syncing was the big one). The first time we tried to implement we ran into all kinds of errors, but this time we were

LTR not picking up modified features

2018-03-06 Thread Roopa Rao
Hi - There was an error in one of the feature definitions in the Solr LTR features.json file, so I modified it and uploaded it to Solr. I can see that the definition change is uploaded correctly using the feature store url such as http://servername/solr/techproducts/schema/feature-store/myFeatureStore

Solr document routing using composite key

2018-03-06 Thread Nawab Zada Asad Iqbal
Hi solr community: I have been thinking of using a composite key for my next project iteration and tried it today to see how it distributes the documents. Here is a gist of my code: https://gist.github.com/niqbal/3e293e2bcb800d6912a250d914c9d478 I have 117 shards and I tried to use document ids
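
For readers unfamiliar with composite keys: with Solr's compositeId router, a document ID takes the form `shardKey!docId`, optionally `shardKey/bits!docId`, and documents sharing a prefix are routed together. The sketch below only builds such IDs. It is not the code from the linked gist, and the helper name and the `tenantA` example value are hypothetical.

```python
def composite_id(shard_key, doc_id, bits=None):
    """Build a compositeId-style document ID: 'shardKey!docId'.

    If bits is given, produce the 'shardKey/bits!docId' form, which limits
    how many bits of the hash are taken from the shard key.
    """
    if "!" in shard_key:
        raise ValueError("shard key must not contain '!'")
    prefix = shard_key if bits is None else f"{shard_key}/{bits}"
    return f"{prefix}!{doc_id}"


if __name__ == "__main__":
    print(composite_id("tenantA", "42"))
    print(composite_id("tenantA", "42", bits=2))
```

How evenly such IDs spread over a given number of shards depends on the router's hashing, which this sketch does not attempt to reproduce.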

Replicate configoverlay.json

2018-03-06 Thread Sundaram, Dinesh
Team, Can you please share the steps to replicate configoverlay.json from Master to Slave? In other words, how do we replicate from Master to Slave if any configuration is updated via the API? Dinesh Sundaram MBS Platform Engineering Mastercard

Re: Solr 7.2.0 CDCR Issue with TLOG collections

2018-03-06 Thread Webster Homer
It seems that this is a bug in Solr: https://issues.apache.org/jira/browse/SOLR-12057 Hopefully it can be addressed soon! On Mon, Mar 5, 2018 at 4:14 PM, Webster Homer wrote: > I noticed that the cdcr action=queues returns different results for the > target clouds. One

Re: Nested documents vs. flattening document structure?

2018-03-06 Thread Dc Tech
Thank you Erick. That was my instinct as well. On Tue, Mar 6, 2018 at 10:05 AM, Erick Erickson wrote: > Flattening the nested documents is usually preferred if at all > possible. Nested documents do, indeed, have a series of restrictions > that often make them harder

Re: Solr dih extract text from inline images in pdf

2018-03-06 Thread Erick Erickson
It's often much easier to approach this by running Tika separately. Here's a blog on both the reasoning and sample code: https://lucidworks.com/2012/02/14/indexing-with-solrj/ Among other things, you have a lot more control over how Tika operates. Best, Erick On Tue, Mar 6, 2018 at 12:36 AM,

Re: Copying a SolrCloud collection to other hosts

2018-03-06 Thread Erick Erickson
This is part of the "different replica types" capability: there are NRT (the only type available prior to 7x), PULL and TLOG, which would have different names. I don't know of any way to switch it off. As far as moving the data, here's a little known trick: Use the replication API to issue a

Re: Nested documents vs. flattening document structure?

2018-03-06 Thread Erick Erickson
Flattening the nested documents is usually preferred if at all possible. Nested documents do, indeed, have a series of restrictions that often make them harder to work with than flattened docs. Best, Erick On Tue, Mar 6, 2018 at 6:48 AM, Dc Tech wrote: > We are evaluating

Nested documents vs. flattening document structure?

2018-03-06 Thread Dc Tech
We are evaluating using nested documents vs. simply flattening the document. Looking through the documentation, it is not very clear to me if the nested documents are fully mature, and support the full richness of SOLR (streaming, mature faceting) etc... Any opinions or guidance on that? For

Copying a SolrCloud collection to other hosts

2018-03-06 Thread Patrick Schemitz
Hi List, so I'm running a bunch of SolrCloud clusters (each cluster is: 8 shards on 2 servers, with 4 instances per server, no replicas, i.e. 1 shard per instance). Building the index afresh takes 15+ hours, so when I have to deploy a new index, I build it once, on one cluster, and then copy

Analytics component exception

2018-03-06 Thread solrdj
I would like to use the Analytics component. I configured it following https://lucene.apache.org/solr/guide/7_2/analytics.html. When I try to send a query to Solr, an exception is thrown. Reason: Server Error Caused by: java.lang.IllegalAccessError: tried to access field org.apache.solr.handler.component.

RE: Solr Cloud: query elevation + deduplication?

2018-03-06 Thread Markus Jelsma
Hi, I would not use ID (uniqueKey) as the signature field. Query elevation would never work properly with such a setup: change a document's content and it'll get a new ID. If I remember correctly, this factory still deletes duplicates if signatureField is not uniqueKey. Regarding SOLR-3473,

Re: Need a Query syntax for fetching results

2018-03-06 Thread Emir Arnautović
Hi Raj, You need to get familiar with the Solr analysis chain: https://lucene.apache.org/solr/guide/6_6/understanding-analyzers-tokenizers-and-filters.html When playing with it, use admin console

Need a Query syntax for fetching results

2018-03-06 Thread Rajvinder Pal
Hi, I am new to Lucene. I have a requirement where, when I request an organization name, it should show the matching organization names. I have written the q param as orgname_text: ABC test and it is returning these results: ABC test limited ABC XYZ limited DEF ABC limted test limited I want all

Solr dih extract text from inline images in pdf

2018-03-06 Thread lala
Hi, I am working with solr7, indexing multilingual files existing in a folder, using DIH (FileListEntityProcessor for the basic entity, & TikaEntityProcessor for the child entity in the configuration file). My problem lies here: I want to extract texts from images inside PDF files, that works fine