Re: Solr still gives old data while faceting from the deleted documents

2018-04-12 Thread girish.vignesh
mincount (facet.mincount) will fix this issue for sure. I have tried it, but the requirement is to show facets with a 0 count as disabled. I think I am left with only 2 options: either use expungeDeletes with the update URL, or run optimize from a scheduler. Regards, Vignesh
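For reference, the two candidate fixes and the ruled-out `facet.mincount` approach all come down to plain request parameters. This is a minimal sketch that only builds the URLs. The host, the collection name `mycollection` and the facet field are hypothetical. Note that `facet.mincount=1` hides the stale names but also hides the genuine zero-count values the requirement wants to display as disabled. Both expungeDeletes and optimize rewrite segments, which can be expensive on a large index.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace host and collection with your own.
SOLR = "http://localhost:8983/solr/mycollection"

def expunge_deletes_url(base: str = SOLR) -> str:
    # Commit that also merges away segments containing deleted documents.
    return base + "/update?" + urlencode({"commit": "true", "expungeDeletes": "true"})

def optimize_url(base: str = SOLR) -> str:
    # Full optimize (forced merge); heavier, typically run off-peak from a scheduler.
    return base + "/update?" + urlencode({"optimize": "true"})

def facet_url(field: str, mincount: int, base: str = SOLR) -> str:
    # facet.mincount=1 hides stale values, but also any genuine zero-count values.
    params = {"q": "*:*", "rows": 0, "facet": "true",
              "facet.field": field, "facet.mincount": mincount}
    return base + "/select?" + urlencode(params)

print(expunge_deletes_url())
print(optimize_url())
print(facet_url("name", 1))
```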

Re: Solr 7.2

2018-04-12 Thread Shawn Heisey
On 4/12/2018 9:48 PM, Antony A wrote: Thank you. I was trying to create the collection using the API. Unfortunately, the API changes a bit between 6.x and 7.x. I posted the API calls that I used to create the collection and subsequently when trying to create cores for the same collection.

Re: Solr 7.2

2018-04-12 Thread Antony A
Hi Edwin, Thank you. I was trying to create the collection using the API. Unfortunately, the API changes a bit between 6.x and 7.x. I posted the API calls that I used to create the collection and subsequently when trying to create cores for the same collection. https://pastebin.com/hrydZktX Hopefully

Re: Solr 7.2

2018-04-12 Thread Zheng Lin Edwin Yeo
Hi, I can't quite tell what issue you are facing. Regards, Edwin On 13 April 2018 at 04:06, Antony A wrote: > Hi, > I am trying to add a replica to the SSL-enabled SolrCloud cluster with an external ZooKeeper ensemble. > 2018-04-12 18:26:29.140 INFO

How to index and search (integer or float) vector.

2018-04-12 Thread Jason
Hi, I have documents that each contain a fixed-length integer vector. But I have no idea how to index an integer vector and search for similar vectors. Which fieldType should I use to solve this problem? And could I get an example of how to search?

Solr 7.2

2018-04-12 Thread Antony A
Hi, I am trying to add a replica to the SSL-enabled SolrCloud cluster with an external ZooKeeper ensemble. 2018-04-12 18:26:29.140 INFO (qtp672320506-51) [ ] o.a.s.h.a.CollectionsHandler Invoked Collection Action :addreplica with params node=_solr=ADDREPLICA=collection_name=shard1 and
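The logged params above lost their keys and `&` separators in the archive. For comparison, a well-formed Collections API ADDREPLICA request carries `action`, `collection`, `shard` and, optionally, `node`. This is a minimal sketch that only builds the URL. The hosts are hypothetical, `https` is assumed because the cluster is SSL-enabled, and the node name follows SolrCloud's usual `host:port_solr` form.

```python
from urllib.parse import urlencode

def addreplica_url(host: str, collection: str, shard: str, node: str) -> str:
    # Collections API endpoint; https because the cluster in this thread is SSL-enabled.
    params = {"action": "ADDREPLICA", "collection": collection,
              "shard": shard, "node": node}
    return f"https://{host}/solr/admin/collections?" + urlencode(params)

# Hypothetical hosts; node names in live_nodes usually look like host:port_solr.
url = addreplica_url("solr1.example.com:8983", "collection_name", "shard1",
                     "solr2.example.com:8983_solr")
print(url)
```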

Re: DIH with huge data

2018-04-12 Thread Sujay Bawaskar
That sounds like a good option. So a Spark job will connect to MySQL and create Solr documents, which are pushed into Solr using SolrJ, probably in batches. On Thu, Apr 12, 2018 at 10:48 PM, Rahul Singh wrote: > If you want speed, Spark is the fastest and easiest way. You can
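The "push in batches" step can be sketched independently of Spark or SolrJ. This is a minimal illustration that groups rows into fixed-size batches and hands each batch to a caller-supplied `send` function. `send` is a hypothetical stand-in for whatever client call actually posts documents (for example a SolrJ `add` on a list of documents), and the batch size of 1000 is an arbitrary assumption to tune.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Doc = Dict[str, object]

def batches(docs: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    # Yield lists of at most `size` documents without materializing the whole input.
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def push_all(rows: Iterable[Doc], send: Callable[[List[Doc]], None], size: int = 1000) -> int:
    # `send` is hypothetical: wrap your real Solr client call here.
    sent = 0
    for chunk in batches(rows, size):
        send(chunk)
        sent += len(chunk)
    return sent

received: List[List[Doc]] = []
n = push_all(({"id": str(i)} for i in range(2500)), received.append, size=1000)
print(n, [len(b) for b in received])  # 2500 [1000, 1000, 500]
```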

Re: DIH with huge data

2018-04-12 Thread Rahul Singh
CSV -> Spark -> Solr https://github.com/lucidworks/spark-solr/blob/master/docs/examples/csv.adoc If speed is not an issue, there are other methods. Spring Batch / Spring Data might have all the tools you need to get speed without Spark. -- Rahul Singh rahul.si...@anant.us Anant Corporation

Re: DIH with huge data

2018-04-12 Thread Rahul Singh
If you want speed, Spark is the fastest and easiest way. You can connect to relational tables directly and import, or export to CSV / JSON and import from a distributed filesystem like S3 or HDFS. By combining a DFS with Spark and a highly available Solr, you are maximizing all threads. -- Rahul

Re: PreAnalyzed URP and SchemaRequest API

2018-04-12 Thread David Smiley
Ah ok. I've wondered how much value there is in pre-analysis. The serialization of the analyzed form in JSON is bulky. If you can share any results, I'd be interested to hear how it went. It's an optimization so you should be able to know how much better it is. Of course it isn't for everybody
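To make the point about bulk concrete: the JSON form accepted by `PreAnalyzedField` (via `JsonPreAnalyzedParser`) carries a version (`v`), the stored value (`str`), and, per token, its text (`t`), start/end offsets (`s`, `e`) and position increment (`i`). This rough sketch uses a hypothetical whitespace-and-lowercase "analysis chain" purely to compare sizes. Real analyzers, and the optional per-token keys (type, flags, payload), would change the numbers.

```python
import json

def preanalyzed_json(text: str) -> str:
    # Hypothetical analysis: whitespace split + lowercase, tracking character offsets.
    tokens, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        tokens.append({"t": word.lower(), "s": start, "e": end, "i": 1})
        pos = end
    # "v" = format version, "str" = stored value, "tokens" = the analyzed stream.
    return json.dumps({"v": "1", "str": text, "tokens": tokens})

text = "Solr pre-analysis example"
payload = preanalyzed_json(text)
print(len(text), len(payload))
```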

Re: DIH with huge data

2018-04-12 Thread Sujay Bawaskar
Thanks Rahul. The data source is JdbcDataSource with a MySQL database. The data size is around 100 GB. I am not very familiar with Spark, but are you suggesting that we should create documents by merging distinct RDBMS tables using RDDs? On Thu, Apr 12, 2018 at 10:06 PM, Rahul Singh

Re: DIH with huge data

2018-04-12 Thread Rahul Singh
How much data is there, and what is the database source? Spark is probably the fastest way. -- Rahul Singh rahul.si...@anant.us Anant Corporation On Apr 12, 2018, 7:28 AM -0400, Sujay Bawaskar wrote: > Hi, > We are using DIH with SortedMapBackedCache, but as the data size

RE: How to use Tika (Solr Cell) to extract content from HTML document instead of Solr's MostlyPassthroughHtmlMapper ?

2018-04-12 Thread Allison, Timothy B.
There's also, of course, tika-server. No matter the method, it is always best to isolate Tika in its own JVM, VM or m. -Original Message- From: Charlie Hull [mailto:char...@flax.co.uk] Sent: Monday, April 9, 2018 4:15 PM To: solr-user@lucene.apache.org Subject: Re: How to use Tika
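With tika-server, extraction runs in a separate process, which is the isolation suggested above. A minimal sketch of the request: the raw document is PUT to the `/tika` endpoint and `Accept: text/plain` asks for plain extracted text. The server URL assumes tika-server's default port 9998 on localhost. The code only builds the request; actually sending it requires a running tika-server.

```python
import urllib.request

def tika_extract_request(html: bytes, server: str = "http://localhost:9998") -> urllib.request.Request:
    # PUT the raw document to tika-server's /tika endpoint; Accept selects plain-text output.
    return urllib.request.Request(
        f"{server}/tika", data=html, method="PUT",
        headers={"Accept": "text/plain", "Content-Type": "text/html"})

req = tika_extract_request(b"<html><body><p>Hello</p></body></html>")
print(req.get_method(), req.full_url)
# With a tika-server running, the extracted text would be:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

The extracted text can then be sent to Solr as an ordinary field value, so a crash or runaway parse in Tika never takes down the Solr JVM.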

Re: How many SynonymGraphFilterFactory can I have?

2018-04-12 Thread Shawn Heisey
On 4/12/2018 6:46 AM, Vincenzo D'Amore wrote: Thanks Shawn, right now the synonyms are just organized in categories with different meanings. Thanks a lot for the response. I think this behaviour should be clearly stated in the documentation. Can I get access to the Solr Guide and add a few notes on this?

Re: ant eclipse on branch_6_4

2018-04-12 Thread Steve Rowe
Hi, You probably have a stale Ivy lock file in ~/.ivy2/cache/, very likely orphaned as a result of manually interrupting the Lucene/Solr build, e.g. via Ctrl-C. You can find it via e.g.: find ~/.ivy2/cache -name '*.lck' Once you have found the stale lock file, manually deleting it should allow
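The same search as the `find` command, as a small cross-platform sketch. It assumes the default Ivy cache location under the home directory and only lists candidates; deleting them is left to you, and should only be done when no build is running.

```python
from pathlib import Path
from typing import List

def find_stale_locks(cache: Path = Path.home() / ".ivy2" / "cache") -> List[Path]:
    # Equivalent of `find ~/.ivy2/cache -name '*.lck'`; returns [] if the cache is missing.
    if not cache.is_dir():
        return []
    return sorted(cache.rglob("*.lck"))

for lock in find_stale_locks():
    print(lock)  # review, then remove with lock.unlink() once no build is running
```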

Re: Decision on Number of shards and collection

2018-04-12 Thread Shawn Heisey
On 4/12/2018 4:57 AM, neotorand wrote: I read from the link you shared that "Shard cannot contain more than 2 billion documents since Lucene is using integer for internal IDs." In which Java class of the Solr implementation repository can this be found? The 2 billion limit is a *hard* limit from
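The arithmetic behind the limit, as a worked sketch. As far as I know, the constant is defined in Lucene rather than Solr, as `IndexWriter.MAX_DOCS`, which in recent versions is `Integer.MAX_VALUE - 128`. Deleted documents that have not yet been merged away still occupy internal IDs. The 50% headroom factor below is purely an illustrative assumption, not a recommendation; practical shard sizes are usually far smaller for performance reasons.

```python
import math

INT_MAX = 2**31 - 1                 # Java Integer.MAX_VALUE = 2147483647
LUCENE_MAX_DOCS = INT_MAX - 128     # IndexWriter.MAX_DOCS in recent Lucene versions

def min_shards(total_docs: int, headroom: float = 0.5) -> int:
    # Smallest shard count keeping each shard under `headroom` x the hard limit
    # (headroom is an assumed safety margin for deleted-but-unmerged docs and growth).
    per_shard = int(LUCENE_MAX_DOCS * headroom)
    return math.ceil(total_docs / per_shard)

print(min_shards(5_000_000_000))  # → 5
```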

Re: Solr still gives old data while faceting from the deleted documents

2018-04-12 Thread Shawn Heisey
On 4/12/2018 5:53 AM, girish.vignesh wrote: Solr returns old data when faceting, coming from old deleted or updated documents. For example, we facet on name, and the name changes frequently in our application. When we reindex the document after changing the name, we get both the old name and the new name in

ant eclipse on branch_6_4

2018-04-12 Thread rgummadi
I cloned the lucene-solr git repository and am working on branch branch_6_4. I am trying to make it Eclipse-compatible, so I ran "ant eclipse" from the root folder. I am getting the error below. Can someone suggest a resolution? [ivy:retrieve] ::

Re: How many SynonymGraphFilterFactory can I have?

2018-04-12 Thread Vincenzo D'Amore
Thanks Shawn, right now the synonyms are just organized in categories with different meanings. Thanks a lot for the response. I think this behaviour should be clearly stated in the documentation. Can I get access to the Solr Guide and add a few notes on this? On Thu, Apr 12, 2018 at 11:40 AM, Shawn Heisey

Solr still gives old data while faceting from the deleted documents

2018-04-12 Thread girish.vignesh
Solr returns old data when faceting, coming from old deleted or updated documents. For example, we facet on name, and the name changes frequently in our application. When we reindex the document after changing the name, we get both the old name and the new name in the search results. After digging more on this

DIH with huge data

2018-04-12 Thread Sujay Bawaskar
Hi, We are using DIH with SortedMapBackedCache, but as the data size increases we need to give the Solr JVM more heap memory. Can we use multiple CSV files instead of database queries, and later join the data in the CSV files using zipper? So the bottom line is to create CSV files for each entity in
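The reason a zipper join can avoid the heap growth of SortedMapBackedCache is that it is a merge join. Both inputs must already be sorted by the join key, so only the current rows need to be held in memory instead of a cache of a whole child table. This is a generic sketch of that technique over two sorted CSV streams, not DIH's actual implementation. The file contents and column names are hypothetical, and keys are compared as strings, so the sort order of the exported CSVs must match.

```python
import csv
import io
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, str]

def zipper_join(parents: Iterator[Row], children: Iterator[Row],
                pkey: str, ckey: str) -> Iterator[Tuple[Row, List[Row]]]:
    # Both streams must be sorted by their key (string order); memory use stays O(1) per parent.
    child = next(children, None)
    for parent in parents:
        # Skip orphan children whose key sorts before the current parent's key.
        while child is not None and child[ckey] < parent[pkey]:
            child = next(children, None)
        matched = []
        while child is not None and child[ckey] == parent[pkey]:
            matched.append(child)
            child = next(children, None)
        yield parent, matched

people = io.StringIO("id,name\n1,Ann\n2,Bob\n3,Cy\n")
phones = io.StringIO("person_id,phone\n1,111\n1,112\n3,333\n")
for p, kids in zipper_join(csv.DictReader(people), csv.DictReader(phones), "id", "person_id"):
    print(p["name"], [k["phone"] for k in kids])
```

Each yielded pair corresponds to one Solr document (the parent row plus its child values), which can then be sent in batches.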

Re: in-place updates

2018-04-12 Thread Hendrik Haddorp
Ah, right, sorry. On 11.04.2018 17:38, Emir Arnautović wrote: Hi Hendrik, The documentation clearly states the conditions under which in-place updates are possible: https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-In-PlaceUpdates
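The request side of an in-place update is an ordinary atomic update using `set` or `inc`. Whether Solr applies it in place depends on the schema conditions in the linked guide: the target field must be a single-valued, non-indexed, non-stored numeric field with docValues. This minimal sketch only builds the JSON body to POST to `/update`. The document ID and the field name `popularity` are hypothetical.

```python
import json

def inplace_update(doc_id: str, field: str, op: str, value: float) -> str:
    # Atomic-update body; applied in place only if `field` meets the schema conditions
    # (single-valued, indexed="false", stored="false", numeric docValues).
    if op not in ("set", "inc"):
        raise ValueError("in-place updates support only 'set' and 'inc'")
    return json.dumps([{"id": doc_id, field: {op: value}}])

body = inplace_update("mydoc", "popularity", "inc", 20)
print(body)  # [{"id": "mydoc", "popularity": {"inc": 20}}]
```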

Re: Decision on Number of shards and collection

2018-04-12 Thread neotorand
Emir, I read from the link you shared that "Shard cannot contain more than 2 billion documents since Lucene is using integer for internal IDs." In which Java class of the Solr implementation repository can this be found? Regards, Neo

Re: How many SynonymGraphFilterFactory can I have?

2018-04-12 Thread Shawn Heisey
On 4/12/2018 3:11 AM, Vincenzo D'Amore wrote: Hi all, could anyone at least point me to a good resource that explains how to configure filters when building a fieldType? I just want to know if there is a document that explains the changes introduced with SynonymGraphFilter, or in general what kinds of filters

Re: Filter query question

2018-04-12 Thread Shawn Heisey
On 4/12/2018 1:46 AM, LOPEZ-CORTES Mariano-ext wrote: In our search application we have one facet filter (Status). Each status value corresponds to multiple values in the Solr database. Example: Status: Initialized --> status in Solr = 11I, 12I, 13I, 14I, ... On status value click, search is

Re: How many SynonymGraphFilterFactory can I have?

2018-04-12 Thread Vincenzo D'Amore
Hi all, could anyone at least point me to a good resource that explains how to configure filters when building a fieldType? I just want to know if there is a document that explains the changes introduced with SynonymGraphFilter, or in general what kinds of filters are compatible and can stay together in the

Re: Filter query question

2018-04-12 Thread Emir Arnautović
Hi, How many of these status indicators are there? A query with more clauses is expected to be slower, since Solr/Lucene has to load the postings for each term and then OR them. The real question is why it is consistently slow, since you are using fq and it should be cached. Did you disable

Filter query question

2018-04-12 Thread LOPEZ-CORTES Mariano-ext
Hi, In our search application we have one facet filter (Status). Each status value corresponds to multiple values in the Solr database. Example: Status: Initialized --> status in Solr = 11I, 12I, 13I, 14I, ... On status value click, the search is re-fired with an fq filter: fq: status:(11I OR 12I OR
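The filter construction described above can be sketched as follows. The mapping is hypothetical: only the codes 11I to 14I appear in the message, and the real list is longer. The second function builds the same filter with Solr's `terms` query parser, a different technique that is not mentioned in the thread. It may be worth benchmarking for long value lists, but treat that as a suggestion to test, not a known fix.

```python
from typing import Dict, List

# Hypothetical mapping from UI status labels to stored codes (list truncated in the thread).
STATUS_CODES: Dict[str, List[str]] = {"Initialized": ["11I", "12I", "13I", "14I"]}

def status_fq(label: str, field: str = "status") -> str:
    # Boolean OR filter, as used in the original application.
    return field + ":(" + " OR ".join(STATUS_CODES[label]) + ")"

def status_fq_terms(label: str, field: str = "status") -> str:
    # Same set of values via the {!terms} query parser (comma-separated by default).
    return "{!terms f=" + field + "}" + ",".join(STATUS_CODES[label])

print(status_fq("Initialized"))        # → status:(11I OR 12I OR 13I OR 14I)
print(status_fq_terms("Initialized"))  # → {!terms f=status}11I,12I,13I,14I
```

Either string is passed as the `fq` parameter. Because each distinct fq string is cached separately in the filterCache, keeping the generated value order stable helps repeated clicks hit the cache.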

Re: Decision on Number of shards and collection

2018-04-12 Thread neotorand
Thanks, everyone, for your beautiful explanations and valuable time. Thanks Emir for the nice link (http://www.od-bits.com/2018/01/solrelasticsearch-capacity-planning.html). Thanks Shawn for https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ When