Please help with pdate type during indexing

2019-06-02 Thread derrick cui
Hi all, I spent the whole day indexing my data into Solr (8.0), but one field of type pdate always fails: error adding field 'UpdateDate'='org.apache.solr.common.SolrInputField:UpdateDate=2019-06-03T05:22:14.842Z' msg=Invalid Date in Date Math
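A pdate field expects a plain ISO-8601 timestamp in UTC, such as 2019-06-03T05:22:14.842Z (Solr also accepts date-math expressions like NOW). The logged value begins with "org.apache.solr.common.SolrInputField:" rather than a bare timestamp, which may mean something other than the date string itself reached the date parser; the thread does not confirm this. The sketch below uses a hypothetical helper class, SolrDates, and only java.time to make sure the value sent is a plain UTC string.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Date;

// Hypothetical helper: render values in the ISO-8601 UTC form a pdate field accepts.
final class SolrDates {
    private SolrDates() {}

    /** Instant.toString() yields ISO-8601 in UTC with a trailing 'Z'. */
    static String toSolr(Instant instant) {
        return instant.toString();
    }

    /** Convert a legacy java.util.Date. */
    static String toSolr(Date date) {
        return toSolr(date.toInstant());
    }

    /** Parse a zone-less timestamp string, interpret it in `zone`, and convert to UTC. */
    static String toSolr(String local, DateTimeFormatter pattern, ZoneId zone) {
        return toSolr(LocalDateTime.parse(local, pattern).atZone(zone).toInstant());
    }
}
```

With SolrJ, the resulting string would be set on the document as the field value itself, for example doc.addField("UpdateDate", SolrDates.toSolr(updateDate)), where updateDate is whatever date object the application holds.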

Re: Solr Heap Usage

2019-06-02 Thread Erick Erickson
> I've looked through SolrJ, DIH and others -- is the bottom line across all of them to "batch updates" and not commit as long as possible?

Of course it's more complicated than that ;) But to start, yes, I urge you to batch. Here are some stats:
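A minimal sketch of client-side batching in plain Java, assuming nothing beyond the JDK. DocBatcher is a hypothetical class; its sink stands in for a call such as SolrJ's SolrClient.add(Collection<SolrInputDocument>). No commit is issued per batch, leaving commits to the server's autoCommit settings or a single explicit commit at the end. The batch size is an arbitrary tuning parameter, not a recommended value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical batcher: buffers documents and hands each full batch to a sink.
final class DocBatcher<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> sink; // e.g. a wrapper around SolrClient.add(...)
    private List<T> buffer;

    DocBatcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(batchSize);
    }

    /** Buffer one document; send the batch as soon as it is full. */
    void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) flush();
    }

    /** Send whatever is buffered, possibly a partial batch. No commit here. */
    void flush() {
        if (buffer.isEmpty()) return;
        sink.accept(buffer);
        buffer = new ArrayList<>(batchSize); // the sink keeps the old list
    }

    /** Flush the final partial batch when the batcher is closed. */
    @Override
    public void close() {
        flush();
    }
}
```

Used in a try-with-resources block, the last partial batch is sent automatically when the block exits.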

Adding Multiple JSON Documents

2019-06-02 Thread John Davis
Hi there, I was looking at the Solr documentation for indexing multiple documents via JSON and noticed an inconsistency in the docs. Should the POST URL be /update/json/docs instead of just /update? It does look like the former works, unless both work just fine?
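For reference, the Solr reference guide's JSON examples post an array of documents to /update with Content-Type: application/json, while /update/json/docs is the endpoint for custom JSON indexing, which also accepts an array of documents; whether the two behave identically for a given input is worth testing. The sketch below only builds such a request with java.net.http (Java 11+) and does not send it. The class JsonUpdateRequest, the base URL, and the collection name "mycollection" are hypothetical, and the tiny JSON writer handles string-valued fields only.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical request builder for posting a JSON array of documents.
final class JsonUpdateRequest {
    private JsonUpdateRequest() {}

    /** Minimal JSON string quoting: escapes quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Render string-valued documents as [{"id":"1",...},...]. */
    static String toJsonArray(List<Map<String, String>> docs) {
        return docs.stream()
                .map(doc -> doc.entrySet().stream()
                        .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                        .collect(Collectors.joining(",", "{", "}")))
                .collect(Collectors.joining(",", "[", "]"));
    }

    /** Build (but do not send) a POST of the array to <baseUrl>/<collection>/update. */
    static HttpRequest build(String baseUrl, String collection, List<Map<String, String>> docs) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + collection + "/update"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJsonArray(docs)))
                .build();
    }
}
```

The request could then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); swapping "/update" for "/update/json/docs" in build() is the only change needed to try the other endpoint. A real client would more likely use a JSON library or SolrJ than hand-rolled escaping.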

Re: Solr Heap Usage

2019-06-02 Thread John Davis
If we assume there is no query load, then effectively this boils down to the most effective way of adding a large number of documents to the Solr index. I've looked through SolrJ, DIH and others -- is the bottom line across all of them to "batch updates" and not commit as long as possible?

Re: Using Solr as a Database?

2019-06-02 Thread Erick Erickson
Not exactly. If I'm reading this right, you have, and will continue to have, all the data in the RDBMS, correct? That's what I call the "system of record". So you're not talking about getting rid of the RDBMS, but rather about copying it all over into Solr and periodically updating your Solr

Re: Using Solr as a Database?

2019-06-02 Thread Dave
You *can* use Solr as a database, in the same sense that you *can* use a chainsaw to remodel your bathroom. Is it the right tool for the job? No. Can you make it work? Yes. As for HA and clustering an RDBMS, Galera Cluster works great for MariaDB and is ACID compliant. I'm sure any other database

Re: Using Solr as a Database?

2019-06-02 Thread Walter Underwood
> On Jun 2, 2019, at 6:28 AM, Ralph Soika wrote:
>
> Now as far as I understand, Solr is a cluster-enabled datastore which can also be used to store all the data from our documents.

That understanding is incorrect. Solr is not a data store. Reasoning based on that false assumption leads to

Re: Using Solr as a Database?

2019-06-02 Thread Ralph Soika
Thanks Jörn and Erick for your explanations. What I do so far is the following:

* I have an RDBMS with one completely flattened table holding all the data and the id.
* The data is unstructured. Fields can vary from document to document. I have no fixed schema. A dataset is represented by a

Re: Number of threads for addDocument in parallel way

2019-06-02 Thread Erick Erickson
90+% of the time when I see this question it’s a problem with the client not being able to push docs at Solr fast enough. This is particularly true if databases are involved. In addition to Jörn’s comment, I’d ask whether your Solr CPUs are running flat out. If your CPUs aren’t maxed out, you

Re: Solr Heap Usage

2019-06-02 Thread Erick Erickson
Oh, there are about a zillion reasons ;). First of all, most tools that show heap usage also count uncollected garbage. So your 10G could actually be much less “live” data. Quick way to test is to attach jconsole to the running Solr and hit the button that forces a full GC. Another way is to
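Erick's point that heap figures include uncollected garbage can be observed in any JVM. The sketch below (hypothetical class HeapCheck, standard java.lang.management only) reads used heap through MemoryMXBean, requests a collection with MemoryMXBean.gc(), which as far as we know is what jconsole's "Perform GC" button asks a remote JVM to do, and reads the figure again. It illustrates the effect locally; it does not inspect a running Solr.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper: compare "used" heap before and after a GC request.
final class HeapCheck {
    private HeapCheck() {}

    /** Returns {usedBefore, usedAfter} in bytes around a GC request. */
    static long[] usedBeforeAndAfterGc() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        long before = mem.getHeapMemoryUsage().getUsed(); // includes uncollected garbage
        mem.gc(); // a request only; the JVM may ignore it (e.g. -XX:+DisableExplicitGC)
        long after = mem.getHeapMemoryUsage().getUsed();
        return new long[] {before, after};
    }

    public static void main(String[] args) {
        // Create short-lived garbage so the gap between the two readings is visible.
        long allocated = 0;
        for (int i = 0; i < 1_000; i++) {
            byte[] junk = new byte[64 * 1024];
            allocated += junk.length;
        }
        long[] r = usedBeforeAndAfterGc();
        System.out.printf("allocated %d MB of garbage; used before GC: %d MB, after: %d MB%n",
                allocated >> 20, r[0] >> 20, r[1] >> 20);
    }
}
```

The exact numbers depend on the collector and heap settings, so only the direction of the change is meaningful.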

Re: Using Solr as a Database?

2019-06-02 Thread Erick Erickson
You must be able to rebuild your index completely when, at some point, you change your schema in incompatible ways. For that reason, either you have to play tricks with Solr (e.g. store all fields or the original document or…) or somehow have access to the original document. Furthermore,

Re: Using Solr as a Database?

2019-06-02 Thread Jörn Franke
It depends on what you want to do with it. You can store all fields in Solr and filter on them. However, as soon as it comes to ACID guarantees, or if you need to join the data, you will probably need something other than Solr (or other workarounds, e.g. flattening the table). Maybe you can
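Jörn's "flatten the table" workaround amounts to denormalizing: instead of joining at query time, the related rows are copied into the parent document before indexing. A minimal sketch with plain maps follows; the class Flattener, the prefix, and the field names in the test are hypothetical, and each child field becomes a multi-valued field on the parent.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical denormalization helper: fold child rows into one parent document.
final class Flattener {
    private Flattener() {}

    /**
     * Copies the parent's fields, then adds each child field as a multi-valued
     * field named childPrefix + fieldName, with values kept in child order.
     */
    static Map<String, Object> flatten(Map<String, Object> parent,
                                       List<Map<String, Object>> children,
                                       String childPrefix) {
        // Collect child values per prefixed field name.
        Map<String, List<Object>> multi = new LinkedHashMap<>();
        for (Map<String, Object> child : children) {
            for (Map.Entry<String, Object> e : child.entrySet()) {
                multi.computeIfAbsent(childPrefix + e.getKey(), k -> new ArrayList<>())
                     .add(e.getValue());
            }
        }
        Map<String, Object> doc = new LinkedHashMap<>(parent);
        for (Map.Entry<String, List<Object>> e : multi.entrySet()) {
            // Refuse to silently overwrite a parent field with the same name.
            if (doc.containsKey(e.getKey())) {
                throw new IllegalArgumentException("field name clash: " + e.getKey());
            }
            doc.put(e.getKey(), e.getValue());
        }
        return doc;
    }
}
```

The trade-off of this flat shape is that the link between values of the same child row is lost: a query can find a parent whose children include SKU "A" and quantity 5 even if those came from different rows.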

Re: Intermittent error 401 with JSON Facet query to retrieve count all collections

2019-06-02 Thread Colvin Cowie
Hello. I encountered this issue too and wrote this up before I found this thread, but I thought I might as well still post it, in case it helps... Currently I'm trying to move our product onto Solr 8.1.1. We are currently using 6.6.6, so things have definitely moved on. We use the BasicAuthPlugin +

Using Solr as a Database?

2019-06-02 Thread Ralph Soika
Inspired by an article by Uwe Schindler in the latest German JavaMagazin, I wonder whether Solr can also be used as a database. In our open source project Imixs-Workflow we have been using Lucene for several years with great success. We have unstructured

Alternate Fields for Unified Highlighter

2019-06-02 Thread Furkan KAMACI
Hi All, I want to switch to the Unified Highlighter for performance reasons on my Solr 7.6. I was using these fields: solrQuery.addHighlightField("content_*") .set("f.content_en.hl.alternateField", "content") .set("f.content_es.hl.alternateField", "content") .set("hl.useFastVectorHighlighter",

Re: Solr Heap Usage

2019-06-02 Thread John Davis
This makes sense. Any ideas why Lucene/Solr would use 10g of heap for a 20g index? My hypothesis was that merging segments was trying to read it all, but if that's not the case I am out of ideas. The one caveat is that we are trying to add the documents quickly (~1g an hour), but if Lucene does write 100m segments

Optimal number of facet threads

2019-06-02 Thread Saurabh Sharma
Hi All, I am trying to use the fcs method of faceting and have set the facet thread count to 3. But when I check the thread dump, around 15 threads are visible for faceting. Does setting the facet thread count to 3 open 3 threads for each segment? On what basis should we decide the number of facet threads? Thanks

Re: Number of threads for addDocument in parallel way

2019-06-02 Thread Jörn Franke
And also try to send multiple documents in one add-document step.

> On 02.06.2019 at 08:45, calamita.agost...@libero.it.invalid wrote:
>
> Hi all,
> I have an ingestion application that reads many files and indexes records
> in a SolrCloud 7.4 with SolrJ using addDocument.
> The application is

Re: Number of threads for addDocument in parallel way

2019-06-02 Thread Jörn Franke
How many CPU cores does the Solr machine have? The best approach would also be to shard the index across several machines.

> On 02.06.2019 at 08:45, calamita.agost...@libero.it.invalid wrote:
>
> Hi all,
> I have an ingestion application that reads many files and indexes records
> in a

Number of threads for addDocument in parallel way

2019-06-02 Thread calamita . agostino
Hi all, I have an ingestion application that reads many files and indexes records in a SolrCloud 7.4 with SolrJ using addDocument. The application is multithreaded: every thread reads a file and sends addDocument. I see that from 10 to 30 threads executing addDocument in parallel, number
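The replies in this thread suggest sending several documents per add call and checking whether Solr's CPUs are saturated before adding client threads. The sketch below combines a fixed-size thread pool with per-task batching. ParallelIngest is a hypothetical class; readRecords and sendBatch are hypothetical stand-ins for parsing one file and for a SolrJ add call, and sendBatch must be safe to call from several threads at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical ingester: a bounded pool, one task per input, batched sends.
final class ParallelIngest {
    private ParallelIngest() {}

    static <I, D> void ingest(List<I> inputs,
                              Function<I, List<D>> readRecords, // e.g. parse one file
                              Consumer<List<D>> sendBatch,      // e.g. wrap SolrClient.add(...)
                              int threads, int batchSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (I input : inputs) {
                futures.add(pool.submit(() -> {
                    List<D> batch = new ArrayList<>(batchSize);
                    for (D doc : readRecords.apply(input)) {
                        batch.add(doc);
                        if (batch.size() == batchSize) {
                            sendBatch.accept(batch);            // one request per full batch
                            batch = new ArrayList<>(batchSize);
                        }
                    }
                    if (!batch.isEmpty()) sendBatch.accept(batch); // final partial batch
                }));
            }
            for (Future<?> f : futures) f.get(); // wait and propagate any task failure
        } finally {
            pool.shutdown();
        }
    }
}
```

Thread count and batch size are tuning knobs rather than fixed answers; per Erick's reply, if Solr's CPUs are not maxed out while this runs, the client side is the more likely bottleneck.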