Re: NRT vs TLOG bulk indexing performances

2019-10-30 Thread Dominique Bejean
So, I understand that while the non-leader TLOG replica is copying the index from the leader, the leader stops indexing. A one-shot, heavy bulk indexing run should be much more impacted than continuous light indexing. Regards.

Re: NRT vs TLOG bulk indexing performances

2019-10-26 Thread Erick Erickson
"I understand that while non leader TLOG is copying the index from leader, the leader stop indexing” This _better_ not be happening. If you can demonstrate this let’s open a JIRA. > On Oct 25, 2019, at 8:28 AM, Dominique Bejean > wrote: > > I understand that while non leader TLOG is copying

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Erick Erickson
indexing. > A one-shot, heavy bulk indexing run should be much more impacted than > continuous light indexing. > > Regards. > > Dominique > > > On Fri, Oct 25, 2019 at 13:54, Shawn Heisey wrote: > >> On 10/25/2019 1:16 AM, Dominique Bejean w

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Ere Maijala
Shawn Heisey wrote on 25.10.2019 at 14:54: > With newer Solr versions, you can ask SolrCloud to prefer PULL replicas > for querying, so queries will be targeted to those replicas, unless they > all go down, in which case it will go to non-preferred replica types.  I > do not know how to do this,

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Dominique Bejean
Shawn, So, I understand that while the non-leader TLOG replica is copying the index from the leader, the leader stops indexing. A one-shot, heavy bulk indexing run should be much more impacted than continuous light indexing. Regards. Dominique On Fri, Oct 25, 2019 at 13:54, Shawn Heisey wrote: > On

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Shawn Heisey
On 10/25/2019 1:16 AM, Dominique Bejean wrote: For collection created with all replicas as NRT * Indexing time : 22 minutes For collection created with all replicas as TLOG * Indexing time : 34 minutes NRT indexes simultaneously on all replicas. So when indexing is done on one, it is

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Dominique Bejean
est? > > > On 25.10.2019 at 09:16, Dominique Bejean < > dominique.bej...@eolya.fr> wrote: > > > > Hi, > > > > I made some benchmarks for bulk indexing in order to compare performance > > and resource usage for NRT versus TLOG replicas. > > >

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Jörn Franke
Which Solr version are you using, and how often did you repeat the test? > On 25.10.2019 at 09:16, Dominique Bejean wrote: > > Hi, > > I made some benchmarks for bulk indexing in order to compare performance > and resource usage for NRT versus TLOG replicas. > > Env

NRT vs TLOG bulk indexing performances

2019-10-25 Thread Dominique Bejean
Hi, I made some benchmarks for bulk indexing in order to compare performance and resource usage for NRT versus TLOG replicas. Environment: * SolrCloud with 4 Solr nodes (8 GB RAM, 4 GB heap) * 1 collection with 2 shards x 2 replicas (all NRT or all TLOG) * 1 core per Solr server Indexing

Re: Solr CPU spiking up on bulk indexing

2019-06-18 Thread Erick Erickson
Dynamic fields don’t make any difference, they’re just like fixed fields as far as merging is concerned. So this is almost certainly merging being kicked off by your commits. The more documents and the more terms, the more work Lucene has to do, so I suspect this is just how things work.

Re: Solr CPU spiking up on bulk indexing

2019-06-18 Thread Venu
Thanks Erick. I see the above pattern only at the time of commit. I have many fields (like around 250 fields out of which around 100 fields are dynamic fields and around 3 n-gram fields and text fields, while many of them are stored fields along with indexed fields), will a merge take a lot of

Re: Solr CPU spiking up on bulk indexing

2019-06-16 Thread Erick Erickson
When indexing, segments are periodically merged by background threads, which can be quite CPU intensive. See: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html Segment merges can be fairly long running, so even after indexing stops it can take some time for the CPU to

Solr CPU spiking up on bulk indexing

2019-06-15 Thread Venu
Hi, while doing batch indexing, Solr CPU usage spikes regularly. I auto-commit every 5 minutes. Please find the image below. On stopping the indexing, the CPU returns to its normal state (around 20%). In the image

Re: SolrJ bulk indexing documents - HttpSolrClient vs. ConcurrentUpdateSolrClient

2016-11-18 Thread Erick Erickson
docs, int commitWithinMs) instead, which I >> expect would already improve performance. >> Does it matter which method I use? Beside the method taking a >> Collection there is also one that takes an >> Iterator ... and what about ConcurrentUpdateSolrClient? >> Shou

Re: SolrJ bulk indexing documents - HttpSolrClient vs. ConcurrentUpdateSolrClient

2016-11-18 Thread Shawn Heisey
would already improve performance. > Does it matter which method I use? Beside the method taking a > Collection there is also one that takes an > Iterator ... and what about ConcurrentUpdateSolrClient? > Should I use it for bulk indexing instead of HttpSolrClient? > > Currently

SolrJ bulk indexing documents - HttpSolrClient vs. ConcurrentUpdateSolrClient

2016-11-18 Thread Sebastian Riemer
that to use add(Collection docs, int commitWithinMs) instead, which I expect would already improve performance. Does it matter which method I use? Beside the method taking a Collection there is also one that takes an Iterator ... and what about ConcurrentUpdateSolrClient? Should I use it for bulk

Re: problems with bulk indexing with concurrent DIH

2016-08-08 Thread Shawn Heisey
On 8/2/2016 7:50 AM, Bernd Fehling wrote: > Only assumption so far, DIH is sending the records as "update" (and > not pure "add") to the indexer which will generate delete files during > merge. If the number of segments is high it will take quite long to > merge and check all records of all

Re: problems with bulk indexing with concurrent DIH

2016-08-04 Thread Bernd Fehling
th SolrJ with additional >> performance >>>>>> boost. >>>>>> >>>>>> Bernd >>>>>> >>>>>> >>>>>> On 27.07.2016 at 16:03, Erick Erickson: >>>>>>> I'd actually recommend

Re: problems with bulk indexing with concurrent DIH

2016-08-03 Thread Bernd Fehling
nd my test continued with 8 >> concurrent DIHs. >> Then i was trying different and settings but >> now I'm stuck. >> I can't figure out what is the best setting for bulk indexing. >> What I see is that the indexing is "falling asleep" after some

Re: problems with bulk indexing with concurrent DIH

2016-08-02 Thread Shalin Shekhar Mangar
much because for 16 CPUs and my test continued with 8 > concurrent DIHs. > Then i was trying different and settings but > now I'm stuck. > I can't figure out what is the best setting for bulk indexing. > What I see is that the indexing is "falling asleep" after some time of >

Re: problems with bulk indexing with concurrent DIH

2016-08-02 Thread Mikhail Khludnev
tting a load on the Solr > >>>>> servers (especially if you're also using Tika) in addition > >>>>> to all indexing etc. > >>>>> > >>>>> Here's a sample: > >>>>> https://lucidworks.com/blog/2012/02/14/indexing-with-

Re: problems with bulk indexing with concurrent DIH

2016-08-02 Thread Bernd Fehling
if you're also using Tika) in addition >>>>>> to all indexing etc. >>>>>> >>>>>> Here's a sample: >>>>>> https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/ >>>>>> >>>>>> Dodging the quest

Re: problems with bulk indexing with concurrent DIH

2016-08-02 Thread Bernd Fehling
all indexing etc. >>>>> >>>>> Here's a sample: >>>>> https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/ >>>>> >>>>> Dodging the question I know, but DIH sometimes isn't >>>>> the best solution. >>

Re: problems with bulk indexing with concurrent DIH

2016-08-02 Thread Susheel Kumar
2/02/14/indexing-with-solrj/ > > >>> > > >>> Dodging the question I know, but DIH sometimes isn't > > >>> the best solution. > > >>> > > >>> Best, > > >>> Erick > > >>> > > >>> On Wed,

Re: problems with bulk indexing with concurrent DIH

2016-08-02 Thread Mikhail Khludnev
es isn't > >>> the best solution. > >>> > >>> Best, > >>> Erick > >>> > >>> On Wed, Jul 27, 2016 at 6:59 AM, Bernd Fehling > >>> <bernd.fehl...@uni-bielefeld.de> wrote: > >>>> After enhancing the se

Re: problems with bulk indexing with concurrent DIH

2016-07-27 Thread Bernd Fehling
g the server with SSDs I'm trying to speed up indexing. >>>> >>>> The server has 16 CPUs and more than 100G RAM. >>>> JAVA (1.8.0_92) has 24G. >>>> SOLR is 4.10.4. >>>> Plain XML data to load is 218G with about 96M records. >>>>

Re: problems with bulk indexing with concurrent DIH

2016-07-27 Thread Erick Erickson
is 4.10.4. > >> Plain XML data to load is 218G with about 96M records. > >> This will result in a single index of 299G. > >> > >> I tried with 4, 8, 12 and 16 concurrent DIHs. > >> 16 and 12 was to much because for 16 CPUs and my test continued with 8 >

Re: problems with bulk indexing with concurrent DIH

2016-07-27 Thread Bernd Fehling
.10.4. >> Plain XML data to load is 218G with about 96M records. >> This will result in a single index of 299G. >> >> I tried with 4, 8, 12 and 16 concurrent DIHs. >> 16 and 12 was to much because for 16 CPUs and my test continued with 8 >> concurrent DIHs. >> Then

Re: problems with bulk indexing with concurrent DIH

2016-07-27 Thread Erick Erickson
ent DIHs. > Then i was trying different and settings but > now I'm stuck. > I can't figure out what is the best setting for bulk indexing. > What I see is that the indexing is "falling asleep" after some time of > indexing. > It is only producing del-files, like _11_1.del, _w_2.

problems with bulk indexing with concurrent DIH

2016-07-27 Thread Bernd Fehling
DIHs. 16 and 12 were too much for 16 CPUs, so my test continued with 8 concurrent DIHs. Then I tried different and settings, but now I'm stuck. I can't figure out what is the best setting for bulk indexing. What I see is that the indexing is "falling asleep" after

Re: bulk indexing with optimistick lock

2015-02-13 Thread Scott Stults
This isn't a Solr-specific answer, but the easiest approach might be to just collect the document IDs you're about to add, query for them, and then filter out the ones Solr already has (this'll give you a nice list for later reporting). You'll need to keep your batch sizes below maxBooleanClauses
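The approach described above (collect the IDs, look them up in batches that stay below maxBooleanClauses, then drop the documents Solr already has) could be sketched roughly as follows. This is only an illustration, not code from the thread: the field name `id`, the helper names, and the limit of 1024 (Solr's default maxBooleanClauses) are assumptions, and the actual lookup request against Solr is deliberately left out.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Assumption: 1024 is the default maxBooleanClauses; check your own solrconfig.xml.
MAX_BOOLEAN_CLAUSES = 1024


def chunked(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` IDs."""
    batch: List[str] = []
    for doc_id in ids:
        batch.append(doc_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_id_query(ids: List[str], field: str = "id") -> str:
    """Build a query such as id:("a" OR "b"), one boolean clause per ID."""
    quoted = ['"' + i.replace("\\", "\\\\").replace('"', '\\"') + '"' for i in ids]
    return field + ":(" + " OR ".join(quoted) + ")"


def split_new_and_existing(
    docs: List[Dict[str, str]], existing_ids: Set[str]
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Return (docs still to be added, IDs skipped because Solr already has them)."""
    new_docs = [d for d in docs if d["id"] not in existing_ids]
    skipped = [d["id"] for d in docs if d["id"] in existing_ids]
    return new_docs, skipped
```

Each batch from `chunked(all_ids, MAX_BOOLEAN_CLAUSES - 1)` would be turned into one lookup query; the IDs that come back form `existing_ids`, and the `skipped` list is the "nice list for later reporting".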

bulk indexing with optimistick lock

2015-02-11 Thread Sankalp Gupta
Hi All, On the server side we are trying to add multiple documents to a list, ask Solr to add them (using the SolrJ client), and then call commit once it has finished. Now we also want to control concurrency, and for that we wanted to use Solr's optimistic locking/versioning feature. That

Re: bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-18 Thread adfel70
what that is. I don't recall seeing a setting like that for Solr itself. It sounds like a setting for some other piece of software, perhaps a client, load balancer, or servlet container. Thanks, Shawn -- View this message in context: http://lucene.472066.n3.nabble.com/bulk-indexing

Re: bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-17 Thread adfel70
total of 64gb memory. My current collection (7 shards, 3 replicas) has around 500 million docs. I'm performing bulk indexing into the collection. I set softCommit to 10 minutes and hardCommit openSearcher=false to 15 minutes. How much index data does each server have on it? This would

Re: bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-17 Thread Shawn Heisey
On 3/17/2014 7:07 AM, adfel70 wrote: we currently have around 200 GB on a server. I'm aware of the RAM issue, but it somehow doesn't seem related. I would expect search latency problems, not strange EofExceptions. Regarding the http.timeout - I didn't change anything concerning this. Do I

bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-16 Thread adfel70
Hi, I have a 12-node Solr 4.6.1 cluster. Each node has 2 Solr processes, running on 8 GB heap JVMs. Each node has a total of 64 GB memory. My current collection (7 shards, 3 replicas) has around 500 million docs. I'm performing bulk indexing into the collection. I set softCommit to 10 minutes

Re: bulk indexing - EofExceptions and big latencies after soft-commit

2014-03-16 Thread Shawn Heisey
On 3/16/2014 10:34 AM, adfel70 wrote: I have a 12-node Solr 4.6.1 cluster. Each node has 2 Solr processes, running on 8 GB heap JVMs. Each node has a total of 64 GB memory. My current collection (7 shards, 3 replicas) has around 500 million docs. I'm performing bulk indexing into the collection

Re: Solr Cloud Bulk Indexing Questions

2014-01-23 Thread Software Dev
flow of updates coming in and we would like to see them in ASAP. However we occasionally need to do some bulk indexing (once a week or less) and the need to see those updates right away isn't as critical. I would say 95% of the time we are in Index-Light Query-Light/Heavy mode and the other 5

Re: Solr Cloud Bulk Indexing Questions

2014-01-23 Thread Software Dev
be less frequent that hard commits. Is there any way to configure autoCommit, softCommit values on a per request basis? The majority of the time we have small flow of updates coming in and we would like to see them in ASAP. However we occasionally need to do some bulk indexing (once a week or less

Re: Solr Cloud Bulk Indexing Questions

2014-01-23 Thread Shawn Heisey
On 1/23/2014 11:01 AM, Software Dev wrote: Is there any way to configure autoCommit, softCommit values on a per request basis? The majority of the time we have small flow of updates coming in and we would like to see them in ASAP. However we occasionally need to do some bulk indexing (once
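The reply above is cut off in this listing, so the following is not necessarily what was suggested there. One per-request knob that Solr's update handler does accept is the `commitWithin` parameter (in milliseconds), which also appears elsewhere in this listing as the `commitWithinMs` argument of SolrJ's `add`. A minimal sketch of choosing a different value per request, where the base URL, the collection name, and the two timing values are purely hypothetical:

```python
from typing import Optional
from urllib.parse import urlencode


def update_url(base_url: str, collection: str, commit_within_ms: Optional[int] = None) -> str:
    """Build an /update URL, optionally asking Solr to commit within N milliseconds."""
    url = base_url.rstrip("/") + "/" + collection + "/update"
    if commit_within_ms is None:
        return url
    return url + "?" + urlencode({"commitWithin": str(commit_within_ms)})


# Hypothetical values: quick visibility for the normal trickle of updates,
# a much longer window for the occasional bulk load.
realtime_url = update_url("http://localhost:8983/solr", "products", 1000)
bulk_url = update_url("http://localhost:8983/solr", "products", 600000)
```

The autoCommit settings in solrconfig.xml stay as they are; the request parameter only changes how soon documents from that particular request must be committed.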

Re: Solr Cloud Bulk Indexing Questions

2014-01-23 Thread Otis Gospodnetic
: We are testing our shiny new Solr Cloud architecture but we are experiencing some issues when doing bulk indexing. We have 5 solr cloud machines running and 3 indexing machines (separate from the cloud servers). The indexing machines pull off ids from a queue then they index and ship over

Re: Solr Cloud Bulk Indexing Questions

2014-01-23 Thread Software Dev
are experiencing some issues when doing bulk indexing. We have 5 solr cloud machines running and 3 indexing machines (separate from the cloud servers). The indexing machines pull off ids from a queue then they index and ship over a document via a CloudSolrServer. It appears that the indexers are too

Re: Solr Cloud Bulk Indexing Questions

2014-01-22 Thread Andre Bois-Crettez
are experiencing some issues when doing bulk indexing. We have 5 solr cloud machines running and 3 indexing machines (separate from the cloud servers). The indexing machines pull off ids from a queue then they index and ship over a document via a CloudSolrServer. It appears that the indexers are too

Re: Solr Cloud Bulk Indexing Questions

2014-01-22 Thread Software Dev
shiny new Solr Cloud architecture but we are experiencing some issues when doing bulk indexing. We have 5 solr cloud machines running and 3 indexing machines (separate from the cloud servers). The indexing machines pull off ids from a queue then they index and ship over a document via

Re: Solr Cloud Bulk Indexing Questions

2014-01-22 Thread Erick Erickson
something not optimal about your setup that's the culprit. Best, Erick On Mon, Jan 20, 2014 at 4:00 PM, Software Dev static.void@gmail.com wrote: We are testing our shiny new Solr Cloud architecture but we are experiencing some issues when doing bulk indexing. We have 5 solr

Re: Solr Cloud Bulk Indexing Questions

2014-01-21 Thread Software Dev
@gmail.com wrote: We are testing our shiny new Solr Cloud architecture but we are experiencing some issues when doing bulk indexing. We have 5 solr cloud machines running and 3 indexing machines (separate from the cloud servers). The indexing machines pull off ids from a queue

Solr Cloud Bulk Indexing Questions

2014-01-20 Thread Software Dev
We are testing our shiny new Solr Cloud architecture but we are experiencing some issues when doing bulk indexing. We have 5 solr cloud machines running and 3 indexing machines (separate from the cloud servers). The indexing machines pull off ids from a queue then they index and ship over

Re: Solr Cloud Bulk Indexing Questions

2014-01-20 Thread Erick Erickson
about your setup that's the culprit. Best, Erick On Mon, Jan 20, 2014 at 4:00 PM, Software Dev static.void@gmail.com wrote: We are testing our shiny new Solr Cloud architecture but we are experiencing some issues when doing bulk indexing. We have 5 solr cloud machines running and 3 indexing

Re: Solr Cloud Bulk Indexing Questions

2014-01-20 Thread Software Dev
, Jan 20, 2014 at 4:00 PM, Software Dev static.void@gmail.com wrote: We are testing our shiny new Solr Cloud architecture but we are experiencing some issues when doing bulk indexing. We have 5 solr cloud machines running and 3 indexing machines (separate from the cloud servers

Re: Solr Cloud Bulk Indexing Questions

2014-01-20 Thread Software Dev
shiny new Solr Cloud architecture but we are experiencing some issues when doing bulk indexing. We have 5 solr cloud machines running and 3 indexing machines (separate from the cloud servers). The indexing machines pull off ids from a queue then they index and ship over a document via

Re: Solr Cloud Bulk Indexing Questions

2014-01-20 Thread Mark Miller
the culprit. Best, Erick On Mon, Jan 20, 2014 at 4:00 PM, Software Dev static.void@gmail.com wrote: We are testing our shiny new Solr Cloud architecture but we are experiencing some issues when doing bulk indexing. We have 5 solr cloud machines running and 3 indexing machines

Re: Solr Cloud Bulk Indexing Questions

2014-01-20 Thread Software Dev
some issues when doing bulk indexing. We have 5 solr cloud machines running and 3 indexing machines (separate from the cloud servers). The indexing machines pull off ids from a queue then they index and ship over a document via a CloudSolrServer. It appears that the indexers are too

Solr Stalls on Bulk indexing, no logs or errors

2013-12-04 Thread steven crichton
.n3.nabble.com/Solr-Stalls-on-Bulk-indexing-no-logs-or-errors-tp4104981.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Stalls on Bulk indexing, no logs or errors

2013-12-04 Thread Erick Erickson
(hard and soft), your solrconfig settings, particularly around autowarming, how you're bulk indexing, SolrJ? DIH? a huge CSV file? Best, Erick On Wed, Dec 4, 2013 at 2:30 PM, steven crichton stevencrich...@mac.com wrote: I am finding with a bulk index using SOLR 4.3 on Tomcat, that when I reach

Re: Solr Stalls on Bulk indexing, no logs or errors

2013-12-04 Thread steven crichton
. But you need to tell us more about your setup. In particular your commit settings (hard and soft), your solrconfig settings, particularly around autowarming, how you're bulk indexing, SolrJ? DIH? a huge CSV file? Best, Erick On Wed, Dec 4, 2013 at 2:30 PM, steven crichton [hidden email

Re: Solr Stalls on Bulk indexing, no logs or errors

2013-12-04 Thread Erick Erickson
. In particular your commit settings (hard and soft), your solrconfig settings, particularly around autowarming, how you're bulk indexing, SolrJ? DIH? a huge CSV file? Best, Erick On Wed, Dec 4, 2013 at 2:30 PM, steven crichton [hidden email] wrote: I am finding with a bulk index

Bulk Indexing Question

2012-11-27 Thread Joseph C. Trubisz
Greetings… I’m new to Solr, so this might be a real amateur question. When I curl a file to be indexed (in this case, as CSV), how do I know which index it’s going to, if I have multiple indexes currently being managed by Solr? For example, I have indexes for drug, company, author, abstract

Re: Bulk Indexing Question

2012-11-27 Thread Shawn Heisey
On 11/27/2012 1:07 PM, Joseph C. Trubisz wrote: When I curl a file to be indexed (in this case, as CSV), how do I know which index it’s going to, if I have multiple indexes currently being managed by Solr? For example, I have indexes for drug, company, author, abstract and I want to CSV load

Re: Bulk Indexing

2012-07-31 Thread Mikhail Khludnev
Usually, collecting the whole array hurts the client's JVM, while sending doc-by-doc bloats the server with a huge number of small requests. You just need to rewrite your code from the eager loop to a pulling iterator, so that you can submit all docs via a single HTTP request
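To illustrate the difference between an eager loop and a pulling iterator, here is a small language-neutral sketch in Python rather than SolrJ. It is an assumption-laden illustration, not the code from the thread: the validation rule and function names are made up, and the generator simply produces the pieces of one JSON array body, which could then be handed to any HTTP client that accepts an iterable request body.

```python
import json
from typing import Dict, Iterable, Iterator


def valid_docs(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Validate records lazily, one at a time, instead of building a full list first."""
    for record in records:
        if record.get("id"):  # hypothetical validation rule: an id is required
            yield record


def json_array_stream(docs: Iterable[Dict[str, str]]) -> Iterator[bytes]:
    """Yield the pieces of a single JSON array body without materializing it."""
    yield b"["
    first = True
    for doc in docs:
        if not first:
            yield b","
        yield json.dumps(doc).encode("utf-8")
        first = False
    yield b"]"
```

Nothing is read from `records` until the consumer starts pulling from `json_array_stream(valid_docs(records))`, so the client holds roughly one document in memory at a time while the server still sees a single request.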

Re: Bulk Indexing

2012-07-28 Thread Mikhail Khludnev
Lan, I assume that some particular server can freeze on such a bulk. But the overall message does not seem entirely correct to me. Solr has a lot of mechanisms to survive in such cases. Bulk indexing is absolutely right (if you submit a single request with a long iterator of SolrInputDocs). This indexing

Re: Bulk Indexing

2012-07-28 Thread Sohail Aboobaker
We have auto commit on and will basically send it in a loop after validating each record, we send it to search service. And keep doing it in a loop. Mikhail / Lan, are you suggesting that instead of sending it in a loop, we should collect them in an array and do a commit at the end? Is this better

RE: Bulk Indexing

2012-07-27 Thread Zhang, Lisheng
is not large, per folder). This is just my plan (not fully implemented yet). Best regards, Lisheng -Original Message- From: Sohail Aboobaker [mailto:sabooba...@gmail.com] Sent: Friday, July 27, 2012 6:56 AM To: solr-user@lucene.apache.org Subject: Bulk Indexing Hi, We have created a search

Re: Bulk Indexing

2012-07-27 Thread Alexandre Rafalovitch
Haven't tried this but: 1) I think SOLR 4 supports on-the-fly core attach/detach/select. Can somebody confirm this? 2) If 1) is true, run everything as two cores. 3) One core is live in production 4) Second core is detached from SOLR and attached to something like SolrJ, which I believe can index

Re: Bulk Indexing

2012-07-27 Thread Sohail Aboobaker
We will be using Solr 3.x version. I was wondering if we do need to worry about this as we have only 10k index entries at a time. It sounds like a very low number and we have only document type at this point. Should we worry about directly using SolrJ for indexing and searching for this low

RE: Bulk Indexing

2012-07-27 Thread Lan
' events during indexing. - Update in batches with a commit at the end of the batch. -- View this message in context: http://lucene.472066.n3.nabble.com/Bulk-Indexing-tp3997745p3997815.html Sent from the Solr - User mailing list archive at Nabble.com.

Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng
Hi, I am starting to use Solr. I need to index a rather large amount of data, and it seems that passing data to Solr through HTTP is rather inefficient. I am thinking of still calling the Lucene API directly for bulk indexing but using Solr for search. Is this design OK? Thanks very much for your help,

Re: Bulk indexing data into solr

2012-07-26 Thread Rafał Kuć
Hello! If you use Java (and I think you do, because you mention Lucene) you should take a look at StreamingUpdateSolrServer. It not only allows you to send data in batches, but also index using multiple threads. -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -

Re: Bulk indexing data into solr

2012-07-26 Thread Shawn Heisey
On 7/26/2012 7:34 AM, Rafał Kuć wrote: If you use Java (and I think you do, because you mention Lucene) you should take a look at StreamingUpdateSolrServer. It not only allows you to send data in batches, but also index using multiple threads. A caveat to what Rafał said: The streaming object

RE: Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng
Thanks very much, both your and Rafal's advice are very helpful! -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Thursday, July 26, 2012 8:47 AM To: solr-user@lucene.apache.org Subject: Re: Bulk indexing data into solr On 7/26/2012 7:34 AM, Rafał Kuć wrote

Re: Bulk indexing data into solr

2012-07-26 Thread Mikhail Khludnev
Right in time, guys. https://issues.apache.org/jira/browse/SOLR-3585 Here is a server-side update processing fork. It does its best to halt processing when an exception occurs. Plug in this UpdateProcessor and specify the number of threads. Then submit a lazy iterator to StreamingUpdateSolrServer on the client side.

Re: Bulk indexing data into solr

2012-07-26 Thread Mikhail Khludnev
Coming back to your original question: I'm a little puzzled. It's not clear where you want to call the Lucene API directly from. If you mean that you have a standalone indexer which writes index files, then stops, and these files then become available to the Solr process, it will work. Sharing index between

RE: Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng
@lucene.apache.org Subject: Re: Bulk indexing data into solr Coming back to your original question. I'm puzzled a little. It's not clear where you wanna call Lucene API directly from. if you mean that you has standalone indexer, which write index files. Then it stops and these files become available

Re: Bulk indexing data into solr

2012-07-26 Thread Mikhail Khludnev
AM To: solr-user@lucene.apache.org Subject: Re: Bulk indexing data into solr Coming back to your original question. I'm puzzled a little. It's not clear where you wanna call Lucene API directly from. if you mean that you has standalone indexer, which write index files. Then it stops

RE: Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng
-Original Message- From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] Sent: Thursday, July 26, 2012 12:46 PM To: solr-user@lucene.apache.org Subject: Re: Bulk indexing data into solr IIRC about a two month ago problem with such scheme discussed here, but I can remember exact

Re: Bulk indexing, UpdateProcessor overwriteDupes and poor IO performances

2011-06-01 Thread Tanguy Moal
Lee, Thank you very much for your answer. Using the signature field as the uniqueKey is effectively what I was doing, so the overwriteDupes=true parameter in my solrconfig was somehow redundant, although I wasn't aware of it! =D In practice it works perfectly and that's the nice part. By

Re: Bulk indexing, UpdateProcessor overwriteDupes and poor IO performances

2011-05-31 Thread lee carroll
Tanguy, you might have tried this already, but can you set overwriteDupes to false and set the signature key to be the id? That way Solr will manage updates. From the wiki http://wiki.apache.org/solr/Deduplication !-- An example dedup update processor that creates the id field on the fly

Re: Bulk indexing, UpdateProcessor overwriteDupes and poor IO performances

2011-05-30 Thread Tanguy Moal
Hello, Sorry for re-posting this, but it seems my message got lost in the mailing list's message stream without catching anyone's attention... =D In short, has anyone already experienced dramatic indexing slowdowns during large bulk imports with overwriteDupes turned on and a fairly high

Bulk indexing, UpdateProcessor overwriteDupes and poor IO performances

2011-05-25 Thread Tanguy Moal
Dear list, I'm posting here after some unsuccessful investigations. In my setup I push documents to Solr using the StreamingUpdateSolrServer. I'm sending a comfortable initial amount of documents (~250M) and wished to perform overwriting of duplicated documents at index time, during the