I've been too busy with work lately, but I'd like to continue this thread.

I have 10M+ Traditional Chinese news articles. Since I don't have time to write
my own Traditional Chinese tokenizer, I'm using the CJK tokenizer. However, CJK
uses bigrams, which will create a very large index in my case. To build a Solr
index of 10M+ Traditional Chinese documents, should I jump directly to
SolrCloud, or, as Erick suggested, just set up a master-slave architecture with
better performance tuning?

Also, can anyone recommend a good Traditional Chinese tokenizer?
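
To make the index-size concern concrete, here is a rough Python sketch of what
bigram tokenization does to a run of Chinese characters. It is a simplification
and not Solr's actual CJK analysis code; the function name cjk_bigrams and the
sample string are only for illustration. A run of n characters yields n-1
overlapping tokens, so almost every adjacent character pair in the corpus
becomes an indexed term.

```python
def cjk_bigrams(text):
    """Return overlapping two-character tokens for a run of CJK characters.

    Simplified illustration only: a real analyzer also handles
    punctuation, Latin text, and script boundaries.
    """
    if len(text) < 2:
        # A single character (or empty input) cannot form a bigram.
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


sample = "台北市政府"  # 5 characters, hypothetical sample text
print(cjk_bigrams(sample))  # ['台北', '北市', '市政', '政府']
```

A word-based tokenizer would be expected to emit fewer tokens for the same
text, which is why I am asking about alternatives.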

----- Original Message ----- 
From: Erick Erickson 
To: solr-user 
Date: 2015-09-04, 01:47:23
Subject: Re: Re: Re: Re: Re: concept and choice: custom sharding or auto 
sharding?


Ah, that may make my suggestions unworkable re: just reindexing.

Still, how much time are we talking about here? I've very often found
that indexing performance isn't gated by the Solr processing, but by
whatever is feeding Solr. A quick test is to fire up your indexing
and see if the CPU utilization by Solr is very high. As Toke said,
though, if you're using DIH you're out of luck.

Here's an article to get you started with SolrJ:
http://lucidworks.com/blog/indexing-with-solrj/

Best,
Erick

On Thu, Sep 3, 2015 at 10:26 AM, Toke Eskildsen <t...@statsbiblioteket.dk> 
wrote:
> scott chu <scott....@udngroup.com> wrote:
>> No, both. But first I have to face the indexing performance problem.
>> Where can I see information about concurrent/parallel indexing on Solr?

>
> Depends on how you index. If you use a Java program,
> http://lucene.apache.org/solr/5_2_0/solr-solrj/index.html?org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.html
> seems to do the trick (I haven't tried that one myself).
>
> If you are sending updates using curl or similar, you just need to start more 
> processes doing that.
>
> If you are using DataImportHandler, I think you are out of luck. As far as I 
> know, it does not support multiple index threads.
>
> - Toke Eskildsen
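
As a rough illustration of the "start more processes" idea above, the sketch
below splits documents into batches and posts them to Solr's JSON update
endpoint from several threads. It is only a sketch under assumptions: the URL,
the core name "news", the batch size, and the worker count are all made up, and
error handling and commits are left out.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed URL and core name; adjust to your own installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/news/update"


def chunks(docs, size):
    """Yield consecutive slices of docs, each with at most `size` items."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]


def post_batch(batch, url=SOLR_UPDATE_URL):
    """POST one batch of documents as a JSON array; return the HTTP status."""
    body = json.dumps(batch).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_parallel(docs, batch_size=1000, workers=4, send=post_batch):
    """Send batches concurrently; `send` can be swapped out for testing."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, chunks(docs, batch_size)))
```

Calling index_parallel(docs) returns one result per batch, in batch order. If
Solr's CPU stays low while this runs, the bottleneck is probably on the feeding
side rather than in Solr.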

