Just try it!

Maybe you're lucky, and it works with 80M docs. If each document takes 100 bytes, then it only needs about 8 GB of memory for indexing.

However, I doubt it. I haven't looked too deeply into the UpdateHandler yet, but I think it first needs to parse the complete XML file before it starts to index.

But the worst thing that can happen is an OOM exception. And if you then need to split the XML files, you can split them into smaller chunks as well.
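
If splitting turns out to be necessary, it can be done in a streaming way, so the splitter itself never holds the whole file in memory. A minimal Python sketch, assuming the input is a single well-formed <add> file with flat <doc> children; the function name iter_add_chunks and the default chunk_size of 1000 are made up for illustration:

```python
import xml.etree.ElementTree as ET


def iter_add_chunks(source, chunk_size=1000):
    """Yield <add> messages (as str) with at most chunk_size <doc> elements.

    source is a filename or a binary file object holding one big <add> file.
    """
    batch = []
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the outer <add>
    for event, elem in context:
        if event == "end" and elem.tag == "doc":
            batch.append(ET.tostring(elem, encoding="unicode"))
            root.clear()  # drop already-parsed docs so memory stays flat
            if len(batch) == chunk_size:
                yield "<add>" + "".join(batch) + "</add>"
                batch = []
    if batch:  # last, possibly smaller, chunk
        yield "<add>" + "".join(batch) + "</add>"
```

Each yielded string can then be written to its own file or posted to the update handler as one request.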

Just a note: In Solr, you're always updating, even during the first indexing run. There's no difference between updates and inserts.

Greetings,
Michael

On 24.05.2012 12:37, Bruno Mannina wrote:
In fact, it's not for an update but only for the initial indexing.

I mean, I will receive the full database, with around 80M docs, in
several XML files (one per country in the world).
From these 80M docs I will generate the right XML format for each doc.
(I don't need all the fields from the source.)

Currently, for my test (12,000 docs), I generate one file per doc, and
there is no problem.
But with 80M docs I can't generate one file per doc.

That is why I asked about the maximum number of <doc> elements in an
<add> file.

For the first run, if a country file fails, no problem: I will check it
and regenerate it.

Is it bad to create a file with 5M <doc> elements?


On 24/05/2012 11:46, Michael Kuhlmann wrote:
There is no hard limit for the maximum number of documents per update.

It only depends on memory. The smaller each document, and the more
memory Solr can acquire, the more documents you can send in one update.

However, I wouldn't push it too hard anyway. If you can send, say, 100
documents per update, then you won't gain much if you send 200
documents instead, or even 1000. The number of requests doesn't count
that much.

And, if the update fails for some reason, then the whole request will
be ignored. If you had sent 1000 documents in an update, and one of
them had a field missing, for example, then it's hard to find out
which one.
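
One way to narrow it down afterwards is bisection: resend the failed batch in halves until the rejected documents are isolated. A rough Python sketch; find_bad_docs and the send callable are hypothetical placeholders, not part of Solr or any client library, and send is assumed to raise an exception when an update request fails:

```python
def find_bad_docs(docs, send):
    """Return the documents that `send` rejects, located by bisection.

    `send` is a hypothetical callable: it takes a list of documents and
    raises an exception when the update request fails.
    """
    if not docs:
        return []
    try:
        send(docs)
        return []  # the whole batch went through
    except Exception:
        if len(docs) == 1:
            return list(docs)  # isolated a single rejected document
        mid = len(docs) // 2
        # Retry each half on its own and collect whatever still fails.
        return find_bad_docs(docs[:mid], send) + find_bad_docs(docs[mid:], send)
```

This costs extra requests only when a batch fails, and halves that succeed on retry are indexed as a side effect.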

Greetings,
Michael

On 24.05.2012 10:58, Bruno Mannina wrote:
I can't find an answer concerning the max number of <doc></doc>.

Can someone tell me if there is no limit?

On 24/05/2012 09:55, Bruno Mannina wrote:
Sorry, I just found: http://wiki.apache.org/solr/UpdateXmlMessages

I will also take a look there to find the max number of <doc></doc>.

On 24/05/2012 09:51, Paul Libbrecht wrote:
Bruno,
see solrconfig.xml; it has all sorts of tweaks for this kind of
thing.

paul


On 24 May 2012 at 09:49, Bruno Mannina wrote:

Hi All,

Just a little question concerning the max number of

<add>
<doc></doc>
</add>

that I can write in the XML source file before indexing? Only one,
10, 100, 1000, unlimited...?

I must index 80M docs, so I can't create one XML file per doc.

thanks,
Bruno