Massimo,

Could you please look into the Lucene Document instance that you add all the 
fields to?

Does it also contain this ultra-large ArrayList with all the Fields?

And which version of Lucene did you use for your standalone testing?

Cheers

Michael

On 27.06.2011 at 23:04, Massimo Lusetti wrote:

> On Sat, Jun 25, 2011 at 2:59 AM, Michael Hunger
> <michael.hun...@neotechnology.com> wrote:
> 
>> Massimo,
>> 
>> When profiling this, it quickly becomes apparent that the issue is within the 
>> Lucene document
>> (org.apache.lucene.document.Document).
>> 
>> It holds an ArrayList of all its fields, which accounts for all the memory.
>> 
>> It also contains several methods that walk over that list (filtering it) 
>> and/or return copies of it.
>> 
>> Another issue that came up: the addition takes longer and longer (because 
>> Lucene does a quick-sort on the fields at each flush()).
>> 
>> So my suggestion would be to shard the indexing over several documents and 
>> hide that behind a domain-level API; each document should have around 50k 
>> entries to allow Lucene to handle it gracefully. After you have introduced 
>> this API, you should perhaps consider replacing this large index with a more 
>> appropriate key-value store (like redis, jdbm, or a custom implementation, 
>> depending on your real use case, which you haven't revealed :) ).
>> 
>> Cheers
> 
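
To make the sharding suggestion above a bit more concrete, here is a minimal 
sketch of such a domain-level API. Everything in it is an assumption for 
illustration: the class name ShardedKeyIndex is hypothetical, each shard is a 
plain in-memory HashSet standing in for one Lucene document or index, and keys 
are routed by their hash code to a fixed number of shards. Choosing the shard 
count as roughly (expected keys / 50k) would keep each shard near the size 
mentioned above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical domain-level API; callers never see how keys are sharded. */
public class ShardedKeyIndex {
    // Each set is an in-memory stand-in for one Lucene document or index.
    private final List<Set<String>> shards = new ArrayList<>();

    public ShardedKeyIndex(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashSet<>());
        }
    }

    /** Picks the shard for a key; floorMod keeps the index non-negative. */
    private Set<String> shardFor(String key) {
        return shards.get(Math.floorMod(key.hashCode(), shards.size()));
    }

    /** Returns true if the key was new, false if it was already present. */
    public boolean add(String key) {
        return shardFor(key).add(key);
    }

    /** Only the one shard that could hold the key is consulted. */
    public boolean contains(String key) {
        return shardFor(key).contains(key);
    }

    /** Size of the fullest shard, useful for checking the per-shard target. */
    public int largestShardSize() {
        int max = 0;
        for (Set<String> shard : shards) {
            max = Math.max(max, shard.size());
        }
        return max;
    }
}
```

Because the backing store is hidden behind add() and contains(), it could 
later be swapped for a key-value store without touching the calling code.
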
> My use case is this one: I have a big series of log rows which I have to
> read and understand, but I need to be sure to parse each log row once and
> only once, so I calculate a SHA1 hash of the log row and put it
> in the index; if that hash is already in the index, I skip the log
> row because it means it has already been processed.
> 
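
The check described above could be sketched roughly as follows. This is only 
an illustration under assumptions: the class name LogRowDeduplicator and the 
method markIfNew are hypothetical, the rows are assumed to be UTF-8 text, and 
an in-memory HashSet stands in for the real index.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: remembers the SHA-1 of every log row seen so far. */
public class LogRowDeduplicator {
    // In-memory stand-in for the index that stores the hashes.
    private final Set<String> seen = new HashSet<>();

    /** Returns the SHA-1 digest of the row as a lowercase hex string. */
    public static String sha1Hex(String row) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(row.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Returns true if the row is new and should be parsed, false to skip it. */
    public boolean markIfNew(String row) {
        return seen.add(sha1Hex(row));
    }
}
```

A caller would parse a row only when markIfNew(row) returns true.
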
> I've made a test with jdbm and it is by far a lot worse than plain
> Lucene. BTW, if I do the same test with a plain Lucene implementation, it
> works flawlessly without any pain, so I guess something is going weird in
> the way Lucene is being used by neo4j, but I'll try to follow your
> suggestion.
> BTW, I've also tested MongoDB, which is slower but seems more stable...
> but my test isn't finished yet...
> 
> Cheers
> -- 
> Massimo
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
