Since your data is sequential, it seems to me that the search for nodes with the same visit-id or visitor-id can be limited to recent entries from the input log. For example, you say that the visit-id changes after 20 minutes, so you only need to search back over the last 20 minutes.
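A minimal sketch of that bounded lookup, in Python for brevity rather than the Java you would use against Neo4j. It assumes the log is read in timestamp order (seconds) and that each entry carries its visit-id. The names RecentIdCache, get_or_create and create_node are made up for illustration and are not part of any Neo4j API; create_node stands in for whatever call actually creates the node. Whether the same 20-minute window is safe for visitor-ids depends on how long a session ID lives, which I don't know from your description.

```python
from collections import OrderedDict

WINDOW_SECONDS = 20 * 60  # a visit-id is replaced after 20 minutes without a page view


class RecentIdCache:
    """Maps an ID to a node reference and forgets IDs idle longer than the window.

    Assumes timestamps passed to get_or_create never decrease, which holds
    when the log is processed sequentially.
    """

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        # id -> (node_ref, last_seen); kept ordered from oldest to newest
        self._entries = OrderedDict()

    def get_or_create(self, key, timestamp, create_node):
        """Return the cached node for key, or call create_node(key) and cache it."""
        self._evict_older_than(timestamp - self.window_seconds)
        if key in self._entries:
            node_ref, _ = self._entries.pop(key)
        else:
            node_ref = create_node(key)
        # (Re)insert at the end so the entry counts as the most recently seen.
        self._entries[key] = (node_ref, timestamp)
        return node_ref

    def _evict_older_than(self, cutoff):
        """Drop entries whose last page view is strictly before cutoff."""
        while self._entries:
            oldest_key = next(iter(self._entries))
            if self._entries[oldest_key][1] >= cutoff:
                break
            del self._entries[oldest_key]

    def __len__(self):
        return len(self._entries)
```

For each log line you would call something like cache.get_or_create(visit_id, timestamp, create_visit_node) and only hit the database on a miss. The cache never holds more than the IDs seen in the last 20 minutes of log time.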
For extremely low memory usage, you could keep the visit nodes in a chain of NEXT relationships and search backwards along it when needed. Of course, that is not the best solution.

I think for your case you should just keep an in-memory cache of recent visit-ids. If you only need to store a 20-minute window, and only the visit and visitor ids, that is a small cache compared to the total 100k page views you are loading.

The final option is to use a Lucene index on the visit and visitor nodes to find them when you need to. This is best if you cannot rely on the time-window in-memory cache idea.

On Mon, Jul 5, 2010 at 4:57 PM, Logo Bogo <[email protected]> wrote:
> Hi,
>
> I want to use neo4j to analyse apache logs. Each visitor is identified by a
> session ID, and each visit has an ID too - a new ID is assigned after 20
> minutes between page views. The graph consists of Visit, Visitor and Page
> nodes where a Visitor -> multiple Visit nodes, and a Visit -> multiple Page
> nodes. The Visit -> Page relationship has a property to indicate when in the
> visit the page was visited (i.e. 1 = first page visited, 2 = second page in
> the visit, etc).
>
> How would I best go about importing data into this graph? I'd use the batch
> inserter, but before I create a new Visitor or Visit I need to check whether
> a node exists already with the same ID. I've read that it's better to use
> the EmbeddedGraphDatabase, but I'm going to be inserting ~ 100K nodes 3
> times per day. In the past using MySQL in a similar way performance was
> abysmal, so I couldn't take this approach.
>
> Will I be able to just use an EmbeddedGraphDatabase, or should I have a
> rethink? How is performance likely to be for these inserts?
>
> Thanks
> Tim

_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user

