Since your data is sequential, it seems to me that the search for nodes with
the same visit-id or visitor-id can be limited to recent entries from the
input log. For example, you say that the visit-id changes after 20 minutes
without a page view, so you only need to search back over the last 20 minutes.

For extremely low memory usage, keep the visit nodes in a chain of NEXT
relationships and search backwards along it when needed. That keeps memory
use minimal, but every lookup has to walk the chain, so it is not the best
solution. I think for your case you should just keep an in-memory cache of
recent visit-ids. If you only need to store a 20-minute window, and only the
visit and visitor ids, that is a small cache compared to the total 100k page
views you are loading.
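
To make the cache idea concrete, here is a minimal sketch in plain Java. It is
only an illustration, not Neo4j API: the class name RecentIdCache, its methods
and the String-id-to-long-node-id mapping are all hypothetical. It also
assumes the log is read in timestamp order, so the oldest entries always sit
at the front of the map.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps a visit or visitor id to the id of its node,
 *  forgetting ids that have been idle for longer than the time window. */
class RecentIdCache {
    /** Node id plus the timestamp of the last page view seen for it. */
    private static final class Seen {
        final long nodeId;
        final long lastSeenMillis;

        Seen(long nodeId, long lastSeenMillis) {
            this.nodeId = nodeId;
            this.lastSeenMillis = lastSeenMillis;
        }
    }

    private final long windowMillis;
    // Insertion-ordered; entries are re-inserted on every hit, so the map
    // stays sorted by last-seen time as long as timestamps never decrease.
    private final LinkedHashMap<String, Seen> entries =
            new LinkedHashMap<String, Seen>();

    RecentIdCache(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Returns the cached node id, or null if the id was not seen recently. */
    Long lookup(String id, long nowMillis) {
        evictOlderThan(nowMillis - windowMillis);
        Seen seen = entries.remove(id);
        if (seen == null) {
            return null;
        }
        // Re-insert at the tail with the refreshed timestamp.
        entries.put(id, new Seen(seen.nodeId, nowMillis));
        return seen.nodeId;
    }

    /** Records a newly created node under its id. */
    void remember(String id, long nodeId, long nowMillis) {
        entries.remove(id);
        entries.put(id, new Seen(nodeId, nowMillis));
    }

    int size() {
        return entries.size();
    }

    /** Drops entries from the front until one is inside the window. */
    private void evictOlderThan(long cutoffMillis) {
        Iterator<Map.Entry<String, Seen>> it = entries.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().lastSeenMillis >= cutoffMillis) {
                break;
            }
            it.remove();
        }
    }
}
```

For each log line you would call lookup() with the visit-id and the line's
timestamp. On a miss, create the node with whatever inserter you use and pass
its id to remember(). With a 20-minute window the map only ever holds the
visits that are still active.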

The final option is to use a Lucene index on the visit and visitor nodes to
find them when you need to. This is best if you cannot rely on the
time-window in-memory cache idea.

On Mon, Jul 5, 2010 at 4:57 PM, Logo Bogo <[email protected]> wrote:

> Hi,
>
> I want to use neo4j to analyse apache logs. Each visitor is identified by a
> session ID, and each visit has an ID too - a new ID is assigned after 20
> minutes between page views. The graph consists of Visit, Visitor and Page
> nodes where a Visitor -> multiple Visit nodes, and a Visit -> multiple Page
> nodes. The Visit -> Page relationship has a property to indicate when in the
> visit the page was visited (i.e. 1 = first page visited, 2 = second page in
> the visit, etc).
>
> How would I best go about importing data into this graph? I'd use the batch
> inserter, but before I create a new Visitor or Visit I need to check whether
> a node exists already with the same ID. I've read that it's better to use
> the EmbeddedGraphDatabase, but I'm going to be inserting ~ 100K nodes 3
> times per day. In the past using MySQL in a similar way performance was
> abysmal, so I couldn't take this approach.
>
> Will I be able to just use an EmbeddedGraphDatabase, or should I have a
> rethink? How is performance likely to be for these inserts?
>
> Thanks
> Tim
>
> _______________________________________________
> Neo4j mailing list
> [email protected]
> https://lists.neo4j.org/mailman/listinfo/user
>