Hi Bryce,

There is a very small chance that a sessionID will repeat; let's assume that in practice it won't happen. Under that assumption there will always be a single writer per sessionID, and it will always be the same writer. You would lock, using the sessionID as the locking key, only to serialize two operations that might otherwise happen at the same time:

1. Add an entry to the sessionID, up to the point where it reaches
   threshold+1 entries.
2. Balance the key, meaning: take the 2000 older items, move them to
   the next sequence key, and keep only the last item at the current key.

If it weren't for balancing the keys, the lock would not be needed. You also don't want to balance while writing: you want to add entries ASAP and then queue a request to redistribute/balance that sessionID.
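
The append-then-queue idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory map stands in for the Riak bucket, the writer is a single JVM, and the names SessionBalancer, append, balance and balanceQueue are hypothetical, not part of any Riak client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: the `store` map is a stand-in for the Riak bucket.
final class SessionBalancer {
    private final int threshold;
    private final Map<String, List<String>> store = new ConcurrentHashMap<>();
    private final Map<String, Integer> lastSequence = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    // Session IDs waiting to be balanced; a background worker would drain this.
    final BlockingQueue<String> balanceQueue = new LinkedBlockingQueue<>();

    SessionBalancer(int threshold) {
        this.threshold = threshold;
    }

    // One lock per session ID, created on first use.
    private ReentrantLock lockFor(String sessionId) {
        return locks.computeIfAbsent(sessionId, id -> new ReentrantLock());
    }

    // Fast path: append the entry and, at threshold+1, only queue a balance request.
    void append(String sessionId, String entry) {
        ReentrantLock lock = lockFor(sessionId);
        lock.lock();
        try {
            List<String> entries = store.computeIfAbsent(sessionId, id -> new ArrayList<>());
            entries.add(entry);
            if (entries.size() == threshold + 1) {
                balanceQueue.add(sessionId);
            }
        } finally {
            lock.unlock();
        }
    }

    // Slow path: move the oldest `threshold` entries to the next sequence key.
    void balance(String sessionId) {
        ReentrantLock lock = lockFor(sessionId);
        lock.lock();
        try {
            List<String> entries = store.get(sessionId);
            while (entries != null && entries.size() > threshold) {
                int sequence = lastSequence.merge(sessionId, 1, Integer::sum);
                List<String> oldest = entries.subList(0, threshold);
                store.put(sessionId + "-" + sequence, new ArrayList<>(oldest));
                oldest.clear(); // clearing the view removes them from the current key
            }
        } finally {
            lock.unlock();
        }
    }

    // Unsynchronized snapshot of a key, meant for inspection only.
    List<String> read(String key) {
        List<String> entries = store.get(key);
        return entries == null ? new ArrayList<>() : new ArrayList<>(entries);
    }
}
```

With a threshold of 3, four appends leave one request in balanceQueue; after balance runs, the current key holds only the newest entry and the "-1" key holds the three older ones.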

If I were to give you a Java implementation of an entry, it would have a Joda DateTime and a String. It would use the DateTime for hashCode(), both fields for equals(), and the DateTime for sorting. Instead of a List, use a sorted set so that uniqueness and ordering are taken care of automatically.

The Session class then just needs equals() and hashCode() based on the sessionID, and holds a SortedSet<Entry>.
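
A minimal sketch of those two classes follows, with two deviations that are assumptions on my part: java.time.Instant is swapped in for Joda's DateTime so the example runs without extra dependencies, and compareTo falls back to the text when two timestamps are equal. Without that tie-breaker a TreeSet would silently drop two different lines that share a timestamp, because it decides uniqueness through compareTo rather than equals.

```java
import java.time.Instant;
import java.util.Objects;
import java.util.SortedSet;
import java.util.TreeSet;

// One log line: a timestamp plus the raw text.
final class Entry implements Comparable<Entry> {
    final Instant timestamp; // java.time.Instant used here in place of Joda DateTime
    final String line;

    Entry(Instant timestamp, String line) {
        this.timestamp = Objects.requireNonNull(timestamp);
        this.line = Objects.requireNonNull(line);
    }

    // Hash on the timestamp only, as described in the text.
    @Override
    public int hashCode() {
        return timestamp.hashCode();
    }

    // Equality uses both fields.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Entry)) return false;
        Entry other = (Entry) o;
        return timestamp.equals(other.timestamp) && line.equals(other.line);
    }

    // Sort by timestamp; the text is only a tie-breaker (an assumption, see above).
    @Override
    public int compareTo(Entry other) {
        int byTime = timestamp.compareTo(other.timestamp);
        return byTime != 0 ? byTime : line.compareTo(other.line);
    }
}

// A session is identified by its ID alone and keeps its entries sorted and unique.
final class Session {
    final String sessionId;
    final SortedSet<Entry> entries = new TreeSet<>();

    Session(String sessionId) {
        this.sessionId = Objects.requireNonNull(sessionId);
    }

    // Returns false when an equal entry was already present.
    boolean add(Entry entry) {
        return entries.add(entry);
    }

    @Override
    public int hashCode() {
        return sessionId.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Session && sessionId.equals(((Session) o).sessionId);
    }
}
```

Adding the same entry twice leaves the set unchanged, and iteration order is oldest first; pass a reversed comparator to the TreeSet if newest first is preferred.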

Hope that helps,

Guido.


On 25/04/14 00:02, Bryce Verdier wrote:
Thank you Jason and Guido for the quick responses and for helping me flesh out this idea.

Jason, thank you for pointing out how you juggle a similar task. I'll look into the Set CRDT and hope that it's able to get me somewhat close to what I was hoping for. One thing in your response that worries me, though, is this paragraph:

I like sorted JSON, but any data format that produces identical strings
would work.  If there is a chance of duplicate submissions into Riak,
you need to ensure the data format always produces identical output to
allow Riak to recognize and eliminate duplicates.

Would you mind elaborating on what you mean by that? In theory, this shouldn't be a problem as the sessionIDs are unique (read: a very low probability of having a duplicate sessionID within a somewhat large period of time). But I just want to make sure I understand what you're saying there.

Warm regards,
Bryce


On 04/24/2014 01:13 AM, Guido Medina wrote:
Hi Bryce,

If each session ID is unique, then even with multiple writers it is unlikely that two different writers will write to the same key at the same time. That being the case, you could store each event as a JSON object in a JSON array, and when the array reaches a threshold (say 2000, for the sake of example) you could move those entries to another key, so your JSON will look like:

session12345 {
  next : 10,
  entries : [
     ...,...,...
  ]
}

session12345-10 {
  next : 9,
  entries : [
     ...,...,...
  ]
}

...
...
...

session12345-1 {
  entries : [
     ...,...,...
  ]
}

The reason why next is decrementing is that entries are sorted descending by time (newest events first) and balanced keys are inserted in between. So initially you have session12345, and when you get to 2001 events you have session12345 with next pointing to 1 (or 0 if zero-based) and session12345-1; next is null if that session hasn't been balanced.

I'm using the hyphen as a convention; it can be any separator you want.
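
Reading a whole session back under this layout means following the next pointers from the base key until one is missing. A rough sketch, assuming a plain Map stands in for the Riak bucket and that Chunk, SessionReader, keyFor and readAll are hypothetical names chosen only for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// The value stored under one key: a pointer to the next key plus the events.
final class Chunk {
    final Integer next;         // sequence number of the next (older) key, or null
    final List<String> entries; // events in this key, newest first

    Chunk(Integer next, List<String> entries) {
        this.next = next;
        this.entries = entries;
    }
}

final class SessionReader {
    static final String SEPARATOR = "-"; // convention only, any separator works

    // Builds "session12345" for a null sequence and "session12345-10" otherwise.
    static String keyFor(String sessionId, Integer sequence) {
        return sequence == null ? sessionId : sessionId + SEPARATOR + sequence;
    }

    // Walks the chain from the base key, collecting events newest first.
    static List<String> readAll(Map<String, Chunk> store, String sessionId) {
        List<String> all = new ArrayList<>();
        Chunk chunk = store.get(keyFor(sessionId, null));
        while (chunk != null) {
            all.addAll(chunk.entries);
            chunk = chunk.next == null ? null : store.get(keyFor(sessionId, chunk.next));
        }
        return all;
    }
}
```

A session that was never balanced has a base key with a null next, so the loop stops after the first chunk.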

Hope that helps,

Guido.

On 24/04/14 04:29, Jason Campbell wrote:
Hi Bryce,

I have code that does something similar to this, and it works well.

In my case, the value is a JSON array, with a JSON object per event.

Siblings are easily resolved by merging the two arrays.
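
As a rough sketch of that merge, assuming each event has already been serialized to a string and each sibling is a list of such strings (SiblingMerger and merge are hypothetical names, and no Riak client API is involved):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class SiblingMerger {
    // Union of all sibling arrays: identical event strings collapse into one,
    // and the result is sorted so every client resolves to the same value.
    static List<String> merge(List<List<String>> siblings) {
        Set<String> union = new TreeSet<>();
        for (List<String> sibling : siblings) {
            union.addAll(sibling);
        }
        return new ArrayList<>(union);
    }
}
```

The sort here is lexicographic on the serialized event, so it only matches time order if each string begins with a sortable timestamp.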

In Riak 2.0, sets using JSON-encoded strings would probably do this
automatically and more cleanly than manually resolving siblings.

I like sorted JSON, but any data format that produces identical strings
would work.  If there is a chance of duplicate submissions into Riak,
you need to ensure the data format always produces identical output to
allow Riak to recognize and eliminate duplicates.
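
To illustrate what "identical output" means in practice, here is a small sketch that writes a flat event with its keys in sorted order, so the same fields always yield the same string regardless of insertion order. CanonicalJson and encode are hypothetical names, and this is not a complete JSON encoder: it handles only string values and escapes only quotes and backslashes.

```java
import java.util.Map;
import java.util.TreeMap;

final class CanonicalJson {
    // Serializes a flat string-to-string event with keys in sorted order.
    static String encode(Map<String, String> event) {
        StringBuilder out = new StringBuilder("{");
        for (Map.Entry<String, String> field : new TreeMap<>(event).entrySet()) {
            if (out.length() > 1) {
                out.append(',');
            }
            out.append(quote(field.getKey())).append(':').append(quote(field.getValue()));
        }
        return out.append('}').toString();
    }

    // Minimal escaping: backslashes first, then double quotes.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Two submissions of the same event then compare equal as strings, which is what lets a set or a sibling merge treat them as one.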

The other thing I would worry about is how long-lived your sessions are
and how many events can they generate.  Riak starts having performance
issues over a few MB and you should probably consider another data
model at that point (maybe storing references instead of the data
itself).

Good luck with your project,
Jason

----- Original Message -----
From: "Bryce" <[email protected]>
To: "riak-users" <[email protected]>
Sent: Thursday, 24 April, 2014 1:22:12 PM
Subject: Riak as log aggregator

Hi All,

I'm interested in using Riak for a log aggregation project. These are
basically apache logs that I would like to correlate together based on
their session IDs. These session IDs would make sense as the key, but
it's the "value" part of this that confuses me. There will be multiple
lines within these logs that have the same session ID, thus I will be
creating siblings. Now, is there a CRDT that will allow me to combine
all of these siblings into a single value or will I need to write my own solution to do so? Any and all pointers are welcomed. Also, if Riak is a
bad fit for this, please let me know.

Warm regards,
Bryce


_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
