Something like that. You might choose a smaller granularity than minute if you're really getting that many ticks per minute. But you probably want a consistent granularity to make it relatively easy to find what you are looking for. You'll probably also want the date in the key.

--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: 631-835-4771
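[Editor's note: a minimal sketch of what such a minute-bucketed key could look like, assuming a plain String key; the separator, field order, and granularity are only illustrative, not anything prescribed by Geode.]

    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;

    /** Builds region keys that bucket ticks per symbol per minute, with the date included. */
    public final class BucketKeys {

        // e.g. "MSFT|2016-02-17|18:00" -- one region entry (an array of ticks) per symbol per minute
        private static final DateTimeFormatter MINUTE =
                DateTimeFormatter.ofPattern("yyyy-MM-dd'|'HH:mm");

        public static String minuteBucket(String symbol, LocalDateTime tickTime) {
            return symbol + "|" + tickTime.format(MINUTE);
        }

        public static void main(String[] args) {
            // 2016-02-17 18:00:00.000660 falls into the 18:00 bucket for MSFT
            System.out.println(minuteBucket("MSFT",
                    LocalDateTime.of(2016, 2, 17, 18, 0, 0, 660_000)));
            // prints: MSFT|2016-02-17|18:00
        }
    }

The point is simply that the key is deterministic from (symbol, date, time bucket), so a reader can compute the keys for a time range up front instead of querying for them.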
On Tue, Feb 23, 2016 at 11:07 AM, Andrew Munn <[email protected]> wrote:

> How does that work when you're appending incoming data in realtime? Say
> you're getting 1,000,000 data points per day on each of 1,000 incoming
> stock symbols. That is 1 billion data points. Are you using keys like
> this that bucket the data into one array per minute of the day?
>
> MSFT-08:00
> MSFT-08:01
> ...
> MSFT-08:59
> etc.
>
> Each array might have several thousand elements in that case.
>
> Thanks
> Andrew
>
> On Mon, 22 Feb 2016, Michael Stolz wrote:
>
> > You will definitely want to use arrays rather than storing each
> > individual data point, because the overhead of each entry in Geode is
> > nearly 300 bytes.
> >
> > You could choose to partition by day/week/month, but it shouldn't be
> > necessary: if you are using the metadata and the starting timestamp of
> > the array as the key, the default partitioning scheme should be random
> > enough to get reasonable distribution.
> >
> > --
> > Mike Stolz
> > Principal Engineer, GemFire Product Manager
> > Mobile: 631-835-4771
> >
> > On Fri, Feb 19, 2016 at 1:43 PM, Alan Kash <[email protected]> wrote:
> >
> > Hi,
> >
> > I am also building a dashboard prototype for time-series data.
> >
> > For time-series data we usually target a single changing metric (stock
> > price, temperature, pressure, etc.) for an entity, while the metadata
> > associated with the event, {StockName/Place, DeviceID, ApplicationID,
> > EventType}, remains constant.
> >
> > For a backend like Cassandra, we denormalize and put everything in a
> > flat key-map with [Metric, Timestamp, DeviceID, Type] as the key. This
> > results in duplication of the associated "metadata".
> >
> > Do you recommend a similar approach for Geode?
> >
> > Alternatively, we could have an array of metrics associated with a
> > given metadata key and store it in a map:
> >
> > Key = [Metadata, Timestamp]
> >
> > TSMAP<Key, Array<Metric>> series = [1,2,3,4,5,6,7,8,9]
> >
> > We could partition this at the application level by day / week / month.
> >
> > Is this approach better?
> >
> > There is a metrics spec for time-series data modeling for those who are
> > interested: http://metrics20.org
> >
> > Thanks
> >
> > On Fri, Feb 19, 2016 at 1:11 PM, Michael Stolz <[email protected]> wrote:
> >
> > You will likely get the best results in terms of speed of access if
> > you put some structure around the way you store the data in memory.
> > First off, you would probably want to parse the data into its
> > individual fields and create a Java object that represents that
> > structure.
> >
> > Then you would probably want to bundle those Java objects into arrays
> > in such a way that it is easy to get to the array for a particular date
> > and time, using the combination of a ticker and a date and time as the
> > key.
> >
> > Those arrays of Java objects are what you would store as entries in
> > Geode. I think this would give you the fastest access to the data.
> >
> > By the way, it is probably better to use an integer Julian date and a
> > long integer for the time rather than a Java Date. Java Dates in Geode
> > PDX are way bigger than you want when you have millions of them.
> >
> > Looking at the sample dataset you provided, it appears there is a lot
> > of redundant data in there. Repeating 1926.75, for instance. In fact,
> > every field but two is the same. Are the repetitious fields necessary?
> > If they are, then you might consider using a columnar approach instead
> > of the Java structures I mentioned: make an array for each column and
> > compact the repetitions with a count. It would be slower but more
> > compact. The timestamps are all the same too. Strange.
> >
> > --
> > Mike Stolz
> > Principal Engineer, GemFire Product Manager
> > Mobile: 631-835-4771
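[Editor's note: the columnar alternative mentioned above is not spelled out anywhere in the thread; the sketch below is one way to read it, with one array per column and run-length compaction of the repetitive columns. All class and field names are made up for illustration.]

    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;

    /** One region entry per time bucket, stored column-wise instead of as an array of tick objects. */
    public class TickColumnBucket implements Serializable {

        // Columns that genuinely vary per tick stay as plain parallel arrays.
        public long[] timeOfDayMicros;   // integer offset within the day, not a java.util.Date
        public int[]  size;
        public int[]  cumulativeVolume;

        // Highly repetitive columns (price repeating 1926.75, condition codes, ...)
        // are kept as (value, run length) pairs.
        public double[] priceRunValues;
        public int[]    priceRunLengths;

        /** Run-length encodes a repetitive column: nine repeats of 1926.75 become one value with count 9. */
        public static void encodeRuns(double[] column, List<Double> values, List<Integer> lengths) {
            for (int i = 0; i < column.length; ) {
                int j = i;
                while (j < column.length && column[j] == column[i]) {
                    j++;
                }
                values.add(column[i]);
                lengths.add(j - i);
                i = j;
            }
        }

        public static void main(String[] args) {
            double[] prices = {1926.75, 1926.75, 1926.75, 1926.75, 1927.00};
            List<Double> values = new ArrayList<>();
            List<Integer> lengths = new ArrayList<>();
            encodeRuns(prices, values, lengths);
            System.out.println(values + " " + lengths);   // [1926.75, 1927.0] [4, 1]
        }
    }

As Mike notes, this trades access speed for footprint: reads have to expand the runs, but the duplicated values largely disappear.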
> > On Fri, Feb 19, 2016 at 12:15 AM, Gregory Chase <[email protected]> wrote:
> >
> > Hi Andrew,
> >
> > I'll let one of the committers answer your specific data-file question.
> > However, you might find some inspiration in this open source demo that
> > some of the Geode team presented at OSCON earlier this year:
> > http://pivotal-open-source-hub.github.io/StockInference-Spark/
> >
> > This was based on a pre-release version of Geode, so you'll want to sub
> > in the M1 release and see if any other tweaks are required at that
> > point.
> >
> > I believe this video and presentation go with the GitHub project:
> > http://www.infoq.com/presentations/r-gemfire-spring-xd
> >
> > On Thu, Feb 18, 2016 at 8:58 PM, Andrew Munn <[email protected]> wrote:
> >
> > What would be the best way to use Geode (or GemFire) to store and
> > utilize financial time-series data like a stream of stock trades? I
> > have ASCII files with timestamps that include microseconds:
> >
> > 2016-02-17 18:00:00.000660,1926.75,5,5,1926.75,1926.75,14644971,C,43,01,
> > 2016-02-17 18:00:00.000660,1926.75,80,85,1926.75,1926.75,14644971,C,43,01,
> > 2016-02-17 18:00:00.000660,1926.75,1,86,1926.75,1926.75,14644971,C,43,01,
> > 2016-02-17 18:00:00.000660,1926.75,6,92,1926.75,1926.75,14644971,C,43,01,
> > 2016-02-17 18:00:00.000660,1926.75,27,119,1926.75,1926.75,14644971,C,43,01,
> > 2016-02-17 18:00:00.000660,1926.75,3,122,1926.75,1926.75,14644971,C,43,01,
> > 2016-02-17 18:00:00.000660,1926.75,5,127,1926.75,1926.75,14644971,C,43,01,
> > 2016-02-17 18:00:00.000660,1926.75,4,131,1926.75,1926.75,14644971,C,43,01,
> > 2016-02-17 18:00:00.000660,1926.75,2,133,1926.75,1926.75,14644971,C,43,01,
> >
> > I have one file per day and each file can have over 1,000,000 rows. My
> > thought is to fault in the files and parse the ASCII as needed. I know
> > I could store the data as binary primitives in a file on disk instead
> > of ASCII for a bit more speed.
> >
> > I don't have a cluster of machines to create an HDFS cluster with. My
> > machine does have 128GB of RAM though.
> >
> > Thanks!
> >
> > --
> > Greg Chase
> > Global Head, Big Data Communities
> > http://www.pivotal.io/big-data
> >
> > Pivotal Software
> > http://www.pivotal.io/
> >
> > 650-215-0477
> > @GregChase
> > Blog: http://geekmarketing.biz/
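[Editor's note: putting the advice in the thread together, a rough sketch of the array-of-ticks approach might look like the following: parse each ASCII row into a small Java object, accumulate the objects into one array per symbol per minute, and put each full array into the region under a bucketed key. The Trade and TradeBucket classes, the fields kept, and the key layout are assumptions for illustration, not anything from the thread.]

    import java.io.Serializable;
    import java.time.LocalDate;
    import java.time.LocalTime;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.geode.cache.Region;

    /** One parsed row of the ASCII feed; only a few of the ten columns are kept here for brevity. */
    class Trade implements Serializable {
        int    dayNumber;         // integer day (epoch day used here as a stand-in for a Julian date)
        long   timeOfDayMicros;   // microseconds since midnight, instead of a java.util.Date
        double price;
        int    size;

        /** Parses a row like "2016-02-17 18:00:00.000660,1926.75,5,5,1926.75,...". */
        static Trade parse(String line) {
            String[] f = line.split(",");
            String[] dateAndTime = f[0].split(" ");
            Trade t = new Trade();
            t.dayNumber       = (int) LocalDate.parse(dateAndTime[0]).toEpochDay();
            t.timeOfDayMicros = LocalTime.parse(dateAndTime[1]).toNanoOfDay() / 1_000;
            t.price           = Double.parseDouble(f[1]);
            t.size            = Integer.parseInt(f[2]);
            return t;
        }
    }

    /** Accumulates the trades for one symbol/minute; the whole array becomes a single Geode entry. */
    class TradeBucket {
        private final List<Trade> trades = new ArrayList<>();

        void add(Trade t) {
            trades.add(t);
        }

        /** One put per bucket (thousands of ticks) rather than one ~300-byte-overhead entry per tick. */
        void flushTo(Region<String, Trade[]> region, String bucketKey) {
            region.put(bucketKey, trades.toArray(new Trade[0]));
        }
    }

Whether the current minute's bucket is re-put as it grows or flushed once the minute rolls over is a separate choice that depends on how quickly readers need to see the newest ticks, which the thread does not settle.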
