On Tue, Jul 31, 2012 at 3:14 AM, <benjamin.bertr...@lfv.se> wrote:
> Hello,
>
> I’ve started to write a parser to convert ASTERIX data to HDF5, but I
> am having some problems representing all the data.
>
> I use table objects. I’ve defined a class for each category record (a
> record is made of different data items).
>
> See below for an example for category 30.
>
There might be an easier way to do this with NumPy structured dtypes, which
PyTables accepts as table descriptions. In pseudo-code:
np.dtype([(colname, np.int16) for colname in colnames])
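To make the pseudo-code concrete, here is a minimal sketch using NumPy only.
The field names come from the I030_180 and I030_181 classes quoted further
down; the helper uniform_dtype and the variables i030_180, i030_181 and
record are made up for illustration, and using int32 for ff_timestamp is an
assumption.

```python
import numpy as np

def uniform_dtype(colnames, base):
    """Build a structured dtype in which every column has the same base type."""
    return np.dtype([(name, base) for name in colnames])

# One small dtype per data item (hypothetical subset of category 30).
i030_180 = uniform_dtype(["SPEED", "HEADING"], np.uint16)
i030_181 = uniform_dtype(["X", "Y"], np.int16)

# Data items nest inside the record dtype, like nested IsDescription classes.
record = np.dtype([
    ("ff_timestamp", np.int32),  # assumed 32-bit timestamp
    ("I030_180", i030_180),
    ("I030_181", i030_181),
])

# Field access returns views, so nested fields can be filled in place.
rows = np.zeros(2, dtype=record)
rows["I030_180"]["SPEED"][0] = 450
rows["I030_181"]["X"][1] = -12
```

A dtype like record should be usable as the description argument when
creating a table, in place of an IsDescription subclass.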
>
> 1. Some data items are optional. Is there a good way to mark a column
> as valid?
>
If you want to mark a whole column as valid, you can set a boolean
attribute on the table itself for each column. The attributes could be
named like colname_valid. See
http://pytables.github.com/usersguide/libref.html#tables.Leaf.setAttr
and
http://pytables.github.com/usersguide/libref.html#the-attributeset-class
for more info.
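A minimal sketch of the per-column attributes, assuming the snake_case
method names of PyTables 3.x (set_attr and get_attr; the links above use the
older setAttr spelling). The two-column layout, the node name cat030 and the
file name are made up, and the file is kept in memory so nothing is written
to disk.

```python
import numpy as np
import tables

# Hypothetical two-column layout; real ASTERIX records have many more items.
dtype = np.dtype([("SPEED", np.uint16), ("HEADING", np.uint16)])

# The CORE driver with no backing store keeps the file in memory only.
with tables.open_file("validity_demo.h5", mode="w", driver="H5FD_CORE",
                      driver_core_backing_store=0) as h5:
    table = h5.create_table("/", "cat030", description=dtype)

    # One boolean attribute per column, following the colname_valid convention.
    table.set_attr("SPEED_valid", True)
    table.set_attr("HEADING_valid", False)

    # Collect the flags back into a plain dict keyed by column name.
    valid = {name: bool(table.get_attr(name + "_valid"))
             for name in table.colnames}
```

Note that these flags describe a column as a whole; they say nothing about
individual rows.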
> For enum, I can easily add an “uninitialized” default value.
>
> For (U)Int, most of the time there is no good default value that could
> tell me if the data is valid or not.
>
> I thought about using np.nan, but that’s only for float.
>
This is more for flagging individual cells as valid or not. For integers,
you need to pick a value that means invalid (like -999999) and that fits in
the column's integer type.
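As a rough sketch of the sentinel idea with NumPy: -999999 does not fit in a
16-bit column, so this example assumes the smallest int16 value never occurs
in real data and reserves it as the marker. The name INVALID and the sample
values are made up.

```python
import numpy as np

# Assumed "no data" marker: the smallest value an int16 can hold (-32768).
INVALID = np.iinfo(np.int16).min

# Made-up column with two missing cells.
x = np.array([120, INVALID, -45, INVALID], dtype=np.int16)

# Boolean mask of the cells that carry real data.
is_valid = x != INVALID
mean_valid = x[is_valid].mean()

# The same idea with a masked array, which skips the sentinel cells itself.
masked = np.ma.masked_equal(x, INVALID)
```

Here mean_valid and masked.mean() both ignore the two sentinel cells.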
>
> Or I could add a bool variable (valid) to each column.
>
> Is there another way?
>
Yes, attributes, as above.
>
> 2. Some data items have a variable length (in fact some fields can be
> repeated).
>
> In the class I030_050_DESC for example, if FX is set to 1, then the fields
> are repeated.
>
> I know that fields cannot be of a variable length in a table object.
>
> I could try to use a shape (max_length,) for those columns but the max
> length would be a bit arbitrary as there is no theoretical limit (even if
> in practice, it is often quite low).
>
> Or should I try to represent the data using a VLArray?
>
> I found it quite natural to represent my data as a table and I don’t
> really see how I could do the same with an array.
>
If you don't want to use a VLArray, then a fixed shape of (max_length,) is
probably your best option.
If you want to do something a little more sophisticated, you could break
your data out into a main table and a helper VLArray. Every row in the
table is matched by the row with the same index in the VLArray. Then when
you want to get your full data back out, you have to read both the table
and the VLArray. This makes things a little more annoying to work with, but
it does what you want.
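A minimal sketch of the table-plus-VLArray layout, assuming the PyTables 3.x
method names (open_file, create_table, create_vlarray). The trimmed record
layout, the node names and the sample values are all made up, and the file
is kept in memory so nothing is written to disk.

```python
import numpy as np
import tables

# Fixed-length part of a hypothetical, heavily trimmed record.
main_dtype = np.dtype([("ff_timestamp", np.int32), ("STN", np.uint16)])

# Made-up sample data: (timestamp, STN, repeated extension values).
samples = [(1000, 17, [5, 6, 7]), (1001, 18, []), (1002, 19, [9])]

# The CORE driver with no backing store keeps the file in memory only.
with tables.open_file("vlarray_demo.h5", mode="w", driver="H5FD_CORE",
                      driver_core_backing_store=0) as h5:
    table = h5.create_table("/", "cat030", description=main_dtype)
    extents = h5.create_vlarray("/", "cat030_extents",
                                atom=tables.UInt16Atom())

    # Append to both nodes together so row i of the table matches
    # row i of the VLArray.
    for timestamp, stn, repeated in samples:
        table.append([(timestamp, stn)])
        extents.append(np.array(repeated, dtype=np.uint16))
    table.flush()

    # Reading a full record back means visiting both nodes with one index.
    full = [(int(row["STN"]), extents[i].tolist())
            for i, row in enumerate(table.read())]
```

The cost of this layout is that the two nodes must always be appended to
together; if one write is skipped, the row indices no longer line up.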
Hope this helps. Feel free to ask more questions!
Be Well
Anthony
>
> Cheers,
>
> Benjamin
>
> class I030_180_DESC(tables.IsDescription):
>     """Calculated Track Velocity (Polar)"""
>     SPEED = tables.UInt16Col(pos=0)
>     HEADING = tables.UInt16Col(pos=1)
>
> class I030_181_DESC(tables.IsDescription):
>     """Calculated Track Velocity (Cartesian)"""
>     X = tables.Int16Col(pos=0)
>     Y = tables.Int16Col(pos=1)
>
> class I030_340_DESC(tables.IsDescription):
>     """Last Measured Mode 3/A"""
>     V = tables.EnumCol(tables.Enum({
>         "Code validated": 0,
>         "Code not validated": 1,
>         "uninitialized": 255
>         }), "uninitialized",
>         base="uint8",
>         pos=0)
>     G = tables.EnumCol(tables.Enum({
>         "Default": 0,
>         "Garbled code": 1,
>         "uninitialized": 255
>         }), "uninitialized",
>         base="uint8",
>         pos=1)
>     L = tables.EnumCol(tables.Enum({
>         "MODE 3/A code as derived from the reply of the transponder,": 0,
>         "Smoothed MODE 3/A code as provided by a local tracker": 1,
>         "uninitialized": 255
>         }), "uninitialized",
>         base="uint8",
>         pos=2)
>     sb = tables.UInt8Col(pos=3)
>     mode_3_a = tables.UInt16Col(pos=4)
>
> class I030_400_DESC(tables.IsDescription):
>     """Callsign"""
>     callsign = tables.StringCol(7, pos=0)
>
> class I030_050_DESC(tables.IsDescription):
>     """Artas Track Number"""
>     AUI = tables.UInt8Col(pos=0)
>     unused = tables.UInt8Col(pos=1)
>     STN = tables.UInt16Col(pos=2)
>     FX = tables.EnumCol(tables.Enum({
>         "end of data item": 0,
>         "extension into next extent": 1,
>         "uninitialized": 255
>         }), "uninitialized",
>         base="uint8",
>         pos=3)
>
> class I030Record(tables.IsDescription):
>     """Cat 030 record"""
>     ff_timestamp = tables.Time32Col()
>     I030_010 = I030_010_DESC()
>     I030_015 = I030_015_DESC()
>     I030_030 = I030_030_DESC()
>     I030_035 = I030_035_DESC()
>     I030_040 = I030_040_DESC()
>     I030_070 = I030_070_DESC()
>     I030_170 = I030_170_DESC()
>     I030_100 = I030_100_DESC()
>     I030_180 = I030_180_DESC()
>     I030_181 = I030_181_DESC()
>     I030_060 = I030_060_DESC()
>     I030_150 = I030_150_DESC()
>     I030_140 = I030_140_DESC()
>     I030_340 = I030_340_DESC()
>     I030_400 = I030_400_DESC()
>     ...
>     I030_210 = I030_210_DESC()
>     I030_120 = I030_120_DESC()
>     I030_050 = I030_050_DESC()
>     I030_270 = I030_270_DESC()
>     I030_370 = I030_370_DESC()
>
> *From:* Anthony Scopatz [mailto:scop...@gmail.com]
> *Sent:* 12 July 2012 00:02
> *To:* Discussion list for PyTables
> *Subject:* Re: [Pytables-users] advice on using PyTables
>
> Hello Benjamin,
>
> Not knowing too much about the ASTERIX format, other than what you said and
> what is in the links, I would say that this is a good fit for HDF5 and
> PyTables. PyTables will certainly help you read in the data and manipulate
> it.
>
> However, before you abandon hachoir completely, I will say it is a lot
> easier to write HDF5 files in PyTables than to use the HDF5 C API. If
> hachoir is too slow, have you tried profiling the code to see what is
> taking up the most time? Maybe you could just rewrite these parts in C?
> Have you looked into Cythonizing it? Also, you don't seem to be using
> numpy to read in the data... (there are some tricks given ASTERIX here, but
> not insurmountable).
>
> I ask the above just so you don't have to completely rewrite everything.
> You are correct though that pure Python is probably not sufficient. Feel
> free to ask more questions here.
>
> Be Well
>
> Anthony
>
> On Wed, Jul 11, 2012 at 6:52 AM, <benjamin.bertr...@lfv.se> wrote:
>
> Hi,
>
> I'm working with Air Traffic Management and would like to perform checks /
> compute statistics on ASTERIX data.
> ASTERIX is an ATM Surveillance Data Binary Messaging Format (
> http://www.eurocontrol.int/asterix/public/standard_page/overview.html)
>
> The data consist of a concatenation of consecutive data blocks.
> Each data block consists of data category + length + records.
> Each record is of variable length and consists of several data items (that
> are well defined for each category).
> Some data items might be present or not depending on a field specification
> (bitfield).
>
> I started to write a parser using hachoir (
> https://bitbucket.org/haypo/hachoir/overview), a pure Python library.
> But the parsing was really too slow and took a lot of memory.
> That's not really usable.
>
> From what I read, PyTables could really help to manipulate and analyze
> the data.
> So I've been thinking about writing a tool (probably in C) to convert my
> ASTERIX format to HDF5.
>
> Before I start, I'd like confirmation that this seems like a suitable
> application for PyTables.
> Is there another approach than writing a conversion tool to HDF5?
>
> Thanks in advance
>
> Benjamin
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>