On 12/18/2013 12:49 AM, Brian T. Carcich wrote:
I'm working on an SQLite solution to get at star catalogs; they are usually
searched via Right Ascension (RA), Declination (DEC), and magnitude (mag).
  RA,DEC is a spherical coordinate system to specify a star position on-sky;
magnitude is related to star brightness.

What I have so far is here:

    https://github.com/drbitboy/Tycho2_SQLite_RTree


I started with the Tycho-2 star catalog.  It comprises 2.5 million stars in
a flat ASCII, fixed-width catalog file (actually two files but ignore that
for now), and an index file (map) of ~10k small RA-DEC regions, with an
average of ~250 stars in each region.  The regions do not overlap, and all
the stars in any one region are in contiguous lines in the catalog file.

The index file does not implement any grouping or sorting by magnitude.
  Each index region refers to

A) a contiguous region on-sky with defined by a min-max RA pair and a
min-max DEC pair.

B)  a contiguous range of the lines (stars) in the flat file that are
within that region.

So the data in the index file are a reasonable starting point for an R*Tree
in SQLite3.  I put the index file data into the virtual table using the RA
and DEC limits for each region as the two min-max pairs of columns in the
table, and the index table, referenced by the primary key of the virtual
table, contains the starting and ending+1 indices (offsets actually) of the
stars in the flat catalog file for each region.

So I use the R*Tree module to get a fast lookup into the index table,
returning index regions that overlap an input RA and DEC min-max pair, then
step through the catalog lines for each of those regions.

Here's my question:  is there any advantage to skipping the index step and
putting the star catalog data into the virtual table itself?  One advantage
is that I could include the magnitude in the rtree table.

The reason I ask is that rtree table uses min-max pairs, but each star is a
point so the min and max are equal for each star.  Would that break any
implicit R*Tree rules or negate any efficiencies?

Points are fine. So long as (max>=min) for all dimensions.
R-tree will return SQLITE_CONSTRAINT if you try to insert
a record for which this is not the case.

I guess in theory storing each individual star in the
r-tree might be more efficient. Difficult to say if it
would be significant though. The r-tree and other virtual
table code is not as optimized as the core, so even if
it seems better in theory it might not be in practice.

Adding the magnitude to the r-tree structure as an extra
dimension that is never queried sounds like it might make
the r-tree structure less efficient though. R-tree will
make some effort to place records with similar magnitudes
on the same node, which won't help the query but will
presumably cause some reduction in the localization that
does matter.

Is it working now? How many more stars do you have data for?

Dan.





_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to