Peter,

If you are looking for a way to constrain inserts into your dataset, Drill
isn't going to manage this for you. Using CREATE TABLE, you can create a
new directory to add data to a dataset (effectively a bulk insert).

Drill is focused on analytical workloads, and supports CREATE TABLE to
enable the creation of materialized views, or for ETLing into a more
efficient format like parquet.

If you are receiving data in a format that is processable by Drill, like
delimited text, you could run a query against the raw data to check for
your constraint and only insert data that meets the constraint. For your
example here, you could use an inner join to find all of the records that
have valid county codes, and only insert those into your table.

- Jason Altekruse

On Wed, Jan 6, 2016 at 7:31 AM, Christopher Matta <[email protected]> wrote:

> Are you asking about a simple JOIN?
>
> Below I have a simple list of country_data.csv that I’m joining to an iso
> dataset:
>
> 0: jdbc:drill:> select * from maprfs.cmatta.`iso_3166_2_countries.csv`
> limit 10;
> +---------+
> | columns |
> +---------+
> | ["Sort Order","Common Name","Formal Name","Type","Sub
> Type","Sovereignty","Capital","ISO 4217 Currency Code","ISO 4217
> Currency Name","ITU-T Telephone Code","ISO 3166-1 2 Letter Code","ISO
> 3166-1 3 Letter Code","ISO 3166-1 Number","IANA Country Code TLD\r"] |
> | ["1","Afghanistan","Islamic State of Afghanistan","Independent
> State","","","Kabul","AFN","Afghani","93","AF","AFG","4",".af\r"] |
> | ["2","Albania","Republic of Albania","Independent
> State","","","Tirana","ALL","Lek","355","AL","ALB","8",".al\r"] |
> | ["3","Algeria","People's Democratic Republic of
> Algeria","Independent
> State","","","Algiers","DZD","Dinar","213","DZ","DZA","12",".dz\r"] |
> | ["4","Andorra","Principality of Andorra","Independent
> State","","","Andorra la
> Vella","EUR","Euro","376","AD","AND","20",".ad\r"] |
> | ["5","Angola","Republic of Angola","Independent
> State","","","Luanda","AOA","Kwanza","244","AO","AGO","24",".ao\r"] |
> | ["6","Antigua and Barbuda","","Independent State","","","Saint
> John's","XCD","Dollar","-267","AG","ATG","28",".ag\r"] |
> | ["7","Argentina","Argentine Republic","Independent
> State","","","Buenos Aires","ARS","Peso","54","AR","ARG","32",".ar\r"]
> |
> | ["8","Armenia","Republic of Armenia","Independent
> State","","","Yerevan","AMD","Dram","374","AM","ARM","51",".am\r"] |
> | ["9","Australia","Commonwealth of Australia","Independent
> State","","","Canberra","AUD","Dollar","61","AU","AUS","36",".au\r"] |
> +---------+
> 10 rows selected (0.905 seconds)
> 0: jdbc:drill:> select * from maprfs.cmatta.`country_data.csv` limit 10;
> +--------------------------------------------------------------------+
> |                              columns                               |
> +--------------------------------------------------------------------+
> | ["1","Kabul","AFN","Afghani","93","AF","AFG","4",".af"]            |
> | ["2","Tirana","ALL","Lek","355","AL","ALB","8",".al"]              |
> | ["3","Algiers","DZD","Dinar","213","DZ","DZA","12",".dz"]          |
> | ["4","Andorra la Vella","EUR","Euro","376","AD","AND","20",".ad"]  |
> | ["5","Luanda","AOA","Kwanza","244","AO","AGO","24",".ao"]          |
> +--------------------------------------------------------------------+
> 5 rows selected (0.35 seconds)
> 0: jdbc:drill:> select c.`columns`[1] as capital, i.`columns`[1] as
> country from maprfs.cmatta.`country_data.csv` c JOIN
> maprfs.cmatta.`iso_3166_2_countries.csv` i ON c.`columns`[7] =
> i.`columns`[12];
> +-------------------+--------------+
> |      capital      |   country    |
> +-------------------+--------------+
> | Kabul             | Afghanistan  |
> | Tirana            | Albania      |
> | Algiers           | Algeria      |
> | Andorra la Vella  | Andorra      |
> | Luanda            | Angola       |
> +-------------------+--------------+
> 5 rows selected (0.82 seconds)
> 0: jdbc:drill:>
>
> What do you hope to accomplish with Pig? A lot of manipulation of data can
> be accomplished with Drill and SQL.
>
> On Tuesday, January 5, 2016, Peder Jakobsen | gmail <[email protected]>
> wrote:
>
> Does drill have something analogous  to  SQL  CONSTRAINT and REFERENCES for
> > doing table type  lookups?
> >
> > I have a large data that uses numerical ISO country codes; they need to
> be
> > replaced with country names that are mapped in a small CSV file
> (iso_code,
> > country_name).
> >
> > Any suggestions?
> >
> > Thanks you,
> >
> > Peder
> >
> > P.S. Is it worth learning Pig Latin or something similar for more complex
> > queries and data manipulation in Drill?
> >
> ​
>

Reply via email to