Peter, If you are looking for a way to constrain inserts into your dataset, Drill isn't going to manage this for you. Using CREATE TABLE, you can create a new directory to add data to a dataset (effectively a bulk insert).
Drill is focused on analytical workloads, and supports CREATE TABLE to enable the creation of materialized views, or for ETLing into a more efficient format like parquet. If you are receiving data in a format that is processable by Drill, like delimited text, you could run a query against the raw data to check for your constraint and only insert data that meets the constraint. For your example here, you could use an inner join to find all of the records that have valid county codes, and only insert those into your table. - Jason Altekruse On Wed, Jan 6, 2016 at 7:31 AM, Christopher Matta <[email protected]> wrote: > Are you asking about a simple JOIN? > > Below I have a simple list of country_data.csv that I’m joining to an iso > dataset: > > 0: jdbc:drill:> select * from maprfs.cmatta.`iso_3166_2_countries.csv` > limit 10; > +---------+ > | columns | > +---------+ > | ["Sort Order","Common Name","Formal Name","Type","Sub > Type","Sovereignty","Capital","ISO 4217 Currency Code","ISO 4217 > Currency Name","ITU-T Telephone Code","ISO 3166-1 2 Letter Code","ISO > 3166-1 3 Letter Code","ISO 3166-1 Number","IANA Country Code TLD\r"] | > | ["1","Afghanistan","Islamic State of Afghanistan","Independent > State","","","Kabul","AFN","Afghani","93","AF","AFG","4",".af\r"] | > | ["2","Albania","Republic of Albania","Independent > State","","","Tirana","ALL","Lek","355","AL","ALB","8",".al\r"] | > | ["3","Algeria","People's Democratic Republic of > Algeria","Independent > State","","","Algiers","DZD","Dinar","213","DZ","DZA","12",".dz\r"] | > | ["4","Andorra","Principality of Andorra","Independent > State","","","Andorra la > Vella","EUR","Euro","376","AD","AND","20",".ad\r"] | > | ["5","Angola","Republic of Angola","Independent > State","","","Luanda","AOA","Kwanza","244","AO","AGO","24",".ao\r"] | > | ["6","Antigua and Barbuda","","Independent State","","","Saint > John's","XCD","Dollar","-267","AG","ATG","28",".ag\r"] | > | ["7","Argentina","Argentine Republic","Independent > State","","","Buenos Aires","ARS","Peso","54","AR","ARG","32",".ar\r"] > | > | ["8","Armenia","Republic of Armenia","Independent > State","","","Yerevan","AMD","Dram","374","AM","ARM","51",".am\r"] | > | ["9","Australia","Commonwealth of Australia","Independent > State","","","Canberra","AUD","Dollar","61","AU","AUS","36",".au\r"] | > +---------+ > 10 rows selected (0.905 seconds) > 0: jdbc:drill:> select * from maprfs.cmatta.`country_data.csv` limit 10; > +--------------------------------------------------------------------+ > | columns | > +--------------------------------------------------------------------+ > | ["1","Kabul","AFN","Afghani","93","AF","AFG","4",".af"] | > | ["2","Tirana","ALL","Lek","355","AL","ALB","8",".al"] | > | ["3","Algiers","DZD","Dinar","213","DZ","DZA","12",".dz"] | > | ["4","Andorra la Vella","EUR","Euro","376","AD","AND","20",".ad"] | > | ["5","Luanda","AOA","Kwanza","244","AO","AGO","24",".ao"] | > +--------------------------------------------------------------------+ > 5 rows selected (0.35 seconds) > 0: jdbc:drill:> select c.`columns`[1] as capital, i.`columns`[1] as > country from maprfs.cmatta.`country_data.csv` c JOIN > maprfs.cmatta.`iso_3166_2_countries.csv` i ON c.`columns`[7] = > i.`columns`[12]; > +-------------------+--------------+ > | capital | country | > +-------------------+--------------+ > | Kabul | Afghanistan | > | Tirana | Albania | > | Algiers | Algeria | > | Andorra la Vella | Andorra | > | Luanda | Angola | > +-------------------+--------------+ > 5 rows selected (0.82 seconds) > 0: jdbc:drill:> > > What do you hope to accomplish with Pig? A lot of manipulation of data can > be accomplished with Drill and SQL. > > On Tuesday, January 5, 2016, Peder Jakobsen | gmail <[email protected]> > wrote: > > Does drill have something analogous to SQL CONSTRAINT and REFERENCES for > > doing table type lookups? > > > > I have a large data that uses numerical ISO country codes; they need to > be > > replaced with country names that are mapped in a small CSV file > (iso_code, > > country_name). > > > > Any suggestions? > > > > Thanks you, > > > > Peder > > > > P.S. Is it worth learning Pig Latin or something similar for more complex > > queries and data manipulation in Drill? > > > >
