Are you asking about a simple JOIN? Below I have a simple list of country_data.csv that I’m joining to an iso dataset:
0: jdbc:drill:> select * from maprfs.cmatta.`iso_3166_2_countries.csv` limit 10; +---------+ | columns | +---------+ | ["Sort Order","Common Name","Formal Name","Type","Sub Type","Sovereignty","Capital","ISO 4217 Currency Code","ISO 4217 Currency Name","ITU-T Telephone Code","ISO 3166-1 2 Letter Code","ISO 3166-1 3 Letter Code","ISO 3166-1 Number","IANA Country Code TLD\r"] | | ["1","Afghanistan","Islamic State of Afghanistan","Independent State","","","Kabul","AFN","Afghani","93","AF","AFG","4",".af\r"] | | ["2","Albania","Republic of Albania","Independent State","","","Tirana","ALL","Lek","355","AL","ALB","8",".al\r"] | | ["3","Algeria","People's Democratic Republic of Algeria","Independent State","","","Algiers","DZD","Dinar","213","DZ","DZA","12",".dz\r"] | | ["4","Andorra","Principality of Andorra","Independent State","","","Andorra la Vella","EUR","Euro","376","AD","AND","20",".ad\r"] | | ["5","Angola","Republic of Angola","Independent State","","","Luanda","AOA","Kwanza","244","AO","AGO","24",".ao\r"] | | ["6","Antigua and Barbuda","","Independent State","","","Saint John's","XCD","Dollar","-267","AG","ATG","28",".ag\r"] | | ["7","Argentina","Argentine Republic","Independent State","","","Buenos Aires","ARS","Peso","54","AR","ARG","32",".ar\r"] | | ["8","Armenia","Republic of Armenia","Independent State","","","Yerevan","AMD","Dram","374","AM","ARM","51",".am\r"] | | ["9","Australia","Commonwealth of Australia","Independent State","","","Canberra","AUD","Dollar","61","AU","AUS","36",".au\r"] | +---------+ 10 rows selected (0.905 seconds) 0: jdbc:drill:> select * from maprfs.cmatta.`country_data.csv` limit 10; +--------------------------------------------------------------------+ | columns | +--------------------------------------------------------------------+ | ["1","Kabul","AFN","Afghani","93","AF","AFG","4",".af"] | | ["2","Tirana","ALL","Lek","355","AL","ALB","8",".al"] | | ["3","Algiers","DZD","Dinar","213","DZ","DZA","12",".dz"] | | ["4","Andorra la Vella","EUR","Euro","376","AD","AND","20",".ad"] | | ["5","Luanda","AOA","Kwanza","244","AO","AGO","24",".ao"] | +--------------------------------------------------------------------+ 5 rows selected (0.35 seconds) 0: jdbc:drill:> select c.`columns`[1] as capital, i.`columns`[1] as country from maprfs.cmatta.`country_data.csv` c JOIN maprfs.cmatta.`iso_3166_2_countries.csv` i ON c.`columns`[7] = i.`columns`[12]; +-------------------+--------------+ | capital | country | +-------------------+--------------+ | Kabul | Afghanistan | | Tirana | Albania | | Algiers | Algeria | | Andorra la Vella | Andorra | | Luanda | Angola | +-------------------+--------------+ 5 rows selected (0.82 seconds) 0: jdbc:drill:> What do you hope to accomplish with Pig? A lot of manipulation of data can be accomplished with Drill and SQL. On Tuesday, January 5, 2016, Peder Jakobsen | gmail <[email protected]> wrote: Does drill have something analogous to SQL CONSTRAINT and REFERENCES for > doing table type lookups? > > I have a large data that uses numerical ISO country codes; they need to be > replaced with country names that are mapped in a small CSV file (iso_code, > country_name). > > Any suggestions? > > Thanks you, > > Peder > > P.S. Is it worth learning Pig Latin or something similar for more complex > queries and data manipulation in Drill? >
