Hi Fawze,

A Hive partition can only have one unique location. So partition (year=2018/month=01/day=01) can't point to both /tmp/account=aaaa/year=2018/month=01/day=01 and /tmp/account=bbbb/year=2018/month=01/day=01 at the same time.
For your problem, you need to reorganize the directory hierarchy to match the partition definition: move all the files in /tmp/account=*/year=2018/month=01/day=01 into /somewhere/year=2018/month=01/day=01. As you mentioned you have millions of accounts, you may need to do this in parallel via a Map-only MapReduce job or a Spark job. For example, to write a MapReduce job for this:

(1) Create a text file listing all these directory names.
(2) Use NLineInputFormat as the InputFormat and this text file as the input.
(3) Each mapper then processes N directories: it moves the files in /tmp/account=*/year=YYYY/month=MM/day=DD into /somewhere/year=YYYY/month=MM/day=DD (creating the directory if it does not exist).

A rough sketch of such a job is at the bottom of this mail, below the quoted message.

HTH
Quanlong

On Sat, Dec 15, 2018 at 3:36 PM Fawze Abujaber <fawz...@gmail.com> wrote:

> Hi Community,
>
> I would like to create an external table on top of these HDFS files with
> partitions year, month and day. Is it possible to create one table on top
> of these files?
>
> /tmp/account=aaaa/year=2018/month=01/day=01
> /tmp/account=aaaa/year=2018/month=01/day=02
> /tmp/account=bbbb/year=2018/month=01/day=01
> /tmp/account=bbbb/year=2018/month=01/day=02
>
> Creating a table with:
>
> PARTITIONED BY (
>   year INT,
>   month INT,
>   day INT
> )
> STORED AS PARQUET
> LOCATION '/tmp'
>
> is not working for me.
>
> Adding the account to the partition creates millions of partitions for me,
> and I want to avoid this; in the background I have a compaction job that
> compacts the small files under the day partition.
>
> --
> Take Care
> Fawze Abujaber
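
For concreteness, here is a rough, untested sketch of the Map-only job described above. It assumes the target location is /somewhere (as in the example paths), that the text file listing the source day directories is passed as the first argument, and 100 lines per mapper; the class name DirMoveJob and the 100-lines-per-map value are just placeholders you would adjust for your cluster.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class DirMoveJob {

  // Each input line is one source directory, e.g. /tmp/account=aaaa/year=2018/month=01/day=01
  public static class MoveMapper
      extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String line = value.toString().trim();
      if (line.isEmpty()) return;

      FileSystem fs = FileSystem.get(context.getConfiguration());

      // Strip the account=... level to get the target partition directory:
      // /tmp/account=aaaa/year=2018/month=01/day=01 -> /somewhere/year=2018/month=01/day=01
      Path src = new Path(line);                 // .../day=DD
      Path month = src.getParent();              // .../month=MM
      Path year = month.getParent();             // .../year=YYYY
      Path dst = new Path("/somewhere/" + year.getName() + "/"
          + month.getName() + "/" + src.getName());

      if (!fs.exists(dst)) {
        fs.mkdirs(dst);                          // create the partition dir if it does not exist
      }
      // Move every file under the source day directory into the target partition dir.
      for (FileStatus st : fs.listStatus(src)) {
        if (st.isFile()) {
          fs.rename(st.getPath(), new Path(dst, st.getPath().getName()));
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "move-account-dirs-into-partitions");
    job.setJarByClass(DirMoveJob.class);
    job.setMapperClass(MoveMapper.class);
    job.setNumReduceTasks(0);                               // map-only, no reducers
    job.setInputFormatClass(NLineInputFormat.class);
    job.setOutputFormatClass(NullOutputFormat.class);
    NLineInputFormat.addInputPath(job, new Path(args[0]));  // text file with the directory names
    NLineInputFormat.setNumLinesPerSplit(job, 100);         // N directories per mapper
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

A Spark job could do the same thing by parallelizing the directory list and issuing the same FileSystem rename calls from each task.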