Thanks for all the responses.

Once I renamed the files within the directories to have the extension .csv, it
worked. So it looks like the csv format requires a file extension. It would be
nice if the configuration did not accept "null" for the extensions setting.
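
A possible alternative to renaming, assuming a Drill release that supports
table functions (an assumption; not verified on the version used here), is to
name the format in the query itself. The sketch below reuses the hdfs.root
workspace and the extensionless DIM_Agents file from the earlier message; the
delimiter is assumed to be a comma.

```sql
-- Sketch only: assumes a Drill version with table-function support.
-- hdfs.root and /DIM_Agents are the workspace and file from this thread.
SELECT *
FROM table(hdfs.root.`/DIM_Agents`(type => 'text', fieldDelimiter => ','));
```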

Now, in the next step of my proof of concept, I am trying to access parquet
files. I have parquet files (tables) that were created using Impala, and I am
assuming that I should be able to access those files via Drill as well.

My parquet tables are placed under /user/hive/warehouse, as listed below:


[root@rtr-poc-imp1 sample-data]# hdfs dfs -ls /user/hive/warehouse
Found 19 items
drwxrwxrwt   - impala hive          0 2015-03-31 16:00 /user/hive/warehouse/dim_agent_status_parq
drwxrwxrwt   - impala hive          0 2015-03-31 16:00 /user/hive/warehouse/dim_agent_status_reasons_parq
drwxrwxrwt   - impala hive          0 2015-03-27 12:27 /user/hive/warehouse/dim_agents_parquet
drwxrwxrwt   - impala hive          0 2015-03-31 16:00 /user/hive/warehouse/dim_call_action_reasons_parq
drwxrwxrwt   - impala hive          0 2015-03-31 14:09 /user/hive/warehouse/dim_call_actions_parq
drwxrwxrwt   - impala hive          0 2015-03-31 13:54 /user/hive/warehouse/dim_call_types_parq
drwxrwxrwt   - impala hive          0 2015-03-31 15:59 /user/hive/warehouse/dim_dispositions_parq
drwxrwxrwt   - impala hive          0 2015-03-31 15:20 /user/hive/warehouse/dim_resource_groups_parq
drwxrwxrwt   - impala hive          0 2015-03-31 13:33 /user/hive/warehouse/dim_services_parq
drwxrwxrwt   - impala hive          0 2015-03-31 14:00 /user/hive/warehouse/dim_sites_parq
drwxrwxrwt   - impala hive          0 2015-03-31 15:25 /user/hive/warehouse/dim_workgroups_parq
drwxrwxrwx   - root   hive          0 2015-04-08 14:36 /user/hive/warehouse/dservices
drwxrwxrwt   - impala hive          0 2015-03-27 11:48 /user/hive/warehouse/edwpoc.db
drwxrwxrwt   - impala hive          0 2015-03-31 12:47 /user/hive/warehouse/fact_agent_activity_detail_12m_partparq
drwxrwxrwt   - impala hive          0 2015-03-30 13:03 /user/hive/warehouse/fact_contact_detail_12m_partparq
drwxrwxrwt   - impala hive          0 2015-03-27 13:36 /user/hive/warehouse/fact_contact_detail_partparq
-rw-r--r--   3 root   hive        455 2015-04-08 14:55 /user/hive/warehouse/region.parq
drwxrwxrwt   - impala hive          0 2015-03-25 22:29 /user/hive/warehouse/sample_07
drwxrwxrwt   - impala hive          0 2015-03-25 22:29 /user/hive/warehouse/sample_08

Example listing from one of the directories:

hdfs dfs -ls /user/hive/warehouse/dim_services_parq
Found 3 items
-rw-r--r--   3 impala hive      55121 2015-03-31 13:33 /user/hive/warehouse/dim_services_parq/4645c4221dafa337-250888c6ac1de29b_1376355963_data.0.parq
-rw-r--r--   3 impala hive      71075 2015-03-31 13:33 /user/hive/warehouse/dim_services_parq/4645c4221dafa337-250888c6ac1de29c_2123191845_data.0.parq
drwxrwxrwt   - impala hive          0 2015-03-31 13:33 /user/hive/warehouse/dim_services_parq/_impala_insert_staging
[root@rtr-poc-imp1 sample-data]#

There is nothing under the Impala staging directory; it is primarily used
when an insert operation is performed.

I copied the dim_services_parq directory to dservices; below is the listing of
the dservices directory.

[root@rtr-poc-imp1 sample-data]#  hdfs dfs -ls /user/hive/warehouse/dservices
Found 2 items
-rwxrwxrwx   3 root hive      55121 2015-04-08 14:12 /user/hive/warehouse/dservices/service0.parquet
-rwxrwxrwx   3 root hive      71075 2015-04-08 14:12 /user/hive/warehouse/dservices/service1.parquet

Now when I try to query it, I get the error below:

select * from hdfs.drillpoc.`/dservices`;
Query failed: RemoteRpcException: Failure while running fragment., 
java.lang.UnsupportedOperationException [ cfca83ec-986a-43c0-a967-5aee102401dd 
on rtr-poc-imp2.labs.aspect.com:31010 ]
[ cfca83ec-986a-43c0-a967-5aee102401dd on rtr-poc-imp2.labs.aspect.com:31010 ]

I also copied the Drill sample parquet file region.parquet to the same location,
and that works fine, as shown below.

select * from hdfs.drillpoc.`region.parq`;
+-------------+------------+------------+
| R_REGIONKEY |   R_NAME   | R_COMMENT  |
+-------------+------------+------------+
| 0           | AFRICA     | lar deposits. blithe |
| 1           | AMERICA    | hs use ironic, even  |
| 2           | ASIA       | ges. thinly even pin |
| 3           | EUROPE     | ly final courts cajo |
| 4           | MIDDLE EAST | uickly special accou |
+-------------+------------+------------+
5 rows selected (0.122 seconds)

From what I have read so far, an Impala-created parquet file should be like any
other parquet file, so there should not be a problem. If this does not work, I
need to convert all my tables from text format to parquet format and access
them with Drill. Is there any utility to do that?
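
One route that may cover the conversion is Drill's own CREATE TABLE AS (CTAS)
with the store.format session option set to parquet. This is only a sketch: it
assumes hdfs.drillpoc is a writable workspace, and the target table name
test_parquet and the column names row_id and person_name are hypothetical. The
source is the two-column test.csv from the earlier message.

```sql
-- Sketch only: assumes hdfs.drillpoc is configured as a writable workspace.
-- test_parquet, row_id and person_name are hypothetical names.
ALTER SESSION SET `store.format` = 'parquet';

-- Text files expose each row as a `columns` array, so pick fields by index.
CREATE TABLE hdfs.drillpoc.`test_parquet` AS
SELECT CAST(columns[0] AS INT)          AS row_id,
       CAST(columns[1] AS VARCHAR(100)) AS person_name
FROM hdfs.root.`/test.csv`;
```

If that works, the result should be queryable like any other parquet
directory, for example with select * from hdfs.drillpoc.`test_parquet`.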

Thanks for all the help.
Latha







From: Sivasubramaniam, Latha
Sent: Wednesday, April 08, 2015 8:00 AM
To: '[email protected]'
Subject: RE: Unable to query data from hdfs

Hi,

Thanks for your responses. Even though I had run "use hdfs", the query only
worked when I fully qualified the file name. But I am still not able to access
files without a .csv extension.

I modified

"csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","

To

"csv": {
      "type": "text",
      "extensions":  null,
      "delimiter": ","

Then I tried to access the hdfs file 'DIM_Agents' and I got the same error.
With null extensions, I can't access 'test.csv' either. Once I reverted to the
original csv format description, I could access test.csv again, but I cannot
access the other files with either of the format descriptions.

Below is what I tried. Is '_' (underscore) a problem in the file name? All my
hdfs files are in text format.

0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/test.csv`;
+------------+------------+
|  columns   |    dir0    |
+------------+------------+
| ["1","Latha"] | root       |
| ["2","Roshan"] | root       |
+------------+------------+
2 rows selected (0.276 seconds)
0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/DIM_Agents`;
Query failed: SqlValidatorException: Table 'hdfs.root./DIM_Agents' not found

Error: exception while executing query: Failure while executing query. 
(state=,code=0)
0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/DIM_Agents`;
Query failed: SqlValidatorException: Table 'hdfs.root./DIM_Agents' not found

Error: exception while executing query: Failure while executing query. 
(state=,code=0)
0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/test.csv`;
Query failed: SqlValidatorException: Table 'hdfs.root./test.csv' not found

Error: exception while executing query: Failure while executing query. 
(state=,code=0)
0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/DIM_Agents`;
Query failed: SqlValidatorException: Table 'hdfs.root./DIM_Agents' not found

Error: exception while executing query: Failure while executing query. 
(state=,code=0)
0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/test.csv`;
Query failed: SqlValidatorException: Table 'hdfs.root./test.csv' not found

Error: exception while executing query: Failure while executing query. 
(state=,code=0)
0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/test.csv`;
+------------+------------+
|  columns   |    dir0    |
+------------+------------+
| ["1","Latha"] | root       |
| ["2","Roshan"] | root       |
+------------+------------+
2 rows selected (0.112 seconds)

Appreciate your help.

Thanks,
Latha