[ https://issues.apache.org/jira/browse/AIRFLOW-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rahul Singh updated AIRFLOW-1496:
---------------------------------
Description:

Hi,

I am trying to use the Druid hook to load data from HDFS into Druid. Below is my DAG script:

from datetime import datetime, timedelta
import json

from airflow.hooks import HttpHook, DruidHook
from airflow.operators import PythonOperator
from airflow.models import DAG

def check_druid_con():
    dr_hook = DruidHook(druid_ingest_conn_id='DRUID_INDEX',
                        druid_query_conn_id='DRUID_QUERY')
    dr_hook.load_from_hdfs(
        "druid_airflow",
        "hdfs://10.xx.xx.xx/demanddata/demand2.tsv",
        "stay_date",
        ["channel", "rate"],
        "2016-12-11/2017-12-13",
        1,
        -1,
        metric_spec=[{"name": "count", "type": "count"}],
        hadoop_dependency_coordinates="org.apache.hadoop:hadoop-client:2.7.3")

default_args = {
    'owner': 'TC',
    'start_date': datetime(2017, 8, 7),
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG('druid_data_load', default_args=default_args)

druid_task1 = PythonOperator(task_id='check_druid',
                             python_callable=check_druid_con,
                             dag=dag)

I keep getting the error: TypeError: load_from_hdfs() takes at least 10 arguments (10 given). I have given 10 arguments to load_from_hdfs, yet it still errors out. Please help.
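Not part of the original report, but a minimal sketch of why Python 2 can report "(10 given)" and still raise: for a bound method, self counts toward "given", and a keyword argument that fills an *optional* parameter also counts, so a required positional can be left unbound even when the totals look equal. The class and signature below are hypothetical, not the real DruidHook.load_from_hdfs:

```python
# Hypothetical reproduction of Python 2's confusing arity message; this is
# NOT the actual DruidHook signature, just an illustration of the counting.

class Hook(object):
    # Nine required parameters besides self -> "takes at least 10 arguments".
    def load(self, a, b, c, d, e, f, g, h, i, spec=None):
        return True

hook = Hook()
try:
    # 8 positional + 1 keyword + the implicit self = 10 "given", but the
    # required parameter `i` is never bound, so a TypeError is raised anyway.
    hook.load(1, 2, 3, 4, 5, 6, 7, 8, spec=[])
except TypeError as exc:
    # Python 2 phrases this as: load() takes at least 10 arguments (10 given)
    print(exc)
```

If this is what is happening here, the fix would be to supply every required positional argument of load_from_hdfs before the keyword arguments, rather than to count arguments against the error message.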
Regards,
Rahul
> Druid hook unable to load data from hdfs
> ----------------------------------------
>
>                 Key: AIRFLOW-1496
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1496
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks
>    Affects Versions: 1.8.0
>        Environment: RHEL 6.7, Python 2.7.13
>            Reporter: Rahul Singh
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)