Instead of

using 'python2.6 user_id_output.py hbase'

try something like this:

using 'user_id_output.py'
... with a #! line at the top of the script pointing at the python binary (and the script marked executable).
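Something like this, as a sketch (I'm assuming the interpreter really is at /usr/bin/python2.6 on the task nodes; adjust the path if yours differs). Make the first line of user_id_output.py:

    #!/usr/bin/python2.6

and set the execute bit before you ship it with 'add file':

    chmod +x user_id_output.py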

I think you can include a parameter in the call too, like:
using 'user_id_output.py hbase'
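You can also sanity-check that invocation outside Hive, since transform just pipes the selected columns to the script's stdin as tab-separated lines (sample_rows.txt here is a made-up file of such rows):

    cat sample_rows.txt | ./user_id_output.py hbase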

Cheers,
Ajo.

On Tue, Mar 1, 2011 at 8:22 AM, Irfan Mohammed <irfan...@gmail.com> wrote:

> Hi,
>
> I have a hive script [given below] which calls a python script using
> transform. For large datasets [ > 100M rows ], the reducer is not able to
> start the python process, and the error message is "argument list too
> long". The detailed error stack is given below.
>
> The python script takes only one static argument, "hbase". For small
> datasets [ 1M rows ], the script works fine.
>
>    1. Is this problem related to the number of open file handles on the
>       reducer box?
>    2. How do I get the correct error message?
>    3. Is there a way to see the actual unix process, with its arguments,
>       that is being instantiated?
>
> Thanks,
> Irfan
>
> >>>>>>>>>>>>>> script >>>>>>>>>>>>>>>>>>>>>>>>>>>
>
> true && echo "
> set hive.exec.compress.output=true;
> set io.seqfile.compression.type=BLOCK;
> set mapred.output.compression.type=BLOCK;
> set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>
> set hive.merge.mapfiles=false;
> set hive.exec.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.max.dynamic.partitions=20000;
>
> set hive.map.aggr=true;
> set mapred.reduce.tasks=200;
>
> add file /home/irfan/scripts/user_id_output.py;
>
> from (
>   select
>     user_id,
>     source_id,
>     load_id,
>     user_id_json,
>     pid
>   from
>     user_id_bucket
>   where
>         1 = 1
>     and length(user_id) = 40
>   distribute by pid
>   sort by user_id, load_id desc
> ) T1
> insert overwrite table user_id_hbase_stg1 partition (pid)
> SELECT
>   transform
>   (
>       T1.user_id
>     , T1.source_id
>     , T1.load_id
>     , T1.user_id_json
>     , T1.pid
>   )
>   using 'python2.6 user_id_output.py hbase'
>   as user_id, user_id_info1, user_id_info2, user_id_info3, user_id_info4, user_id_info5, user_id_info3, pid
> ;
>
> " > ${SQL_FILE};
>
> true && ${HHIVE} -f ${SQL_FILE} 1>${TXT_FILE} 2>${LOG_FILE};
>
> >>>>>>>>>>>>>> error log >>>>>>>>>>>>>>>>>>>>>>>>>
>
> 2011-03-01 14:46:13,705 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator: Initializing Self 5 OP
> 2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator: Operator 5 OP initialized
> 2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator: Initializing children of 5 OP
> 2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing child 6 SEL
> 2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self 6 SEL
> 2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<_col0:string,_col1:string,_col2:string,_col3:string,_col4:string>
> 2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Operator 6 SEL initialized
> 2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children of 6 SEL
> 2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Initializing child 7 SCR
> 2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Initializing Self 7 SCR
> 2011-03-01 14:46:13,728 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Operator 7 SCR initialized
> 2011-03-01 14:46:13,728 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Initializing children of 7 SCR
> 2011-03-01 14:46:13,728 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 8 FS
> 2011-03-01 14:46:13,728 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 8 FS
> 2011-03-01 14:46:13,730 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 8 FS initialized
> 2011-03-01 14:46:13,730 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 8 FS
> 2011-03-01 14:46:13,730 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Initialization Done 7 SCR
> 2011-03-01 14:46:13,730 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initialization Done 6 SEL
> 2011-03-01 14:46:13,730 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator: Initialization Done 5 OP
> 2011-03-01 14:46:13,733 INFO ExecReducer: ExecReducer: processing 1 rows: used memory = 89690888
> 2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator: 5 forwarding 1 rows
> 2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 6 forwarding 1 rows
> 2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Executing [/usr/bin/python2.6, user_id_output.py, hbase]
> 2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: tablename=null
> 2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: partname=null
> 2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: alias=null
> 2011-03-01 14:46:13,777 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"AA11223344","reducesinkkey1":"20110210_02"},"value":{"_col0":"xxxxx","_col1":"m1","_col2":"20110210_02","_col3":"{'m07': 'x12', 'm02': 'x34', 'm01': 'm45'}","_col4":"0A9"},"alias":0}
>   at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:253)
>   at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:467)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:415)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
>   at org.apache.hadoop.mapred.Child.main(Child.java:211)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot initialize ScriptOperator
>   at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:320)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
>   at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
>   at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
>   at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:244)
>   ... 7 more
> Caused by: java.io.IOException: Cannot run program "/usr/bin/python2.6": java.io.IOException: error=7, Argument list too long
>   at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
>   at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279)
>   ... 15 more
> Caused by: java.io.IOException: java.io.IOException: error=7, Argument list too long
>   at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
>   at java.lang.ProcessImpl.start(ProcessImpl.java:65)
>   at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
>   ... 16 more
>
> 2011-03-01 14:46:13,779 WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"AA11223344","reducesinkkey1":"20110210_02"},"value":{"_col0":"xxxxx","_col1":"m1","_col2":"20110210_02","_col3":"{'m07': 'x12', 'm02': 'x34', 'm01': 'm45'}","_col4":"0A9"},"alias":0}
>   at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:265)
>   at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:467)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:415)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
>   at org.apache.hadoop.mapred.Child.main(Child.java:211)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"AA11223344","reducesinkkey1":"20110210_02"},"value":{"_col0":"xxxxx","_col1":"m1","_col2":"20110210_02","_col3":"{'m07': 'x12', 'm02': 'x34', 'm01': 'm45'}","_col4":"0A9"},"alias":0}
>   at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:253)
>   ... 7 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot initialize ScriptOperator
>   at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:320)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
>   at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
>   at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
>   at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:244)
>   ... 7 more
> Caused by: java.io.IOException: Cannot run program "/usr/bin/python2.6": java.io.IOException: error=7, Argument list too long
>   at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
>   at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279)
>   ... 15 more
> Caused by: java.io.IOException: java.io.IOException: error=7, Argument list too long
>   at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
>   at java.lang.ProcessImpl.start(ProcessImpl.java:65)
>   at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
>   ... 16 more
> 2011-03-01 14:46:13,784 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
>
>
