[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990955#comment-14990955 ] Rui Li commented on HIVE-12229:
---
If spark.files.overwrite is false and the user overwrites an already-added jar (one with different file contents), Spark will throw an exception. Yes, the user can set this at the Hive side just like other Spark properties. However, if the user has already launched an RSC and later changes spark.files.overwrite, we'll launch another RSC because the SparkConf has changed, and accordingly we'll launch new executors to which no jar has been added. In such a case, I think it's uncertain which jar will eventually be added/used on the executors. Therefore it's better if the user sets spark.files.overwrite in advance and doesn't change it throughout the session.

> Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-12229
>                 URL: https://issues.apache.org/jira/browse/HIVE-12229
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Lifeng Wang
>            Assignee: Rui Li
>         Attachments: HIVE-12229.1-spark.patch, HIVE-12229.2-spark.patch, HIVE-12229.3-spark.patch, HIVE-12229.3-spark.patch
>
> Added one python script in the query and the python script cannot be found during execution in yarn-cluster mode.
> {noformat}
> 15/10/21 21:10:55 INFO exec.ScriptOperator: Executing [/usr/bin/python, q2-sessionize.py, 3600]
> 15/10/21 21:10:55 INFO exec.ScriptOperator: tablename=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: partname=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: alias=null
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 10 rows: used memory = 324896224
> 15/10/21 21:10:55 INFO exec.ScriptOperator: ErrorStreamProcessor calling reporter.progress()
> /usr/bin/python: can't open file 'q2-sessionize.py': [Errno 2] No such file or directory
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread OutputProcessor done
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread ErrorProcessor done
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 100 rows: used memory = 325619920
> 15/10/21 21:10:55 ERROR exec.ScriptOperator: Error in writing to script: Stream closed
> 15/10/21 21:10:55 INFO exec.ScriptOperator: The script did not consume all input data. This is considered as an error.
> 15/10/21 21:10:55 INFO exec.ScriptOperator: set hive.exec.script.allow.partial.consumption=true; to ignore it.
> 15/10/21 21:10:55 ERROR spark.SparkReduceRecordHandler: Fatal error: org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row (tag=0) {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row (tag=0) {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> 	at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:340)
> 	at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:289)
> 	at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
> 	at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
> 	at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
> 	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> 	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
> 	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:88)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: An error occurred while reading or writing to your custom script. It may have crashed with an error.
> 	at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:453)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> 	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> 	at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:331)
> 	... 14 more
> {noformat}
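The interaction Rui Li describes above can be sketched as follows. This is a hypothetical model with illustrative names (Hive's real session and RSC classes differ): because the RSC is tied to the SparkConf it was launched with, flipping spark.files.overwrite mid-session triggers a relaunch that discards the executor-side files, which is why the property should be set before any jars are added.

```python
# Hypothetical sketch (illustrative names, not Hive's classes) of the
# session behavior described in the comments: any change to the effective
# SparkConf forces a new RSC, whose executors start with no added files.
class SparkSession:
    def __init__(self):
        self.rsc_conf = None        # conf the running RSC was launched with
        self.executor_files = set() # files already shipped to executors

    def run_query(self, conf, add_files=()):
        if conf != self.rsc_conf:        # SparkConf changed...
            self.rsc_conf = dict(conf)   # ...so a fresh RSC is launched
            self.executor_files = set()  # ...with no files on its executors
        self.executor_files |= set(add_files)

s = SparkSession()
s.run_query({"spark.files.overwrite": "false"}, add_files={"A.jar"})
s.run_query({"spark.files.overwrite": "true"})   # relaunch: A.jar is gone
print(sorted(s.executor_files))  # []
```

In this model, setting spark.files.overwrite first and never changing it keeps the same RSC alive, so previously added files remain visible to every subsequent query.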
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990788#comment-14990788 ] Xuefu Zhang commented on HIVE-12229:
+1 for the patch. I have a follow-up question though. Based on your previous comment, what happens if spark.files.overwrite is false? Will the user see an error? The user should be able to set this property to true at Hive (say, in Beeline), right?
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991067#comment-14991067 ] Xuefu Zhang commented on HIVE-12229:
Then the real question is: when the new RSC comes up, should we make all jars that have been added so far in the user session available to the new executors, including new jars that overwrite previous ones? It seems to be a usability issue if a user adds a jar and then changes some configuration (possibly unrelated to add jar) that ends up launching a new RSC, with the result that the new executors have no idea of the added jar.
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991017#comment-14991017 ] Xuefu Zhang commented on HIVE-12229:
When the new executors are launched, shouldn't the added jars, which are available in the context, be added to the classpath? On the Hive side, we should be able to update the context in a deterministic manner, right?
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991060#comment-14991060 ] Rui Li commented on HIVE-12229:
---
I'll use some examples to explain my point. Consider the following case:
1. User adds A.jar from Hive and runs some query.
2. User adds (updates) a different A.jar and runs some query, but the query fails because spark.files.overwrite is false.
3. User sets spark.files.overwrite to true and runs the failed query again. At this point we'll create a new RSC as well as new executors, and these new executors have no jars added so far. Then we add the two A.jars to the RSC; the one added later will overwrite the other and eventually get distributed to the executors. But IMO the order of the two A.jars is not deterministic, because added jars are retrieved from {{Utilities.getResourceFiles}} and added by an async RPC job.

Now consider another case:
1. User sets spark.files.overwrite to true, adds A.jar, and runs some queries.
2. User adds another A.jar and runs another query. At this point the former A.jar is already added, so we won't add it again and just add the new A.jar to the RSC. The RSC ships the new A.jar to the executors and successfully overwrites the existing A.jar. The query then uses the updated jar, which is the expected behavior.
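The nondeterminism in the first case can be sketched as follows (hypothetical names, not Hive's actual classes): when session resources are replayed onto the fresh executors of a new RSC, each add-file RPC is independent, so the arrival order of two same-named jars is not guaranteed, and with overwrite enabled whichever arrives last wins.

```python
# Hypothetical sketch: model the replay of session resources onto the
# fresh executors of a new RSC. Executors key shipped files by name,
# so with overwrite enabled the last same-named arrival wins.
def replay_resources(arrival_order):
    on_executor = {}
    for path, contents in arrival_order:
        name = path.rsplit("/", 1)[-1]   # executors key files by name
        on_executor[name] = contents     # overwrite: last arrival wins
    return on_executor

# Two different A.jar files from the session, replayed in the two
# possible RPC arrival orders:
first = replay_resources([("/tmp/v1/A.jar", "old"), ("/tmp/v2/A.jar", "new")])
second = replay_resources([("/tmp/v2/A.jar", "new"), ("/tmp/v1/A.jar", "old")])
print(first["A.jar"], second["A.jar"])  # prints: new old
```

Since neither arrival order is guaranteed, the executor may end up with either version of A.jar, which is the nondeterminism described above.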
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991142#comment-14991142 ] Rui Li commented on HIVE-12229:
---
Since we don't know which jar is newer on the Hive side, we simply send all the added jars to the new RSC. And as I said, which jar is actually used is not deterministic. If we want to solve this, we need a way to remove jars that have been overwritten, i.e. when the user adds another A.jar, the previous one should be removed from the session (no longer returned by {{Utilities.getResourceFiles}}). Regarding your second question, everything will be fine as long as the user doesn't add multiple jars with the same name (i.e. overwriting older jars). All the already-added jars will be shipped to the executors.
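The removal Rui Li suggests amounts to deduplicating the session's resource list by file name, keeping only the newest path for each name. A minimal sketch (illustrative function, not Hive's API):

```python
# Hypothetical sketch of the suggested fix: when a jar whose name is
# already in use is added, the older path is dropped from the session,
# so only the newest copy of each name is ever replayed to a new RSC.
def dedup_resources(added_in_order):
    newest = {}
    for path in added_in_order:
        name = path.rsplit("/", 1)[-1]
        newest[name] = path          # a later add replaces the earlier one
    return list(newest.values())

adds = ["/u/v1/A.jar", "/u/B.jar", "/u/v2/A.jar"]
print(dedup_resources(adds))  # ['/u/v2/A.jar', '/u/B.jar']
```

With the list deduplicated this way, only one A.jar is ever sent to a new RSC, so the replay order no longer matters.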
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991155#comment-14991155 ] Xuefu Zhang commented on HIVE-12229:
Okay. Thanks for all the explanation. I guess the same problem can happen with MR as well. That problem can be handled in a separate JIRA, if ever needed.
> 15/10/21 21:10:55 INFO exec.ScriptOperator: set > hive.exec.script.allow.partial.consumption=true; to ignore it. > 15/10/21 21:10:55 ERROR spark.SparkReduceRecordHandler: Fatal error: > org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row > (tag=0) > {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}} > org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row > (tag=0) > {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}} > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:340) > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:289) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: > An error occurred while reading or writing to your custom script. It may have > crashed with an error. > at > org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:453) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:331) > ... 14 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
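For context, the failure mode in the issue description arises from a Hive script-transform query of roughly the following shape. This is a sketch, not the reporter's actual query: the table and column names (`events`, `user_id`, `event_time`) are hypothetical, while the script name `q2-sessionize.py` and the `3600` argument come from the log above.

```sql
-- Ship the script to the cluster so executors can resolve it by base name.
-- When distribution fails (as in this issue under yarn-cluster mode),
-- /usr/bin/python cannot open 'q2-sessionize.py' on the executor.
ADD FILE q2-sessionize.py;

-- Sort each user's events before streaming them through the script.
FROM (
  SELECT user_id, event_time
  FROM events
  DISTRIBUTE BY user_id
  SORT BY user_id, event_time
) t
SELECT TRANSFORM (t.user_id, t.event_time)
USING '/usr/bin/python q2-sessionize.py 3600'  -- 3600s session gap, per the log
AS (user_id, session_id);
```

The script runs as a child process of the executor with the executor's working directory, which is why it must be distributed via ADD FILE rather than referenced by a client-local path.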
[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991188#comment-14991188 ] Rui Li commented on HIVE-12229: --- OK. I'll commit this shortly.
[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989118#comment-14989118 ] Hive QA commented on HIVE-12229: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12770529/HIVE-12229.3-spark.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9650 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.initializationError org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/992/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/992/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-992/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12770529 - PreCommit-HIVE-SPARK-Build
[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988679#comment-14988679 ] Szehon Ho commented on HIVE-12229: -- Continuing the investigation and findings in HIVE-12230 so as not to pollute this JIRA. There I am giving another try with a config fix.
[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988682#comment-14988682 ] Szehon Ho commented on HIVE-12229: -- Typo in my previous comment: I meant HIVE-12330.
[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988933#comment-14988933 ] Szehon Ho commented on HIVE-12229: -- Seems to work for now; see HIVE-12330 for the latest test result.
[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985328#comment-14985328 ] Sergio Peña commented on HIVE-12229: [~xuefuz] It is a different error, caused by the MiniTez cluster. Below is the exception that was thrown: {noformat} java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.shims.Hadoop23Shims$MiniTezShim.createAndLaunchLlapDaemon(Hadoop23Shims.java:447) at org.apache.hadoop.hive.shims.Hadoop23Shims$MiniTezShim.<init>(Hadoop23Shims.java:402) at org.apache.hadoop.hive.shims.Hadoop23Shims.getMiniTezCluster(Hadoop23Shims.java:379) at org.apache.hadoop.hive.shims.Hadoop23Shims.getMiniTezCluster(Hadoop23Shims.java:116) at org.apache.hadoop.hive.ql.QTestUtil.<init>(QTestUtil.java:450) at org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.<init>(TestMiniLlapCliDriver.java:54) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.internal.runners.SuiteMethod.testFromSuiteMethod(SuiteMethod.java:35) {noformat}
[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985244#comment-14985244 ] Xuefu Zhang commented on HIVE-12229: Hi [~szehon]/[~spena], could you please take a look to see if there is some problem with the env? At one point it went away, but now it seems to have resurfaced. Thanks.
[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985517#comment-14985517 ] Xuefu Zhang commented on HIVE-12229:

Thanks, [~spena]. Is the following problem a consequence of that, or something else?
{noformat}
TestSparkCliDriver-bucketmapjoin12.q-avro_decimal_native.q-udf_percentile.q-and-12-more - did not produce a TEST-*.xml file
{noformat}
[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984850#comment-14984850 ] Rui Li commented on HIVE-12229:

I still can't reproduce the failures. It seems we still have some issue with the test framework?
[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981982#comment-14981982 ] Hive QA commented on HIVE-12229:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769691/HIVE-12229.3-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 9478 tests executed

*Failed tests:*
{noformat}
TestCliDriver-authorization_cli_nonsql.q-skewjoinopt19.q-tez_self_join.q-and-12-more - did not produce a TEST-*.xml file
TestCliDriver-load_dyn_part5.q-notable_alias3.q-type_conversions_1.q-and-12-more - did not produce a TEST-*.xml file
TestCliDriver-ptf_seqfile.q-constprog_dpp.q-udaf_number_format.q-and-12-more - did not produce a TEST-*.xml file
TestCliDriver-smb_mapjoin_4.q-udf_asin.q-udf_to_unix_timestamp.q-and-12-more - did not produce a TEST-*.xml file
TestCliDriver-udf_bitwise_and.q-bucketcontext_4.q-orc_ends_with_nulls.q-and-12-more - did not produce a TEST-*.xml file
TestCliDriver-udf_current_user.q-join44.q-input41.q-and-12-more - did not produce a TEST-*.xml file
TestCliDriver-union2.q-exchange_partition2.q-udf_initcap.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-constprog_dpp.q-dynamic_partition_pruning.q-vectorization_10.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-mapjoin_decimal.q-vectorized_distinct_gby.q-union_fast_stats.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vectorization_13.q-auto_sortmerge_join_13.q-tez_bmj_schema_evolution.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vectorization_16.q-vector_decimal_round.q-orc_merge6.q-and-12-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-bucket_num_reducers.q-table_nonprintable.q-scriptfile1.q-and-1-more - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby6_map.q-join13.q-join_reorder3.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-skewjoinopt3.q-union27.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-table_access_keys_stats.q-bucketsortoptimize_insert_4.q-join_rc.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/987/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/987/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-987/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12769691 - PreCommit-HIVE-SPARK-Build
[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982199#comment-14982199 ] Hive QA commented on HIVE-12229:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769708/HIVE-12229.3-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9635 tests executed

*Failed tests:*
{noformat}
TestSparkCliDriver-bucketmapjoin12.q-avro_decimal_native.q-udf_percentile.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.initializationError
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/988/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/988/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-988/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12769708 - PreCommit-HIVE-SPARK-Build
[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979944#comment-14979944 ] Rui Li commented on HIVE-12229:

Thanks for your inputs, Xuefu.
1. Good catch. I'll make it work with local[*]. But we don't have to worry about local-cluster mode, because executors run in separate JVMs in that mode, so the child process will inherit the proper working directory.
2. OK. Whether the new file can be added successfully on the executor is up to Spark, i.e. configured by {{spark.files.overwrite}}. Let's just overwrite on our side.
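The working-directory point in item 1 can be sketched as follows. This is an illustration only (not Hive code, and {{ScriptPathDemo}}/{{resolvesUnderWorkingDir}} are hypothetical names): a relative script name like q2-sessionize.py is resolved against the running JVM's working directory, which is the container directory holding Spark-downloaded files when executors run in their own JVMs, but the driver's directory when executors share the driver JVM (local[*]).

```java
import java.io.File;

// Illustration only, not Hive code: a relative file name resolves against the
// JVM's working directory ("user.dir"). In yarn-cluster or local-cluster mode
// each executor JVM starts in its own container directory, where Spark places
// added files; in local[*] mode the "executor" shares the driver JVM, so the
// same relative name resolves against the driver's working directory instead.
public class ScriptPathDemo {

    // Returns true iff the relative name resolves under the JVM's working dir.
    static boolean resolvesUnderWorkingDir(String name) {
        File resolved = new File(name).getAbsoluteFile();
        return resolved.getParent().equals(System.getProperty("user.dir"));
    }

    public static void main(String[] args) {
        // Always true: relative paths resolve under user.dir. Whether the file
        // actually exists there depends on which JVM runs this code.
        System.out.println(resolvesUnderWorkingDir("q2-sessionize.py"));
    }
}
```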
[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978296#comment-14978296 ] Xuefu Zhang commented on HIVE-12229:

Hi [~lirui], thanks for fixing this. Two minor questions:
1. When detecting Spark local mode, instead of equals() in {{sparkConf.get("spark.master").equals("local")}}, should we use startsWith(), to cover cases such as local[2] as well as local-cluster?
2. If a user adds a file which already exists remotely, should we overwrite it instead of throwing an exception? Maybe the user just wants to replace the previously added file. What's your thought?
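The equals() vs. startsWith() distinction in point 1 can be shown with a minimal sketch. This is not the actual Hive patch; {{LocalMasterCheck}} and its method names are made up for illustration, and the master strings are the standard Spark forms.

```java
// Minimal sketch of the local-master detection discussed above (not the
// actual Hive patch). Spark master strings include "local", "local[2]",
// "local[*]" and "local-cluster[N,cores,mem]".
public class LocalMasterCheck {

    // equals() only matches the bare "local" master string.
    static boolean isLocalByEquals(String master) {
        return master.equals("local");
    }

    // startsWith() also covers local[2], local[*] and local-cluster modes.
    static boolean isLocalByStartsWith(String master) {
        return master.startsWith("local");
    }

    public static void main(String[] args) {
        System.out.println(isLocalByEquals("local[2]"));        // false: missed
        System.out.println(isLocalByStartsWith("local[2]"));    // true
        System.out.println(isLocalByStartsWith("local-cluster[2,1,1024]")); // true
    }
}
```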
[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977752#comment-14977752 ] Hive QA commented on HIVE-12229:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769189/HIVE-12229.2-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9462 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_inner_join
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join2
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/982/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/982/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-982/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12769189 - PreCommit-HIVE-SPARK-Build
[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1492#comment-1492 ] Rui Li commented on HIVE-12229:
---
Latest failures are not related.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973873#comment-14973873 ] Hive QA commented on HIVE-12229:
---
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12768298/HIVE-12229.2-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 6450 tests executed

*Failed tests:*
{noformat}
TestContribNegativeCliDriver - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestHBaseCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_inner_join
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMinimrCliDriver.initializationError
org.apache.hadoop.hive.cli.TestNegativeCliDriver.initializationError
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.initializationError
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/972/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/972/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-972/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12768298 - PreCommit-HIVE-SPARK-Build
[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969596#comment-14969596 ] Xuefu Zhang commented on HIVE-12229:
---
[~lirui], could you please check whether HIVE-12045 has the same root cause? Thanks.
[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
[ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970331#comment-14970331 ] Rui Li commented on HIVE-12229:
---
I don't think it's the root cause of HIVE-12045. The UDF classes should be loaded properly as long as the jar is added to the classpath. It doesn't matter if we rename the jar file.
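Rui Li's point above, that the classloader resolves classes and resources from the registered jar URLs regardless of the jar's file name, can be sketched with a small standalone Java program. This is an illustrative sketch, not Hive code: the jar contents, entry name, and paths below are made up for the demonstration.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class RenamedJarDemo {
    public static void main(String[] args) throws Exception {
        // Build a tiny jar containing a single (hypothetical) resource entry.
        Path jar = Files.createTempFile("original", ".jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new JarEntry("com/example/udf.marker"));
            out.write("loaded".getBytes("UTF-8"));
            out.closeEntry();
        }

        // Rename the jar file; only the contents behind the URL matter.
        Path renamed = jar.resolveSibling("renamed-" + System.nanoTime() + ".jar");
        Files.move(jar, renamed);

        // Look the entry up through a classloader built from the renamed jar.
        try (URLClassLoader loader =
                 new URLClassLoader(new URL[] {renamed.toUri().toURL()}, null)) {
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                     loader.getResourceAsStream("com/example/udf.marker"), "UTF-8"))) {
                System.out.println(reader.readLine()); // prints "loaded"
            }
        }
        Files.deleteIfExists(renamed);
    }
}
```

The lookup succeeds under the new name because `URLClassLoader` indexes entries inside the jar, not the jar's file name, which is why renaming an added jar would not by itself break UDF class loading.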