[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-04 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990955#comment-14990955
 ] 

Rui Li commented on HIVE-12229:
---

If spark.files.overwrite is false and the user overwrites an already added jar 
(one with different file contents), Spark will throw an exception.
Yes, the user can set this at Hive just like other Spark properties. However, if 
the user has already launched an RSC and later changes spark.files.overwrite, 
we'll launch another RSC because the SparkConf has changed, and accordingly 
we'll launch new executors to which no jar has been added. In such a case, I 
think it's uncertain which jar will eventually be added/used on the executor. 
Therefore it's better if the user sets spark.files.overwrite in advance and 
doesn't change it throughout the session.
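
To illustrate, a minimal Beeline session sketch of the recommended order (the 
jar path, {{my_udf}}, and table {{t}} are hypothetical; {{SET}} and {{ADD JAR}} 
are standard Hive commands):
{noformat}
-- Set the Spark property first, so it is part of the SparkConf used when
-- the remote Spark context (RSC) is launched, and never changes afterwards.
SET spark.files.overwrite=true;

ADD JAR /tmp/A.jar;           -- distributed to the executors on demand
SELECT my_udf(col) FROM t;    -- executors see a consistent A.jar
{noformat}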

> Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
> --
>
> Key: HIVE-12229
> URL: https://issues.apache.org/jira/browse/HIVE-12229
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Lifeng Wang
>Assignee: Rui Li
> Attachments: HIVE-12229.1-spark.patch, HIVE-12229.2-spark.patch, 
> HIVE-12229.3-spark.patch, HIVE-12229.3-spark.patch
>
>
> Added one python script in the query and the python script cannot be found 
> during execution in yarn-cluster mode.
> {noformat}
> 15/10/21 21:10:55 INFO exec.ScriptOperator: Executing [/usr/bin/python, 
> q2-sessionize.py, 3600]
> 15/10/21 21:10:55 INFO exec.ScriptOperator: tablename=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: partname=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: alias=null
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 10 rows: used 
> memory = 324896224
> 15/10/21 21:10:55 INFO exec.ScriptOperator: ErrorStreamProcessor calling 
> reporter.progress()
> /usr/bin/python: can't open file 'q2-sessionize.py': [Errno 2] No such file 
> or directory
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread OutputProcessor done
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread ErrorProcessor done
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 100 rows: used 
> memory = 325619920
> 15/10/21 21:10:55 ERROR exec.ScriptOperator: Error in writing to script: 
> Stream closed
> 15/10/21 21:10:55 INFO exec.ScriptOperator: The script did not consume all 
> input data. This is considered as an error.
> 15/10/21 21:10:55 INFO exec.ScriptOperator: set 
> hive.exec.script.allow.partial.consumption=true; to ignore it.
> 15/10/21 21:10:55 ERROR spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) 
> {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) 
> {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:340)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:289)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: 
> An error occurred while reading or writing to your custom script. It may have 
> crashed with an error.
> at 
> org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:453)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:331)
> ... 14 more
> {noformat}
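
For context, the failing query presumably used Hive's script transform. A 
minimal sketch of that pattern (table and column names are hypothetical; the 
interpreter, script name, and argument are taken from the log above):
{noformat}
ADD FILE q2-sessionize.py;

SELECT TRANSFORM (user_id, ts)
    USING '/usr/bin/python q2-sessionize.py 3600'
    AS (user_id, session_id)
FROM clicks
DISTRIBUTE BY user_id
SORT BY user_id, ts;
{noformat}
In yarn-cluster mode the added file must reach each executor's working 
directory; the "No such file or directory" error above shows it did not.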

[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-04 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990788#comment-14990788
 ] 

Xuefu Zhang commented on HIVE-12229:


+1 for the patch. I have a follow-up question though. Based on your previous 
comment, what happens if spark.files.overwrite is false? Will the user see an 
error? The user should be able to set this property to true at Hive (say, in 
Beeline), right?


[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-04 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991067#comment-14991067
 ] 

Xuefu Zhang commented on HIVE-12229:


Then the real question is: when the new RSC comes up, should we make all jars 
that have been added so far in the user session available to the new executors, 
including new jars that overwrite the previous ones?

It seems to be a usability issue if a user adds a jar, and then changes some 
configuration (possibly irrelevant to ADD JAR) that ends up with a new RSC, 
which results in the new executors having no idea of the added jar.


[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-04 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991017#comment-14991017
 ] 

Xuefu Zhang commented on HIVE-12229:


When the new executors are launched, shouldn't the added jars, which are 
available in the context, be added to the classpath? On the Hive side, we should 
be able to update the context in a deterministic manner, right?



[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-04 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991060#comment-14991060
 ] 

Rui Li commented on HIVE-12229:
---

I'll use some examples to explain my point. Let's consider the following case:

1. User adds A.jar from Hive and runs some query.
2. User adds (updates) a different A.jar and runs some query, but the query 
fails because spark.files.overwrite is false.
3. User sets spark.files.overwrite to true and runs that failed query again.

But at this point, we'll create a new RSC as well as new executors, and these 
new executors have no jars added yet. Then we add the two A.jars to the RSC; 
the second added A.jar is supposed to overwrite the first, and eventually get 
distributed to the executors. But IMO the order of the two A.jars is not 
deterministic, because added jars are retrieved from 
{{Utilities.getResourceFiles}} and added by an async RPC job.

Now let's consider another case:

1. User sets spark.files.overwrite to true, adds A.jar, and runs some queries.
2. User adds another A.jar and runs another query.

At this point, the former A.jar is already added, so we won't add it again and 
just add the new A.jar to the RSC. The RSC ships the new A.jar to the executors 
and successfully overwrites the existing A.jar. Then the query will use the 
updated jar, which is the expected behavior; a session sketch of this case is 
shown below.
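
A minimal Beeline sketch of that second, well-behaved case (the jar paths, 
{{my_udf}}, and table {{t}} are hypothetical; {{SET}} and {{ADD JAR}} are 
standard Hive commands):
{noformat}
-- Set overwrite before the RSC is launched, so it stays fixed for the session.
SET spark.files.overwrite=true;

ADD JAR /tmp/v1/A.jar;        -- first version, shipped to the executors
SELECT my_udf(col) FROM t;    -- runs against v1

ADD JAR /tmp/v2/A.jar;        -- same file name, different contents
SELECT my_udf(col) FROM t;    -- executors overwrite A.jar and use v2
{noformat}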


[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-04 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991142#comment-14991142
 ] 

Rui Li commented on HIVE-12229:
---

Since we don't know which jar is newer on the Hive side, we simply send all the 
added jars to the new RSC. And as I said, which jar is actually used is not 
deterministic. If we want to solve this, we need a way to remove jars that have 
been overwritten, i.e. when the user adds another A.jar, the previous one should 
be removed from the session (no longer returned by 
{{Utilities.getResourceFiles}}); see the sketch below.

Regarding your second question, everything will be fine as long as the user 
doesn't add multiple jars with the same name (i.e. overwriting older jars). All 
the already added jars will be shipped to the executors.
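
A hedged sketch of removing the stale entry manually in Beeline ({{LIST JARS}} 
and {{DELETE JAR}} are standard Hive session commands; the paths are 
hypothetical, and whether deletion fully propagates to a running RSC is exactly 
the open question here):
{noformat}
LIST JARS;                  -- shows /tmp/v1/A.jar among the session resources
DELETE JAR /tmp/v1/A.jar;   -- drop the stale entry from the session
ADD JAR /tmp/v2/A.jar;      -- only the new jar remains to be shipped
{noformat}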


[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-04 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991155#comment-14991155
 ] 

Xuefu Zhang commented on HIVE-12229:


Okay. Thanks for all the explanation. I guess the same problem can happen with 
MR as well. That problem can be handled in a separate JIRA if it ever comes up.



[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-04 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991188#comment-14991188
 ] 

Rui Li commented on HIVE-12229:
---

OK. I'll commit this shortly.



[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989118#comment-14989118
 ] 

Hive QA commented on HIVE-12229:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12770529/HIVE-12229.3-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9650 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.initializationError
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/992/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/992/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-992/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12770529 - PreCommit-HIVE-SPARK-Build


[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-03 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988679#comment-14988679
 ] 

Szehon Ho commented on HIVE-12229:
--

Continuing the investigation and findings in HIVE-12230 to not pollute this 
JIRA.  There I am giving another try with a config fix.



[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-03 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988682#comment-14988682
 ] 

Szehon Ho commented on HIVE-12229:
--

typo: HIVE-12330



[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-03 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988933#comment-14988933
 ] 

Szehon Ho commented on HIVE-12229:
--

Seems to work for now... see HIVE-12330 for the latest test result.



[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-02 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985328#comment-14985328
 ] 

Sergio Peña commented on HIVE-12229:


[~xuefuz] It is a different error, caused by the MiniTez cluster. Below is the 
exception that was thrown:

{noformat}
java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hive.shims.Hadoop23Shims$MiniTezShim.createAndLaunchLlapDaemon(Hadoop23Shims.java:447)
at org.apache.hadoop.hive.shims.Hadoop23Shims$MiniTezShim.<init>(Hadoop23Shims.java:402)
at org.apache.hadoop.hive.shims.Hadoop23Shims.getMiniTezCluster(Hadoop23Shims.java:379)
at org.apache.hadoop.hive.shims.Hadoop23Shims.getMiniTezCluster(Hadoop23Shims.java:116)
at org.apache.hadoop.hive.ql.QTestUtil.<init>(QTestUtil.java:450)
at org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.<init>(TestMiniLlapCliDriver.java:54)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.junit.internal.runners.SuiteMethod.testFromSuiteMethod(SuiteMethod.java:35)
{noformat}


[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-02 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985244#comment-14985244
 ] 

Xuefu Zhang commented on HIVE-12229:


Hi [~szehon]/[~spena], could you please take a look and see if there is some
problem with the env? At one point the failure went away, but now it seems to
have resurfaced. Thanks.



[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-02 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985517#comment-14985517
 ] 

Xuefu Zhang commented on HIVE-12229:


Thanks, [~spena]. Is the following problem a consequence of that, or something else?
{noformat}
TestSparkCliDriver-bucketmapjoin12.q-avro_decimal_native.q-udf_percentile.q-and-12-more
 - did not produce a TEST-*.xml file
{noformat}



[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-01 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984850#comment-14984850
 ] 

Rui Li commented on HIVE-12229:
---

I still can't reproduce the failures. It seems we still have some issue with
the test framework?



[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-10-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981982#comment-14981982
 ] 

Hive QA commented on HIVE-12229:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769691/HIVE-12229.3-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 9478 tests 
executed
*Failed tests:*
{noformat}
TestCliDriver-authorization_cli_nonsql.q-skewjoinopt19.q-tez_self_join.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-load_dyn_part5.q-notable_alias3.q-type_conversions_1.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-ptf_seqfile.q-constprog_dpp.q-udaf_number_format.q-and-12-more - 
did not produce a TEST-*.xml file
TestCliDriver-smb_mapjoin_4.q-udf_asin.q-udf_to_unix_timestamp.q-and-12-more - 
did not produce a TEST-*.xml file
TestCliDriver-udf_bitwise_and.q-bucketcontext_4.q-orc_ends_with_nulls.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-udf_current_user.q-join44.q-input41.q-and-12-more - did not 
produce a TEST-*.xml file
TestCliDriver-union2.q-exchange_partition2.q-udf_initcap.q-and-12-more - did 
not produce a TEST-*.xml file
TestMiniTezCliDriver-constprog_dpp.q-dynamic_partition_pruning.q-vectorization_10.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-mapjoin_decimal.q-vectorized_distinct_gby.q-union_fast_stats.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vectorization_13.q-auto_sortmerge_join_13.q-tez_bmj_schema_evolution.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vectorization_16.q-vector_decimal_round.q-orc_merge6.q-and-12-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-bucket_num_reducers.q-table_nonprintable.q-scriptfile1.q-and-1-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby6_map.q-join13.q-join_reorder3.q-and-12-more - did 
not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-skewjoinopt3.q-union27.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-table_access_keys_stats.q-bucketsortoptimize_insert_4.q-join_rc.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/987/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/987/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-987/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12769691 - PreCommit-HIVE-SPARK-Build


[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-10-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982199#comment-14982199
 ] 

Hive QA commented on HIVE-12229:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769708/HIVE-12229.3-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9635 tests executed
*Failed tests:*
{noformat}
TestSparkCliDriver-bucketmapjoin12.q-avro_decimal_native.q-udf_percentile.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.initializationError
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/988/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/988/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-988/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12769708 - PreCommit-HIVE-SPARK-Build


[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-10-29 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979944#comment-14979944
 ] 

Rui Li commented on HIVE-12229:
---

Thanks for your input, Xuefu.

1. Good catch. I'll make it work with local[*]. But we don't have to worry
about local-cluster mode, because executors run in separate JVMs in
local-cluster mode, so the child process will inherit the proper working
directory.
2. OK. Whether the new file can be added successfully on the executor is up to
Spark, i.e. controlled by {{spark.files.overwrite}}. Let's just overwrite on
our side.
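
As a minimal sketch (not the actual patch; the {{addScript}} helper, class
name, and paths below are made up), "overwrite on our side" could look like
this: always refresh our local staging copy before handing the file to Spark,
and leave executor-side refresh to {{spark.files.overwrite}}:
{noformat}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class AddScriptSketch {
  // Hypothetical helper: always overwrite the local staging copy, then let
  // Spark decide (via spark.files.overwrite) whether executors may replace
  // an already-downloaded file whose contents changed.
  static void addScript(JavaSparkContext sc, String scriptPath, Path stagingDir)
      throws Exception {
    Path src = Paths.get(scriptPath);
    Path staged = stagingDir.resolve(src.getFileName());
    // Overwrite on our side: the staged copy always reflects the latest file.
    Files.copy(src, staged, StandardCopyOption.REPLACE_EXISTING);
    // Spark ships the file to each executor's working directory.
    sc.addFile(staged.toString());
  }

  public static void main(String[] args) throws Exception {
    SparkConf conf = new SparkConf()
        .setMaster("local[*]")
        .setAppName("add-script-sketch")
        // Allow executors to refresh a file whose contents changed.
        .set("spark.files.overwrite", "true");
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
      addScript(sc, "/tmp/q2-sessionize.py", Files.createTempDirectory("hive-scripts"));
    }
  }
}
{noformat}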


[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-10-28 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978296#comment-14978296
 ] 

Xuefu Zhang commented on HIVE-12229:


Hi [~lirui], thanks for fixing this. Two minor questions:

1. For detecting Spark local mode, instead of equals() in
{{sparkConf.get("spark.master").equals("local")}}, should we use startsWith(),
to cover cases such as local[2] as well as local-cluster? (See the sketch after
this list.)
2. If a user adds a file which already exists remotely, should we overwrite it
instead of throwing an exception? Maybe the user just wants to replace the file
added previously.

What are your thoughts?
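
A quick illustrative sketch of the difference between the two checks (the
master strings are standard Spark values; the class is made up for
demonstration):
{noformat}
public class MasterCheck {
  public static void main(String[] args) {
    String[] masters = {
        "local", "local[2]", "local[*]", "local-cluster[2,1,1024]", "yarn-cluster"
    };
    for (String master : masters) {
      boolean byEquals = master.equals("local");      // matches only plain "local"
      boolean byPrefix = master.startsWith("local");  // also matches local[2], local[*], local-cluster
      System.out.printf("%-26s equals=%-5b startsWith=%b%n", master, byEquals, byPrefix);
    }
  }
}
{noformat}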


[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-10-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977752#comment-14977752
 ] 

Hive QA commented on HIVE-12229:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769189/HIVE-12229.2-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9462 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_inner_join
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join2
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/982/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/982/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-982/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12769189 - PreCommit-HIVE-SPARK-Build


[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-10-27 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1492#comment-1492
 ] 

Rui Li commented on HIVE-12229:
---

Latest failures are not related.



[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-10-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973873#comment-14973873
 ] 

Hive QA commented on HIVE-12229:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12768298/HIVE-12229.2-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 6450 tests 
executed
*Failed tests:*
{noformat}
TestContribNegativeCliDriver - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestHBaseCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_inner_join
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMinimrCliDriver.initializationError
org.apache.hadoop.hive.cli.TestNegativeCliDriver.initializationError
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.initializationError
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/972/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/972/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-972/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12768298 - PreCommit-HIVE-SPARK-Build


[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-10-22 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969596#comment-14969596
 ] 

Xuefu Zhang commented on HIVE-12229:


[~lirui], could you please check whether HIVE-12045 has the same root cause? Thanks.



[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-10-22 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970331#comment-14970331
 ] 

Rui Li commented on HIVE-12229:
---

I don't think it's the root cause for HIVE-12045. The UDF classes should be
loaded properly as long as the jar is added to the classpath; it doesn't matter
whether we rename the jar file.
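
To illustrate the point, a small sketch (the jar paths and the
{{com.example.MyUdf}} class are hypothetical): Java resolves a class by its
fully qualified name inside whatever jar is on the classpath, so renaming the
jar file changes nothing for class loading:
{noformat}
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class RenamedJarSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical jar containing com.example.MyUdf.
    Path original = Paths.get("/tmp/my-udf.jar");
    Path renamed = Files.copy(original, Paths.get("/tmp/renamed-1234.jar"),
        StandardCopyOption.REPLACE_EXISTING);
    // The class loader only cares about the jar's contents, not its file name.
    try (URLClassLoader loader =
             new URLClassLoader(new URL[] {renamed.toUri().toURL()})) {
      Class<?> udf = Class.forName("com.example.MyUdf", true, loader);
      System.out.println("Loaded " + udf.getName() + " from " + renamed);
    }
  }
}
{noformat}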



