[GitHub] spark pull request #21783: [SPARK-24799] A solution for dealing with data skew...
GitHub user marymwu opened a pull request: https://github.com/apache/spark/pull/21783 [SPARK-24799] A solution for dealing with data skew in left, right, and inner joins ## What changes were proposed in this pull request? For left, right, and inner join statement execution, this solution divides the partitions where data skew has occurred into several partitions of smaller scale, so that more tasks can be executed in parallel to increase efficiency. ## How was this patch tested? Unit tests in DatasetSuite.scala You can merge this pull request into a Git repository by running: $ git pull https://github.com/marymwu/spark branch-2.3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21783.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21783 commit 2a01c813b6ef7223a489a4bcda3c9e5feb899060 Author: wangsm9 Date: 2018-07-16T09:48:44Z data skew code for spark2.3
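The patch itself is not inlined in this message, but the idea it describes — splitting a skewed join partition into several smaller ones — can be approximated at the query level with key salting. The following Scala sketch is illustrative only (invented table and column names); it is not the PR's implementation:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SaltedJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("salted-join").master("local[*]").getOrCreate()
    import spark.implicits._

    val saltBuckets = 8 // number of sub-partitions a hot key is spread over

    // Hypothetical inputs: `big` has one very hot join key, `small` is the other side.
    val big   = Seq.fill(1000)(("hotKey", 1)).toDF("k", "v")
    val small = Seq(("hotKey", "x"), ("coldKey", "y")).toDF("k", "w")

    // Add a random salt to the skewed side so one key hashes into saltBuckets partitions...
    val salted = big.withColumn("salt", (rand() * saltBuckets).cast("int"))
    // ...and replicate the small side once per salt value so every salted row still matches.
    val replicated = small.crossJoin(
      spark.range(saltBuckets).select($"id".cast("int").as("salt")))

    salted.join(replicated, Seq("k", "salt"), "inner").drop("salt").show()
    spark.stop()
  }
}
```

The same trick carries over to left and right outer joins as long as the salted side is the preserved side; the PR instead performs the split inside the engine, so queries need no rewriting.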
[GitHub] spark pull request #21759: sfas
Github user marymwu closed the pull request at: https://github.com/apache/spark/pull/21759
[GitHub] spark pull request #21759: sfas
GitHub user marymwu opened a pull request: https://github.com/apache/spark/pull/21759 sfas ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/marymwu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21759.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21759 commit dcf36ad54598118408c1425e81aa6552f42328c8 Author: Dongjoon Hyun Date: 2016-05-03T13:02:04Z [SPARK-15057][GRAPHX] Remove stale TODO comment for making `enum` in GraphGenerators This PR removes a stale TODO comment in `GraphGenerators.scala` Just comment removed. Author: Dongjoon Hyun Closes #12839 from dongjoon-hyun/SPARK-15057. (cherry picked from commit 46965cd014fd4ba68bdec15156ec9bcc27d9b217) Signed-off-by: Reynold Xin commit 1dc30f189ac30f070068ca5f60b7b4c85f2adc9e Author: Bryan Cutler Date: 2016-05-19T02:48:36Z [DOC][MINOR] ml.feature Scala and Python API sync I reviewed Scala and Python APIs for ml.feature and corrected discrepancies. Built docs locally, ran style checks Author: Bryan Cutler Closes #13159 from BryanCutler/ml.feature-api-sync. (cherry picked from commit b1bc5ebdd52ed12aea3fdc7b8f2fa2d00ea09c6b) Signed-off-by: Reynold Xin commit 642f00980f1de13a0f6d1dc8bc7ed5b0547f3a9d Author: Zheng RuiFeng Date: 2016-05-15T14:59:49Z [MINOR] Fix Typos 1,Rename matrix args in BreezeUtil to upper to match the doc 2,Fix several typos in ML and SQL manual tests Author: Zheng RuiFeng Closes #13078 from zhengruifeng/fix_ann. (cherry picked from commit c7efc56c7b6fc99c005b35c335716ff676856c6c) Signed-off-by: Reynold Xin commit 2126fb0c2b2bb8ac4c5338df15182fcf8713fb2f Author: Sandeep Singh Date: 2016-05-19T09:44:26Z [CORE][MINOR] Remove redundant set master in OutputCommitCoordinatorIntegrationSuite Remove redundant set master in OutputCommitCoordinatorIntegrationSuite, as we are already setting it in SparkContext below on line 43. existing tests Author: Sandeep Singh Closes #13168 from techaddict/minor-1. (cherry picked from commit 3facca5152e685d9c7da96bff5102169740a4a06) Signed-off-by: Reynold Xin commit 1fc0f95eb8abbb9cc8ede2139670e493e6939317 Author: Andrew Or Date: 2016-05-20T05:40:03Z [HOTFIX] Test compilation error from 52b967f commit dd0c7fb39cac44e8f0d73f9884fd1582c25e9cf4 Author: Reynold Xin Date: 2016-05-20T05:46:08Z Revert "[HOTFIX] Test compilation error from 52b967f" This reverts commit 1fc0f95eb8abbb9cc8ede2139670e493e6939317. commit f8d0177c31d43eab59a7535945f3dfa24e906273 Author: Davies Liu Date: 2016-05-18T23:02:52Z Revert "[SPARK-15392][SQL] fix default value of size estimation of logical plan" This reverts commit fc29b896dae08b957ed15fa681b46162600a4050. (cherry picked from commit 84b23453ddb0a97e3d81306de0a5dcb64f88bdd0) Signed-off-by: Reynold Xin commit 2ef645724a7f229309a87c5053b0fbdf45d06f52 Author: Takuya UESHIN Date: 2016-05-20T05:55:44Z [SPARK-15313][SQL] EmbedSerializerInFilter rule should keep exprIds of output of surrounded SerializeFromObject. ## What changes were proposed in this pull request? 
The following code: ``` val ds = Seq(("a", 1), ("b", 2), ("c", 3)).toDS() ds.filter(_._1 == "b").select(expr("_1").as[String]).foreach(println(_)) ``` throws an Exception: ``` org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: _1#420 at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:50) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:88) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:87) ... Cause: java.lang.RuntimeException: Couldn't find _1#420 in [_1#416,_2#417] at scala.sys.package$.error(package.scala:27) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:94) at org.ap
[jira] [Created] (SPARK-24799) A solution for dealing with data skew in left, right, and inner joins
marymwu created SPARK-24799: --- Summary: A solution for dealing with data skew in left, right, and inner joins Key: SPARK-24799 URL: https://issues.apache.org/jira/browse/SPARK-24799 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 2.3.0, 2.2.0, 2.1.0, 2.0.0 Reporter: marymwu Fix For: 2.3.0 For left, right, and inner join statement execution, this solution divides the partitions where data skew has occurred into several partitions of smaller scale, so that more tasks can be executed in parallel to increase efficiency.
[jira] [Updated] (SPARK-17181) [Spark2.0 web ui] The status of certain jobs is still displayed as running even though all stages of the job have finished
[ https://issues.apache.org/jira/browse/SPARK-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] marymwu updated SPARK-17181: Attachment: job1000-2.png job1000-1.png > [Spark2.0 web ui] The status of certain jobs is still displayed as running > even though all stages of the job have finished > --- > > Key: SPARK-17181 > URL: https://issues.apache.org/jira/browse/SPARK-17181 > Project: Spark > Issue Type: Bug > Components: Web UI > Affects Versions: 2.0.0 > Reporter: marymwu > Priority: Minor > Attachments: job1000-1.png, job1000-2.png > > > [Spark2.0 web ui] The status of certain jobs is still displayed as running > even though all stages of the job have finished > Note: not sure what kind of jobs will encounter this problem > The following log shows that job 1000 has already finished, but on the spark2.0 > web ui the status of job 1000 is still displayed as running; see the attached file > 16/08/22 16:01:29 INFO DAGScheduler: dag send msg, result task done, job: 1000 > 16/08/22 16:01:29 INFO DAGScheduler: Job 1000 finished: run at > AccessController.java:-2, took 4.664319 s
[jira] [Created] (SPARK-17181) [Spark2.0 web ui] The status of certain jobs is still displayed as running even though all stages of the job have finished
marymwu created SPARK-17181: --- Summary: [Spark2.0 web ui] The status of certain jobs is still displayed as running even though all stages of the job have finished Key: SPARK-17181 URL: https://issues.apache.org/jira/browse/SPARK-17181 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 2.0.0 Reporter: marymwu Priority: Minor [Spark2.0 web ui] The status of certain jobs is still displayed as running even though all stages of the job have finished Note: not sure what kind of jobs will encounter this problem The following log shows that job 1000 has already finished, but on the spark2.0 web ui the status of job 1000 is still displayed as running; see the attached file 16/08/22 16:01:29 INFO DAGScheduler: dag send msg, result task done, job: 1000 16/08/22 16:01:29 INFO DAGScheduler: Job 1000 finished: run at AccessController.java:-2, took 4.664319 s
[jira] [Commented] (SPARK-5770) Use addJar() to upload a new jar file to the executor; it can't be added to the classloader
[ https://issues.apache.org/jira/browse/SPARK-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430243#comment-15430243 ] marymwu commented on SPARK-5770: Hey, we have run into the same issue too. We tried to fix it but failed. Can anybody help with this issue? Thanks so much! > Use addJar() to upload a new jar file to the executor; it can't be added to the > classloader > --- > > Key: SPARK-5770 > URL: https://issues.apache.org/jira/browse/SPARK-5770 > Project: Spark > Issue Type: Bug > Components: Spark Core > Reporter: meiyoula > Priority: Minor > > First use addJar() to upload a jar to the executor, then change the jar > content and upload it again. We can see that the jar file on the local disk has been > updated, but the classloader still loads the old one. The executor log has no > error or exception pointing to this. > I used spark-shell to test it, with "spark.files.overwrite" set to true.
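For anyone trying to reproduce the report, the sequence boils down to re-adding a jar under the same name; a minimal sketch, with a hypothetical jar path, is:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("addjar-reload-demo")
  .set("spark.files.overwrite", "true") // the setting mentioned in the report

val sc = new SparkContext(conf)

sc.addJar("/tmp/udf.jar") // hypothetical path; executors fetch and load this jar
// ... rebuild /tmp/udf.jar with changed class bodies, keeping the same file name ...
sc.addJar("/tmp/udf.jar") // the file is re-shipped, but classes the executor
                          // classloader has already defined are not redefined,
                          // which is the stale-class behavior being reported
```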
[jira] [Created] (SPARK-16970) [spark2.0] spark2.0 doesn't catch the java exception thrown by the reflect function in a sql statement, which causes the job to abort
marymwu created SPARK-16970: --- Summary: [spark2.0] spark2.0 doesn't catch the java exception thrown by the reflect function in a sql statement, which causes the job to abort Key: SPARK-16970 URL: https://issues.apache.org/jira/browse/SPARK-16970 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Reporter: marymwu [spark2.0] spark2.0 doesn't catch the java exception thrown by the reflect function in the sql statement, which causes the job to abort steps: 1. select reflect('java.net.URLDecoder','decode','%%E7','utf-8') test; --> the "%%" is what causes the java exception error: 16/08/09 15:56:38 INFO DAGScheduler: Job 1 failed: run at AccessController.java:-2, took 7.018147 s 16/08/09 15:56:38 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING, org.apache.spark.SparkException: Job aborted due to stage failure: Task 162 in stage 1.0 failed 8 times, most recent failure: Lost task 162.7 in stage 1.0 (TID 207, slave7.lenovomm2.com): org.apache.spark.SparkException: Task failed while writing rows. at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.writeToFile(hiveWriterContainers.scala:330) at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:131) at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:131) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) at org.apache.spark.scheduler.Task.run(Task.scala:85) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection.eval(CallMethodViaReflection.scala:87) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370) at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.writeToFile(hiveWriterContainers.scala:288) ... 8 more Caused by: java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "%E" at java.net.URLDecoder.decode(URLDecoder.java:192) ... 19 more
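The root cause is reproducible without Spark at all: "%%" is an illegal escape sequence for java.net.URLDecoder, exactly as the last "Caused by" shows. A small plain-Scala check:

```scala
import java.net.URLDecoder
import scala.util.{Failure, Success, Try}

// "%%E7" puts '%' where two hex digits are expected, so decode throws.
Try(URLDecoder.decode("%%E7", "utf-8")) match {
  case Success(s)  => println(s"decoded: $s")
  case Failure(ex) => println(ex) // java.lang.IllegalArgumentException: URLDecoder:
                                  // Illegal hex characters in escape (%) pattern ...
}
```

The ticket's complaint is that Spark 2.0 lets this exception escape from the generated reflect() call and abort the whole job instead of surfacing it as a per-row or analysis-time error.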
[jira] [Created] (SPARK-16833) [Spark2.0] when creating a temporary function, the command "add jar" doesn't work unless spark is restarted
marymwu created SPARK-16833: --- Summary: [Spark2.0] when creating a temporary function, the command "add jar" doesn't work unless spark is restarted Key: SPARK-16833 URL: https://issues.apache.org/jira/browse/SPARK-16833 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Reporter: marymwu [Spark2.0] when creating a temporary function, the command "add jar" doesn't work unless spark is restarted Steps: 1. add jar /tmp/GeoIP-0.6.8.jar; 2. create temporary function GeoIP2 as 'com.lenovo.lps.device.hive.udf.UDFGeoIP'; 3. select GeoIP2('tdy'); Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 527.0 failed 8 times, most recent failure: Lost task 0.7 in stage 527.0 (TID 140171, smokeslave2.avatar.lenovomm.com): java.lang.RuntimeException: Stream '/jars/GeoIP-0.6.8.jar'' was not found. Note: after restarting spark, it works.
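Until the runtime "add jar" path works, one hedged workaround is to ship the jar when the application starts, so executor classloaders see it before the first task. The jar path and class name below are taken from the report; the rest is a sketch, not a confirmed fix:

```scala
import org.apache.spark.sql.SparkSession

// e.g. started with: spark-submit --jars /tmp/GeoIP-0.6.8.jar ...
val spark = SparkSession.builder()
  .config("spark.jars", "/tmp/GeoIP-0.6.8.jar") // equivalent to --jars
  .enableHiveSupport()
  .getOrCreate()

spark.sql("CREATE TEMPORARY FUNCTION GeoIP2 AS 'com.lenovo.lps.device.hive.udf.UDFGeoIP'")
spark.sql("SELECT GeoIP2('tdy')").show()
```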
[jira] [Commented] (SPARK-16601) Spark2.0 fails to create a table using the sql statement "create table `db.tableName` xxx" while spark1.6 supports it
[ https://issues.apache.org/jira/browse/SPARK-16601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401599#comment-15401599 ] marymwu commented on SPARK-16601: - Got it, thanks. BTW, are there some grammar changes in spark2.0 compared with spark1.6? > Spark2.0 fails to create a table using the sql statement "create table > `db.tableName` xxx" while spark1.6 supports it > - > > Key: SPARK-16601 > URL: https://issues.apache.org/jira/browse/SPARK-16601 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.0.0 > Reporter: marymwu > Priority: Minor > Attachments: error log.png > > > Spark2.0 fails to create a table using the sql statement "create table > `db.tableName` xxx" while spark1.6 supports it. > The error log is attached.
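The parser change behind this: in Spark 2.0 a backtick-quoted string is a single identifier, so `db.tableName` names one table whose name literally contains a dot, rather than table tableName in database db. Quoting the two parts separately parses fine; a minimal sketch:

```scala
// Works in Spark 2.0: database and table quoted as separate identifiers.
spark.sql("CREATE TABLE `db`.`tableName` (id INT)")

// Fails in Spark 2.0: one identifier whose name contains a literal dot.
// spark.sql("CREATE TABLE `db.tableName` (id INT)")
```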
[jira] [Commented] (SPARK-16603) Spark2.0 fails to execute a sql statement whose field name begins with a number, like "d.30_day_loss_user", while spark1.6 supports it
[ https://issues.apache.org/jira/browse/SPARK-16603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401598#comment-15401598 ] marymwu commented on SPARK-16603: - OK, got it, thanks > Spark2.0 fails to execute a sql statement whose field name begins with > a number, like "d.30_day_loss_user", while spark1.6 supports it > -- > > Key: SPARK-16603 > URL: https://issues.apache.org/jira/browse/SPARK-16603 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.0.0 > Reporter: marymwu > Priority: Minor > > Spark2.0 fails to execute a sql statement whose field name begins with > a number, like "d.30_day_loss_user", while spark1.6 supports it > Error: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input > '.30' expecting > {')', ','}
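The workaround under 2.0's stricter parser is to backtick-quote any column whose name starts with a digit; a minimal example with hypothetical table names:

```scala
// Without backticks, 30_day_loss_user trips the parser ("mismatched input '.30'").
spark.sql("SELECT d.`30_day_loss_user` FROM some_db.some_table d").show()
```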
[jira] [Commented] (SPARK-16605) Spark2.0 cannot "select" data from a table stored as an orc file that was created by hive, while hive or spark1.6 can
[ https://issues.apache.org/jira/browse/SPARK-16605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401597#comment-15401597 ] marymwu commented on SPARK-16605: - I see, thanks! > Spark2.0 cannot "select" data from a table stored as an orc file that was > created by hive, while hive or spark1.6 can > --- > > Key: SPARK-16605 > URL: https://issues.apache.org/jira/browse/SPARK-16605 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.0.0 > Reporter: marymwu > Attachments: screenshot-1.png > > > Spark2.0 cannot "select" data from a table stored as an orc file that was > created by hive, while hive or spark1.6 can > Steps: > 1. Use hive to create a table "tbtxt" stored as txt and load data into it. > 2. Use hive to create a table "tborc" stored as orc and insert the data from > table "tbtxt". For example, "create table tborc stored as orc as select * from > tbtxt" > 3. Use spark2.0 to "select * from tborc;" --> an error > occurs: java.lang.IllegalArgumentException: Field "nid" does not exist.
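A commonly suggested workaround for this class of failure (not a confirmed root-cause fix) is to route reads through the Hive ORC SerDe rather than Spark's native ORC reader, since Hive-written ORC files may carry placeholder column names (_col0, _col1, ...) in their footers:

```scala
// Ask Spark to keep Hive ORC tables on the Hive SerDe read path.
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
spark.sql("SELECT * FROM tborc").show() // table name from the report's steps
```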
[jira] [Commented] (SPARK-16601) Spark2.0 fails to create a table using the sql statement "create table `db.tableName` xxx" while spark1.6 supports it
[ https://issues.apache.org/jira/browse/SPARK-16601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383625#comment-15383625 ] marymwu commented on SPARK-16601: - I'd like to create a table in a named DB > Spark2.0 fails to create a table using the sql statement "create table > `db.tableName` xxx" while spark1.6 supports it > - > > Key: SPARK-16601 > URL: https://issues.apache.org/jira/browse/SPARK-16601 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.0.0 > Reporter: marymwu > Priority: Minor > Attachments: error log.png > > > Spark2.0 fails to create a table using the sql statement "create table > `db.tableName` xxx" while spark1.6 supports it. > The error log is attached.
[jira] [Updated] (SPARK-16605) Spark2.0 cannot "select" data from a table stored as an orc file that was created by hive, while hive or spark1.6 can
[ https://issues.apache.org/jira/browse/SPARK-16605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] marymwu updated SPARK-16605: Attachment: screenshot-1.png > Spark2.0 cannot "select" data from a table stored as an orc file that was > created by hive, while hive or spark1.6 can > --- > > Key: SPARK-16605 > URL: https://issues.apache.org/jira/browse/SPARK-16605 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.0.0 > Reporter: marymwu > Attachments: screenshot-1.png > > > Spark2.0 cannot "select" data from a table stored as an orc file that was > created by hive, while hive or spark1.6 can > Steps: > 1. Use hive to create a table "tbtxt" stored as txt and load data into it. > 2. Use hive to create a table "tborc" stored as orc and insert the data from > table "tbtxt". For example, "create table tborc stored as orc as select * from > tbtxt" > 3. Use spark2.0 to "select * from tborc;" --> an error > occurs: java.lang.IllegalArgumentException: Field "nid" does not exist.
[jira] [Created] (SPARK-16605) Spark2.0 cannot "select" data from a table stored as an orc file that was created by hive, while hive or spark1.6 can
marymwu created SPARK-16605: --- Summary: Spark2.0 cannot "select" data from a table stored as an orc file that was created by hive, while hive or spark1.6 can Key: SPARK-16605 URL: https://issues.apache.org/jira/browse/SPARK-16605 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: marymwu Spark2.0 cannot "select" data from a table stored as an orc file that was created by hive, while hive or spark1.6 can Steps: 1. Use hive to create a table "tbtxt" stored as txt and load data into it. 2. Use hive to create a table "tborc" stored as orc and insert the data from table "tbtxt". For example, "create table tborc stored as orc as select * from tbtxt" 3. Use spark2.0 to "select * from tborc;" --> an error occurs: java.lang.IllegalArgumentException: Field "nid" does not exist.
[jira] [Created] (SPARK-16604) Spark2.0 fails to execute a sql statement that includes the partition field in the "select" list, while spark1.6 supports it
marymwu created SPARK-16604: --- Summary: Spark2.0 fails to execute a sql statement that includes the partition field in the "select" list, while spark1.6 supports it Key: SPARK-16604 URL: https://issues.apache.org/jira/browse/SPARK-16604 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: marymwu Spark2.0 fails to execute a sql statement that includes the partition field in the "select" list. error: 16/07/14 16:10:47 INFO HiveThriftServer2: set sessionId(69e92ba1-4be2-4be9-bc81-7a00c5802ef8) to exeId(c93f69b0-0f6e-4f07-afdc-ca6c41045fa3) 16/07/14 16:10:47 INFO SparkSqlParser: Parsing command: INSERT OVERWRITE TABLE d_avatar.RPS__H_REPORT_MORE_DIMENSION_MORE_NORM_FIRST_CHANNEL_VCD_IMPALA PARTITION(p_event_date='2016-07-13') select app_key, app_version, app_channel, device_model, total_num, new_num, active_num, extant_num, visits_num, start_num, p_event_date from RPS__H_REPORT_MORE_DIMENSION_MORE_NORM_FIRST_CHANNEL_VCD where p_event_date = '2016-07-13' 16/07/14 16:10:47 INFO ThriftHttpServlet: Could not validate cookie sent, will try to generate a new cookie 16/07/14 16:10:47 INFO ThriftHttpServlet: Cookie added for clientUserName hive 16/07/14 16:10:47 INFO HiveMetaStore: 108: get_table : db=default tbl=rps__h_report_more_dimension_more_norm_first_channel_vcd 16/07/14 16:10:47 INFO audit: ugi=u_reaper ip=unknown-ip-addr cmd=get_table : db=default tbl=rps__h_report_more_dimension_more_norm_first_channel_vcd 16/07/14 16:10:47 INFO HiveMetaStore: 108: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 16/07/14 16:10:47 INFO ObjectStore: ObjectStore, initialize called 16/07/14 16:10:47 INFO ThriftHttpServlet: Could not validate cookie sent, will try to generate a new cookie 16/07/14 16:10:47 INFO ThriftHttpServlet: Cookie added for clientUserName hive 16/07/14 16:10:47 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing 16/07/14 16:10:47 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL 16/07/14 16:10:47 INFO ObjectStore: Initialized ObjectStore 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint 16/07/14 16:10:47 INFO HiveMetaStore: 108: get_table : db=d_avatar tbl=rps__h_report_more_dimension_more_norm_first_channel_vcd_impala 16/07/14 16:10:47 INFO audit: ugi=u_reaper ip=unknown-ip-addr cmd=get_table : db=d_avatar tbl=rps__h_report_more_dimension_more_norm_first_channel_vcd_impala 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: string 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint
16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint 16/07/14 16:10:47 INFO CatalystSqlParser: Parsing command: bigint 16/07/14 16:10:49 WARN HiveSessionState$$anon$1: Max iterations (100) reached for batch Resolution 16/07/14 16:10:49 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING, org.apache.spark.sql.AnalysisException: unresolved operator 'InsertIntoTable MetastoreRelation d_avatar, rps__h_report_more_dimension_more_norm_first_channel_vcd_impala, None, Map(p_event_date -> Some(2016-07-13)), true, false; at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:39) at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:56) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:309) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:51)
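The usual rewrite for this analysis error: with a static PARTITION clause the select list must carry only the data columns, because the partition value comes from the clause, not from the query. A sketch with shortened hypothetical table names (src/dst) standing in for the report's long ones:

```scala
spark.sql("""
  INSERT OVERWRITE TABLE dst PARTITION (p_event_date = '2016-07-13')
  SELECT app_key, app_version, app_channel, device_model,
         total_num, new_num, active_num, extant_num, visits_num, start_num
  FROM src
  WHERE p_event_date = '2016-07-13'
""")
```

Note that p_event_date is dropped from the select list; it is what the original statement included and what the unresolved InsertIntoTable operator complains about.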
[jira] [Created] (SPARK-16603) Spark2.0 fails to execute a sql statement whose field name begins with a number, like "d.30_day_loss_user", while spark1.6 supports it
marymwu created SPARK-16603: --- Summary: Spark2.0 fails to execute a sql statement whose field name begins with a number, like "d.30_day_loss_user", while spark1.6 supports it Key: SPARK-16603 URL: https://issues.apache.org/jira/browse/SPARK-16603 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: marymwu Priority: Minor Spark2.0 fails to execute a sql statement whose field name begins with a number, like "d.30_day_loss_user", while spark1.6 supports it Error: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '.30' expecting {')', ','}
[jira] [Created] (SPARK-16602) Spark2.0: an error occurs when executing a sql statement that includes the "nvl" function, while spark1.6 supports it
marymwu created SPARK-16602: --- Summary: Spark2.0: an error occurs when executing a sql statement that includes the "nvl" function, while spark1.6 supports it Key: SPARK-16602 URL: https://issues.apache.org/jira/browse/SPARK-16602 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: marymwu Spark2.0: an error occurs when executing a sql statement that includes the "nvl" function, while spark1.6 supports it Error: org.apache.spark.sql.AnalysisException: cannot resolve 'nvl(b.`new_user`, 0)' due to data type mismatch: input to function coalesce should all be the same type, but it's [string, int]; line 2 pos 73 (state=,code=0)
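In Spark 2.0, nvl resolves to coalesce, which insists that both arguments share a type; an explicit cast on either side restores the 1.6 behavior. Hypothetical minimal forms, with the column name taken from the error message:

```scala
// Make both branches INT ...
spark.sql("SELECT nvl(CAST(b.new_user AS INT), 0) FROM some_table b")
// ... or make both branches STRING.
spark.sql("SELECT nvl(b.new_user, '0') FROM some_table b")
```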
[jira] [Updated] (SPARK-16601) Spark2.0 fails to create a table using the sql statement "create table `db.tableName` xxx" while spark1.6 supports it
[ https://issues.apache.org/jira/browse/SPARK-16601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] marymwu updated SPARK-16601: Attachment: error log.png > Spark2.0 fails to create a table using the sql statement "create table > `db.tableName` xxx" while spark1.6 supports it > - > > Key: SPARK-16601 > URL: https://issues.apache.org/jira/browse/SPARK-16601 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.0.0 > Reporter: marymwu > Priority: Minor > Attachments: error log.png > > > Spark2.0 fails to create a table using the sql statement "create table > `db.tableName` xxx" while spark1.6 supports it. > The error log is attached.
[jira] [Created] (SPARK-16601) Spark2.0 fails to create a table using the sql statement "create table `db.tableName` xxx" while spark1.6 supports it
marymwu created SPARK-16601: --- Summary: Spark2.0 fails to create a table using the sql statement "create table `db.tableName` xxx" while spark1.6 supports it Key: SPARK-16601 URL: https://issues.apache.org/jira/browse/SPARK-16601 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: marymwu Priority: Minor Spark2.0 fails to create a table using the sql statement "create table `db.tableName` xxx" while spark1.6 supports it. The error log is attached.
[jira] [Created] (HIVE-14160) Reduce tasks take a long time to finish when the sql "select a,distinct(b) group by a" is executed on data with a skewed distribution
marymwu created HIVE-14160: -- Summary: Reduce tasks take a long time to finish when the sql "select a,distinct(b) group by a" is executed on data with a skewed distribution Key: HIVE-14160 URL: https://issues.apache.org/jira/browse/HIVE-14160 Project: Hive Issue Type: Improvement Components: hpl/sql Affects Versions: 1.1.0 Reporter: marymwu Reduce tasks take a long time to finish when the sql "select a,distinct(b) group by a" is executed on data with a skewed distribution data scale: 64G
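Assuming the intended query is a count of distinct b per a, the standard anti-skew rewrite is two-stage aggregation: deduplicate (a, b) first so the hot key is spread across reducers, then count. A sketch with a hypothetical table t:

```scala
spark.sql("""
  SELECT a, COUNT(*) AS distinct_b_cnt
  FROM (SELECT a, b FROM t GROUP BY a, b) dedup
  GROUP BY a
""").show()
```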
[jira] [Created] (SPARK-16376) [Spark web UI]:HTTP ERROR 500 when using rest api "/applications/[app-id]/jobs" if array "stageIds" is empty
marymwu created SPARK-16376: --- Summary: [Spark web UI]:HTTP ERROR 500 when using rest api "/applications/[app-id]/jobs" if array "stageIds" is empty Key: SPARK-16376 URL: https://issues.apache.org/jira/browse/SPARK-16376 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 2.0.0 Reporter: marymwu [Spark web UI]:HTTP ERROR 500 when using rest api "/applications/[app-id]/jobs" if array "stageIds" is empty See attachment for reference. HTTP ERROR 500 Problem accessing /api/v1/applications/application_1466239933301_175531/jobs. Reason: Server Error Caused by: java.lang.UnsupportedOperationException: empty.max at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:216) at scala.collection.AbstractTraversable.max(Traversable.scala:105) at org.apache.spark.status.api.v1.AllJobsResource$.convertJobData(AllJobsResource.scala:71) at org.apache.spark.status.api.v1.AllJobsResource$$anonfun$2$$anonfun$apply$2.apply(AllJobsResource.scala:46) at org.apache.spark.status.api.v1.AllJobsResource$$anonfun$2$$anonfun$apply$2.apply(AllJobsResource.scala:44) at scala.collection.TraversableLike$WithFilter$$anonfun$map$2.apply(TraversableLike.scala:722) at scala.collection.immutable.List.foreach(List.scala:318) at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:721) at org.apache.spark.status.api.v1.AllJobsResource$$anonfun$2.apply(AllJobsResource.scala:44) at org.apache.spark.status.api.v1.AllJobsResource$$anonfun$2.apply(AllJobsResource.scala:43) at scala.collection.TraversableLike$WithFilter$$anonfun$flatMap$2.apply(TraversableLike.scala:753) at scala.collection.immutable.List.foreach(List.scala:318) at scala.collection.TraversableLike$WithFilter.flatMap(TraversableLike.scala:752) at org.apache.spark.status.api.v1.AllJobsResource.jobsList(AllJobsResource.scala:43) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) at com.sun.jersey.server.impl.uri.rules.SubLocatorRule.accept(SubLocatorRule.java:134) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339) at 
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699) at javax.servlet.http.HttpServlet.service(HttpServlet.java:848) at org.spark-project.jetty.servlet.ServletHolder.handle(ServletHolder.java:684) at org.spark-project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1496) at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164) at org.spark-project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) at org.spark-project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499) at org.spark-project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086) at org.spark-project.jetty.servlet.ServletHandler.doScope(Servle
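The first frame of the trace pinpoints the bug: convertJobData calls .max on the job's stage id list, which throws UnsupportedOperationException: empty.max when a job has no stages. The defensive Scala idiom is to guard the empty case; this is a stand-in for the failing pattern, not the actual AllJobsResource code:

```scala
// A job with no stages produces an empty stage id list.
val stageIds: Seq[Int] = Seq.empty

// Throws java.lang.UnsupportedOperationException: empty.max
// val lastStageId = stageIds.max

// Guarded version: fall back instead of throwing on jobs with no stages.
val lastStageId: Option[Int] = if (stageIds.isEmpty) None else Some(stageIds.max)
println(lastStageId.getOrElse(-1))
```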
[jira] [Updated] (SPARK-16375) [Spark web UI]:The wrong value(numCompletedTasks) has been assigned to the variable numSkippedTasks
[ https://issues.apache.org/jira/browse/SPARK-16375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] marymwu updated SPARK-16375: Attachment: numSkippedTasksWrongValue.png > [Spark web UI]:The wrong value(numCompletedTasks) has been assigned to the > variable numSkippedTasks > --- > > Key: SPARK-16375 > URL: https://issues.apache.org/jira/browse/SPARK-16375 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: marymwu > Attachments: numSkippedTasksWrongValue.png > > > [Spark web UI]:The wrong value(numCompletedTasks) has been assigned to the > variable numSkippedTasks > See attachment for reference.
[jira] [Created] (SPARK-16375) [Spark web UI]:The wrong value(numCompletedTasks) has been assigned to the variable numSkippedTasks
marymwu created SPARK-16375: --- Summary: [Spark web UI]:The wrong value(numCompletedTasks) has been assigned to the variable numSkippedTasks Key: SPARK-16375 URL: https://issues.apache.org/jira/browse/SPARK-16375 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 2.0.0 Reporter: marymwu [Spark web UI]:The wrong value(numCompletedTasks) has been assigned to the variable numSkippedTasks See attachment for reference.
[jira] [Updated] (SPARK-16089) Spark2.0 doesn't support static partition SQL statements such as "insert overwrite table targetTB PARTITION (partition field=xx) select field1,field2,...,partition
[ https://issues.apache.org/jira/browse/SPARK-16089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] marymwu updated SPARK-16089: Component/s: SQL > Spark2.0 doesn't support static partition SQL statements such as "insert > overwrite table targetTB PARTITION (partition field=xx) select > field1,field2,...,partition field from sourceTB where partition field=xx" > while Spark 1.6 supports them > --- > > Key: SPARK-16089 > URL: https://issues.apache.org/jira/browse/SPARK-16089 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.0.0 > Reporter: marymwu > Priority: Minor > Attachments: StaticPartitionSQLStatementError.png > > > Spark2.0 doesn't support static partition SQL statements such as "insert > overwrite table targetTB PARTITION (partition field=xx) select > field1,field2,...,partition field from sourceTB where partition field=xx" > while Spark 1.6 supports them. > Testcase: > "insert overwrite table d_test_tpc_2g_txt.marytest1 PARTITION > (dt='2016-06-21') select nid, price, dt from d_test_tpc_2g_txt.marytest where > dt = '2016-06-21';" > Error: org.apache.spark.sql.AnalysisException: unresolved operator > 'InsertIntoTable MetastoreRelation d_test_tpc_2g_txt, marytest1, None, Map(dt > -> Some(2016-06-21)), true, false; (state=,code=0) > see attachment for reference. > Note: > The same SQL statement succeeded in Spark 1.6.
[jira] [Updated] (SPARK-16092) Setting hive.exec.dynamic.partition.mode=nonstrict as a global variable in the Spark2.0 configuration file takes no effect, while it does in Spark1.6
[ https://issues.apache.org/jira/browse/SPARK-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] marymwu updated SPARK-16092: Component/s: SQL > Setting hive.exec.dynamic.partition.mode=nonstrict as a global variable > in the Spark2.0 configuration file takes no effect, while it does in Spark1.6 > > > Key: SPARK-16092 > URL: https://issues.apache.org/jira/browse/SPARK-16092 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.0.0 > Reporter: marymwu > > Setting hive.exec.dynamic.partition.mode=nonstrict as a global variable > in the Spark2.0 configuration file takes no effect, while it does in Spark1.6 > Precondition: > set hive.exec.dynamic.partition.mode=nonstrict as a global variable in > the Spark2.0 configuration file > " > hive.exec.dynamic.partition.mode > nonstrict > " > Testcase: > "insert overwrite table d_test_tpc_2g_txt.marytest1 partition (dt) select > t.nid, t.price, t.dt from (select nid, price, dt from > d_test_tpc_2g_txt.marytest where dt >= '2016-06-20' and dt <= '2016-06-21') t > group by t.nid, t.price, t.dt;" > Result: > Error: org.apache.spark.SparkException: Dynamic partition strict mode > requires at least one static partition column. To turn this off set > hive.exec.dynamic.partition.mode=nonstrict (state=,code=0) > Note: > Spark1.6 supports the above SQL statement after setting > hive.exec.dynamic.partition.mode=nonstrict as a global variable in the Spark > configuration file > " > hive.exec.dynamic.partition.mode > nonstrict > "
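While the global file setting is being ignored, a session-level SET issued just before the insert is worth trying; whether this particular build honors it is exactly what the ticket questions, so treat this as a sketch rather than a confirmed fix (the statement is the ticket's own testcase):

```scala
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
  INSERT OVERWRITE TABLE d_test_tpc_2g_txt.marytest1 PARTITION (dt)
  SELECT t.nid, t.price, t.dt
  FROM (SELECT nid, price, dt FROM d_test_tpc_2g_txt.marytest
        WHERE dt >= '2016-06-20' AND dt <= '2016-06-21') t
  GROUP BY t.nid, t.price, t.dt
""")
```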
[jira] [Updated] (SPARK-16093) Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1
[ https://issues.apache.org/jira/browse/SPARK-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] marymwu updated SPARK-16093: Component/s: SQL > Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1 > -- > > Key: SPARK-16093 > URL: https://issues.apache.org/jira/browse/SPARK-16093 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: marymwu > Attachments: Errorlog.txt > > > Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1 > Precondition: > set spark.sql.autoBroadcastJoinThreshold = 1; > Testcase: > "INSERT OVERWRITE TABLE > RPS__H_REPORT_MORE_DIMENSION_FIRST_CHANNEL_VISIT_CD_DAY PARTITION > (p_event_date='2016-06-18') > select a.app_key,a.app_channel,b.device_model,sum(b.visits) visitsNum from > (select app_key,app_channel,lps_did from > RPS__H_REPORT_MORE_DIMENSION_EARLIEST_NEWUSER_LIST_C ) a > join > (select app_key,lps_did,device_model, count(1) as visits from > RPS__H_REPORT_MORE_DIMENSION_SMALL where p_event_date = '2016-06-18' > and ( log_type=1 or log_type=2) > group by app_key,lps_did,device_model) b > on a.lps_did = b.lps_did and a.app_key=b.app_key > group by a.app_key,a.app_channel,b.device_model; > " > == Physical Plan == > InsertIntoHiveTable MetastoreRelation default, > rps__h_report_more_dimension_first_channel_visit_cd_day, None, > Map(p_event_date -> Some(2016-06-18)), true, false > +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], > functions=[(sum(visits#3L),mode=Final,isDistinct=false)], > output=[app_key#7,app_channel#9,device_model#20,visitsNum#4L]) >+- Exchange(coordinator id: 41547585) hashpartitioning(app_key#7, > app_channel#9, device_model#20, 600), Some(coordinator[target post-shuffle > partition size: 5]) > +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], > functions=[(sum(visits#3L),mode=Partial,isDistinct=false)], > output=[app_key#7,app_channel#9,device_model#20,sum#41L]) > +- Project [app_key#7,app_channel#9,device_model#20,visits#3L] > +- BroadcastHashJoin [lps_did#8,app_key#7], > [lps_did#13,app_key#12], Inner, BuildRight, None >:- Filter (isnotnull(app_key#7) && isnotnull(lps_did#8)) >: +- HiveTableScan [app_key#7,app_channel#9,lps_did#8], > MetastoreRelation default, > rps__h_report_more_dimension_earliest_newuser_list_c, None >+- BroadcastExchange HashedRelationBroadcastMode(List(input[1, > string], input[0, string])) > +- > TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], > functions=[(count(1),mode=Final,isDistinct=false)], > output=[app_key#12,lps_did#13,device_model#20,visits#3L]) > +- Exchange(coordinator id: 733045095) > hashpartitioning(app_key#12, lps_did#13, device_model#20, 600), > Some(coordinator[target post-shuffle partition size: 5]) > +- > TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], > functions=[(count(1),mode=Partial,isDistinct=false)], > output=[app_key#12,lps_did#13,device_model#20,count#39L]) >+- Project [app_key#12,lps_did#13,device_model#20] > +- Filter ((isnotnull(app_key#12) && > isnotnull(lps_did#13)) && ((cast(log_type#11 as int) = 1) || > (cast(log_type#11 as int) = 2))) > +- HiveTableScan > [app_key#12,lps_did#13,device_model#20,log_type#11], MetastoreRelation > default, rps__h_report_more_dimension_small, None, > [isnotnull(p_event_date#10),(p_event_date#10 = 2016-06-18)] > Time taken: 4.775 seconds, Fetched 1 row(s) > 16/06/20 16:55:16 INFO CliDriver: Time taken: 4.775 seconds, Fetched 1 row(s) > Note: +- BroadcastHashJoin [lps_did#8,app_key#7], 
[lps_did#13,app_key#12], > Inner, BuildRight, None > Result: > 1. Execution failed, spark service is unavailable. > 2. Even though set spark.sql.autoBroadcastJoinThreshold = 1, > BroadcastHashJoin has been used when join two large tables. > Error log is as attached.
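Worth noting alongside this report: a threshold of 1 byte only lowers the broadcast size cutoff, while the documented way to rule out broadcast joins entirely is -1. A quick check, with hypothetical table names:

```scala
// Disable broadcast joins outright, then confirm the planner's choice.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
spark.sql("SELECT * FROM big_a a JOIN big_b b ON a.k = b.k").explain()
// With the setting honored, the plan should show SortMergeJoin, not BroadcastHashJoin.
```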
[jira] [Created] (SPARK-16255) Spark2.0 doesn't support the following SQL statement: "insert into directory "/u_qa_user/hive_testdata/test1/t1" select * from d_test_tpc_2g_txt.auction" while Hive suppo
marymwu created SPARK-16255: --- Summary: Spark2.0 doesn't support the following SQL statement: "insert into directory "/u_qa_user/hive_testdata/test1/t1" select * from d_test_tpc_2g_txt.auction" while Hive supports it Key: SPARK-16255 URL: https://issues.apache.org/jira/browse/SPARK-16255 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: marymwu Spark2.0 doesn't support the following SQL statement: "insert into directory "/u_qa_user/hive_testdata/test1/t1" select * from d_test_tpc_2g_txt.auction" while Hive supports it
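Spark 2.0's parser simply has no rule for Hive's insert-into-directory statement (support for it arrived in later Spark releases); an equivalent 2.0-era workaround is to run the query and write the result through the DataFrame API. The path and table are from the report; the output format here is a free choice:

```scala
spark.sql("SELECT * FROM d_test_tpc_2g_txt.auction")
  .write
  .mode("overwrite")
  .csv("/u_qa_user/hive_testdata/test1/t1")
```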
[jira] [Updated] (SPARK-16254) Spark2.0 monitor web ui -> Tasks (for all stages) -> the number of Succeeded is more than Total
[ https://issues.apache.org/jira/browse/SPARK-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] marymwu updated SPARK-16254: Affects Version/s: 2.0.0 > Spark2.0 monitor web ui -> Tasks (for all stages) -> the number of Succeeded is > more than Total > -- > > Key: SPARK-16254 > URL: https://issues.apache.org/jira/browse/SPARK-16254 > Project: Spark > Issue Type: Bug > Components: Web UI > Affects Versions: 2.0.0 > Reporter: marymwu > Priority: Minor > Attachments: Reference.png > > > Spark2.0 monitor web ui -> Tasks (for all stages) -> the number of Succeeded is > more than Total > See attachment
[jira] [Updated] (SPARK-16254) Spark2.0 monitor web ui -> Tasks (for all stages) -> the number of Succeeded is more than Total
[ https://issues.apache.org/jira/browse/SPARK-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] marymwu updated SPARK-16254: Attachment: Reference.png > Spark2.0 monitor web ui -> Tasks (for all stages) -> the number of Succeeded is > more than Total > -- > > Key: SPARK-16254 > URL: https://issues.apache.org/jira/browse/SPARK-16254 > Project: Spark > Issue Type: Bug > Components: Web UI > Reporter: marymwu > Priority: Minor > Attachments: Reference.png > > > Spark2.0 monitor web ui -> Tasks (for all stages) -> the number of Succeeded is > more than Total > See attachment
[jira] [Created] (SPARK-16254) Spark2.0 monitor web ui -> Tasks (for all stages) -> the number of Succeeded is more than Total
marymwu created SPARK-16254: --- Summary: Spark2.0 monitor web ui -> Tasks (for all stages) -> the number of Succeeded is more than Total Key: SPARK-16254 URL: https://issues.apache.org/jira/browse/SPARK-16254 Project: Spark Issue Type: Bug Components: Web UI Reporter: marymwu Priority: Minor Spark2.0 monitor web ui -> Tasks (for all stages) -> the number of Succeeded is more than Total See attachment
[jira] [Commented] (SPARK-16092) Setting hive.exec.dynamic.partition.mode=nonstrict as a global variable in the Spark2.0 configuration file takes no effect, while it does in Spark1.6
[ https://issues.apache.org/jira/browse/SPARK-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350764#comment-15350764 ] marymwu commented on SPARK-16092: - Can anybody help? > Setting hive.exec.dynamic.partition.mode=nonstrict as a global variable > in the Spark2.0 configuration file takes no effect, while it does in Spark1.6 > > > Key: SPARK-16092 > URL: https://issues.apache.org/jira/browse/SPARK-16092 > Project: Spark > Issue Type: Bug > Affects Versions: 2.0.0 > Reporter: marymwu > > Setting hive.exec.dynamic.partition.mode=nonstrict as a global variable > in the Spark2.0 configuration file takes no effect, while it does in Spark1.6 > Precondition: > set hive.exec.dynamic.partition.mode=nonstrict as a global variable in > the Spark2.0 configuration file > " > hive.exec.dynamic.partition.mode > nonstrict > " > Testcase: > "insert overwrite table d_test_tpc_2g_txt.marytest1 partition (dt) select > t.nid, t.price, t.dt from (select nid, price, dt from > d_test_tpc_2g_txt.marytest where dt >= '2016-06-20' and dt <= '2016-06-21') t > group by t.nid, t.price, t.dt;" > Result: > Error: org.apache.spark.SparkException: Dynamic partition strict mode > requires at least one static partition column. To turn this off set > hive.exec.dynamic.partition.mode=nonstrict (state=,code=0) > Note: > Spark1.6 supports the above SQL statement after setting > hive.exec.dynamic.partition.mode=nonstrict as a global variable in the Spark > configuration file > " > hive.exec.dynamic.partition.mode > nonstrict > "
[jira] [Commented] (SPARK-16089) Spark2.0 doesn't support static partition SQL statements such as "insert overwrite table targetTB PARTITION (partition field=xx) select field1,field2,...,partitio
[ https://issues.apache.org/jira/browse/SPARK-16089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350765#comment-15350765 ] marymwu commented on SPARK-16089: - Can anybody help? > Spark2.0 doesn't support static partition SQL statements such as "insert > overwrite table targetTB PARTITION (partition field=xx) select > field1,field2,...,partition field from sourceTB where partition field=xx" > while Spark 1.6 supports them > --- > > Key: SPARK-16089 > URL: https://issues.apache.org/jira/browse/SPARK-16089 > Project: Spark > Issue Type: Bug > Affects Versions: 2.0.0 > Reporter: marymwu > Priority: Minor > Attachments: StaticPartitionSQLStatementError.png > > > Spark2.0 doesn't support static partition SQL statements such as "insert > overwrite table targetTB PARTITION (partition field=xx) select > field1,field2,...,partition field from sourceTB where partition field=xx" > while Spark 1.6 supports them. > Testcase: > "insert overwrite table d_test_tpc_2g_txt.marytest1 PARTITION > (dt='2016-06-21') select nid, price, dt from d_test_tpc_2g_txt.marytest where > dt = '2016-06-21';" > Error: org.apache.spark.sql.AnalysisException: unresolved operator > 'InsertIntoTable MetastoreRelation d_test_tpc_2g_txt, marytest1, None, Map(dt > -> Some(2016-06-21)), true, false; (state=,code=0) > see attachment for reference. > Note: > The same SQL statement succeeded in Spark 1.6.
[jira] [Commented] (SPARK-16093) Setting spark.sql.autoBroadcastJoinThreshold = 1 takes no effect in Spark2.0
[ https://issues.apache.org/jira/browse/SPARK-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350763#comment-15350763 ] marymwu commented on SPARK-16093: - Any body help? > Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1 > -- > > Key: SPARK-16093 > URL: https://issues.apache.org/jira/browse/SPARK-16093 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: marymwu > Attachments: Errorlog.txt > > > Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1 > Precondition: > set spark.sql.autoBroadcastJoinThreshold = 1; > Testcase: > "INSERT OVERWRITE TABLE > RPS__H_REPORT_MORE_DIMENSION_FIRST_CHANNEL_VISIT_CD_DAY PARTITION > (p_event_date='2016-06-18') > select a.app_key,a.app_channel,b.device_model,sum(b.visits) visitsNum from > (select app_key,app_channel,lps_did from > RPS__H_REPORT_MORE_DIMENSION_EARLIEST_NEWUSER_LIST_C ) a > join > (select app_key,lps_did,device_model, count(1) as visits from > RPS__H_REPORT_MORE_DIMENSION_SMALL where p_event_date = '2016-06-18' > and ( log_type=1 or log_type=2) > group by app_key,lps_did,device_model) b > on a.lps_did = b.lps_did and a.app_key=b.app_key > group by a.app_key,a.app_channel,b.device_model; > " > == Physical Plan == > InsertIntoHiveTable MetastoreRelation default, > rps__h_report_more_dimension_first_channel_visit_cd_day, None, > Map(p_event_date -> Some(2016-06-18)), true, false > +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], > functions=[(sum(visits#3L),mode=Final,isDistinct=false)], > output=[app_key#7,app_channel#9,device_model#20,visitsNum#4L]) >+- Exchange(coordinator id: 41547585) hashpartitioning(app_key#7, > app_channel#9, device_model#20, 600), Some(coordinator[target post-shuffle > partition size: 5]) > +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], > functions=[(sum(visits#3L),mode=Partial,isDistinct=false)], > output=[app_key#7,app_channel#9,device_model#20,sum#41L]) > +- Project [app_key#7,app_channel#9,device_model#20,visits#3L] > +- BroadcastHashJoin [lps_did#8,app_key#7], > [lps_did#13,app_key#12], Inner, BuildRight, None >:- Filter (isnotnull(app_key#7) && isnotnull(lps_did#8)) >: +- HiveTableScan [app_key#7,app_channel#9,lps_did#8], > MetastoreRelation default, > rps__h_report_more_dimension_earliest_newuser_list_c, None >+- BroadcastExchange HashedRelationBroadcastMode(List(input[1, > string], input[0, string])) > +- > TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], > functions=[(count(1),mode=Final,isDistinct=false)], > output=[app_key#12,lps_did#13,device_model#20,visits#3L]) > +- Exchange(coordinator id: 733045095) > hashpartitioning(app_key#12, lps_did#13, device_model#20, 600), > Some(coordinator[target post-shuffle partition size: 5]) > +- > TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], > functions=[(count(1),mode=Partial,isDistinct=false)], > output=[app_key#12,lps_did#13,device_model#20,count#39L]) >+- Project [app_key#12,lps_did#13,device_model#20] > +- Filter ((isnotnull(app_key#12) && > isnotnull(lps_did#13)) && ((cast(log_type#11 as int) = 1) || > (cast(log_type#11 as int) = 2))) > +- HiveTableScan > [app_key#12,lps_did#13,device_model#20,log_type#11], MetastoreRelation > default, rps__h_report_more_dimension_small, None, > [isnotnull(p_event_date#10),(p_event_date#10 = 2016-06-18)] > Time taken: 4.775 seconds, Fetched 1 row(s) > 16/06/20 16:55:16 INFO CliDriver: Time taken: 4.775 seconds, Fetched 1 row(s) > Note: +- 
BroadcastHashJoin [lps_did#8,app_key#7], [lps_did#13,app_key#12], > Inner, BuildRight, None > Result: > 1. Execution failed, spark service is unavailable. > 2. Even though set spark.sql.autoBroadcastJoinThreshold = 1, > BroadcastHashJoin has been used when join two large tables. > Error log is as attached.
[jira] [Updated] (SPARK-16093) Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1
[ https://issues.apache.org/jira/browse/SPARK-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] marymwu updated SPARK-16093: Attachment: Errorlog.txt > Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1 > -- > > Key: SPARK-16093 > URL: https://issues.apache.org/jira/browse/SPARK-16093 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 > Reporter: marymwu > Attachments: Errorlog.txt > > > Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1 > Precondition: > set spark.sql.autoBroadcastJoinThreshold = 1; > Testcase: > "INSERT OVERWRITE TABLE > RPS__H_REPORT_MORE_DIMENSION_FIRST_CHANNEL_VISIT_CD_DAY PARTITION > (p_event_date='2016-06-18') > select a.app_key,a.app_channel,b.device_model,sum(b.visits) visitsNum from > (select app_key,app_channel,lps_did from > RPS__H_REPORT_MORE_DIMENSION_EARLIEST_NEWUSER_LIST_C ) a > join > (select app_key,lps_did,device_model, count(1) as visits from > RPS__H_REPORT_MORE_DIMENSION_SMALL where p_event_date = '2016-06-18' > and ( log_type=1 or log_type=2) > group by app_key,lps_did,device_model) b > on a.lps_did = b.lps_did and a.app_key=b.app_key > group by a.app_key,a.app_channel,b.device_model; > " > == Physical Plan == > InsertIntoHiveTable MetastoreRelation default, > rps__h_report_more_dimension_first_channel_visit_cd_day, None, > Map(p_event_date -> Some(2016-06-18)), true, false > +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], > functions=[(sum(visits#3L),mode=Final,isDistinct=false)], > output=[app_key#7,app_channel#9,device_model#20,visitsNum#4L]) >+- Exchange(coordinator id: 41547585) hashpartitioning(app_key#7, > app_channel#9, device_model#20, 600), Some(coordinator[target post-shuffle > partition size: 5]) > +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], > functions=[(sum(visits#3L),mode=Partial,isDistinct=false)], > output=[app_key#7,app_channel#9,device_model#20,sum#41L]) > +- Project [app_key#7,app_channel#9,device_model#20,visits#3L] > +- BroadcastHashJoin [lps_did#8,app_key#7], > [lps_did#13,app_key#12], Inner, BuildRight, None >:- Filter (isnotnull(app_key#7) && isnotnull(lps_did#8)) >: +- HiveTableScan [app_key#7,app_channel#9,lps_did#8], > MetastoreRelation default, > rps__h_report_more_dimension_earliest_newuser_list_c, None >+- BroadcastExchange HashedRelationBroadcastMode(List(input[1, > string], input[0, string])) > +- > TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], > functions=[(count(1),mode=Final,isDistinct=false)], > output=[app_key#12,lps_did#13,device_model#20,visits#3L]) > +- Exchange(coordinator id: 733045095) > hashpartitioning(app_key#12, lps_did#13, device_model#20, 600), > Some(coordinator[target post-shuffle partition size: 5]) > +- > TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], > functions=[(count(1),mode=Partial,isDistinct=false)], > output=[app_key#12,lps_did#13,device_model#20,count#39L]) >+- Project [app_key#12,lps_did#13,device_model#20] > +- Filter ((isnotnull(app_key#12) && > isnotnull(lps_did#13)) && ((cast(log_type#11 as int) = 1) || > (cast(log_type#11 as int) = 2))) > +- HiveTableScan > [app_key#12,lps_did#13,device_model#20,log_type#11], MetastoreRelation > default, rps__h_report_more_dimension_small, None, > [isnotnull(p_event_date#10),(p_event_date#10 = 2016-06-18)] > Time taken: 4.775 seconds, Fetched 1 row(s) > 16/06/20 16:55:16 INFO CliDriver: Time taken: 4.775 seconds, Fetched 1 row(s) > Note: +- BroadcastHashJoin [lps_did#8,app_key#7], 
[lps_did#13,app_key#12], > Inner, BuildRight, None > Result: > 1. Execution failed, spark service is unavailable. > 2. Even though set spark.sql.autoBroadcastJoinThreshold = 1, > BroadcastHashJoin has been used when join two large tables. > Error log is as attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16093) Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1
marymwu created SPARK-16093: --- Summary: Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1 Key: SPARK-16093 URL: https://issues.apache.org/jira/browse/SPARK-16093 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Reporter: marymwu Spark2.0 take no effect after set spark.sql.autoBroadcastJoinThreshold = 1 Precondition: set spark.sql.autoBroadcastJoinThreshold = 1; Testcase: "INSERT OVERWRITE TABLE RPS__H_REPORT_MORE_DIMENSION_FIRST_CHANNEL_VISIT_CD_DAY PARTITION (p_event_date='2016-06-18') select a.app_key,a.app_channel,b.device_model,sum(b.visits) visitsNum from (select app_key,app_channel,lps_did from RPS__H_REPORT_MORE_DIMENSION_EARLIEST_NEWUSER_LIST_C ) a join (select app_key,lps_did,device_model, count(1) as visits from RPS__H_REPORT_MORE_DIMENSION_SMALL where p_event_date = '2016-06-18' and ( log_type=1 or log_type=2) group by app_key,lps_did,device_model) b on a.lps_did = b.lps_did and a.app_key=b.app_key group by a.app_key,a.app_channel,b.device_model; " == Physical Plan == InsertIntoHiveTable MetastoreRelation default, rps__h_report_more_dimension_first_channel_visit_cd_day, None, Map(p_event_date -> Some(2016-06-18)), true, false +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], functions=[(sum(visits#3L),mode=Final,isDistinct=false)], output=[app_key#7,app_channel#9,device_model#20,visitsNum#4L]) +- Exchange(coordinator id: 41547585) hashpartitioning(app_key#7, app_channel#9, device_model#20, 600), Some(coordinator[target post-shuffle partition size: 5]) +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], functions=[(sum(visits#3L),mode=Partial,isDistinct=false)], output=[app_key#7,app_channel#9,device_model#20,sum#41L]) +- Project [app_key#7,app_channel#9,device_model#20,visits#3L] +- BroadcastHashJoin [lps_did#8,app_key#7], [lps_did#13,app_key#12], Inner, BuildRight, None :- Filter (isnotnull(app_key#7) && isnotnull(lps_did#8)) : +- HiveTableScan [app_key#7,app_channel#9,lps_did#8], MetastoreRelation default, rps__h_report_more_dimension_earliest_newuser_list_c, None +- BroadcastExchange HashedRelationBroadcastMode(List(input[1, string], input[0, string])) +- TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], functions=[(count(1),mode=Final,isDistinct=false)], output=[app_key#12,lps_did#13,device_model#20,visits#3L]) +- Exchange(coordinator id: 733045095) hashpartitioning(app_key#12, lps_did#13, device_model#20, 600), Some(coordinator[target post-shuffle partition size: 5]) +- TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], functions=[(count(1),mode=Partial,isDistinct=false)], output=[app_key#12,lps_did#13,device_model#20,count#39L]) +- Project [app_key#12,lps_did#13,device_model#20] +- Filter ((isnotnull(app_key#12) && isnotnull(lps_did#13)) && ((cast(log_type#11 as int) = 1) || (cast(log_type#11 as int) = 2))) +- HiveTableScan [app_key#12,lps_did#13,device_model#20,log_type#11], MetastoreRelation default, rps__h_report_more_dimension_small, None, [isnotnull(p_event_date#10),(p_event_date#10 = 2016-06-18)] Time taken: 4.775 seconds, Fetched 1 row(s) 16/06/20 16:55:16 INFO CliDriver: Time taken: 4.775 seconds, Fetched 1 row(s) Note: +- BroadcastHashJoin [lps_did#8,app_key#7], [lps_did#13,app_key#12], Inner, BuildRight, None Result: 1. Execution failed, spark service is unavailable. 2. Even though set spark.sql.autoBroadcastJoinThreshold = 1, BroadcastHashJoin has been used when join two large tables. Error log is as attached. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16092) Spark2.0 take no effect after set hive.exec.dynamic.partition.mode=nonstrict as a global variable in Spark2.0 configuration file while Spark1.6 does
marymwu created SPARK-16092: --- Summary: Spark2.0 take no effect after set hive.exec.dynamic.partition.mode=nonstrict as a global variable in Spark2.0 configuration file while Spark1.6 does Key: SPARK-16092 URL: https://issues.apache.org/jira/browse/SPARK-16092 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Reporter: marymwu Spark2.0 take no effect after set hive.exec.dynamic.partition.mode=nonstrict as a global variable in Spark2.0 configuration file while Spark1.6 does Precondition: set hive.exec.dynamic.partition.mode=nonstrict as a global variable in Spark2.0 configuration file " hive.exec.dynamic.partition.mode nonstrict " Testcase: "insert overwrite table d_test_tpc_2g_txt.marytest1 partition (dt) select t.nid, t.price, t.dt from (select nid, price, dt from d_test_tpc_2g_txt.marytest where dt >= '2016-06-20' and dt <= '2016-06-21') t group by t.nid, t.price, t.dt;" Result: Error: org.apache.spark.SparkException: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict (state=,code=0) Note: Spark1.6 supports the above SQL statement after set hive.exec.dynamic.partition.mode=nonstrict as a global variable in Spark configuration file " hive.exec.dynamic.partition.mode nonstrict " -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16089) Spark2.0 doesn't support the certain static partition SQL statment as "insert overwrite table targetTB PARTITION (partition field=xx) select field1,field2,...,partition
[ https://issues.apache.org/jira/browse/SPARK-16089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] marymwu updated SPARK-16089: Attachment: StaticPartitionSQLStatementError.png > Spark2.0 doesn't support the certain static partition SQL statment as "insert > overwrite table targetTB PARTITION (partition field=xx) select > field1,field2,...,partition field from sourceTB where partition field=xx" > while Spark 1.6 supports > --- > > Key: SPARK-16089 > URL: https://issues.apache.org/jira/browse/SPARK-16089 > Project: Spark > Issue Type: Bug > Affects Versions: 2.0.0 >Reporter: marymwu >Priority: Minor > Attachments: StaticPartitionSQLStatementError.png > > > Spark2.0 doesn't support the certain static partition SQL statment as "insert > overwrite table targetTB PARTITION (partition field=xx) select > field1,field2,...,partition field from sourceTB where partition field=xx" > while Spark 1.6 supports. > Testcase: > "insert overwrite table d_test_tpc_2g_txt.marytest1 PARTITION > (dt='2016-06-21') select nid, price, dt from d_test_tpc_2g_txt.marytest where > dt = '2016-06-21';" > Error: org.apache.spark.sql.AnalysisException: unresolved operator > 'InsertIntoTable MetastoreRelation d_test_tpc_2g_txt, marytest1, None, Map(dt > -> Some(2016-06-21)), true, false; (state=,code=0) > see attachment for reference. > Note: > The same SQL statement succeeded in Spark 1.6. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16089) Spark2.0 doesn't support the certain static partition SQL statment as "insert overwrite table targetTB PARTITION (partition field=xx) select field1,field2,...,partition
marymwu created SPARK-16089: --- Summary: Spark2.0 doesn't support the certain static partition SQL statment as "insert overwrite table targetTB PARTITION (partition field=xx) select field1,field2,...,partition field from sourceTB where partition field=xx" while Spark 1.6 supports Key: SPARK-16089 URL: https://issues.apache.org/jira/browse/SPARK-16089 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Reporter: marymwu Priority: Minor Spark2.0 doesn't support the certain static partition SQL statment as "insert overwrite table targetTB PARTITION (partition field=xx) select field1,field2,...,partition field from sourceTB where partition field=xx" while Spark 1.6 supports. Testcase: "insert overwrite table d_test_tpc_2g_txt.marytest1 PARTITION (dt='2016-06-21') select nid, price, dt from d_test_tpc_2g_txt.marytest where dt = '2016-06-21';" Error: org.apache.spark.sql.AnalysisException: unresolved operator 'InsertIntoTable MetastoreRelation d_test_tpc_2g_txt, marytest1, None, Map(dt -> Some(2016-06-21)), true, false; (state=,code=0) see attachment for reference. Note: The same SQL statement succeeded in Spark 1.6. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15802) SparkSQL connection fail using shell command "bin/beeline -u "jdbc:hive2://*.*.*.*:10000/default""
[ https://issues.apache.org/jira/browse/SPARK-15802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15329051#comment-15329051 ] marymwu commented on SPARK-15802: - We still have a question. How to use "binary" protocol? It seems to us that shell command "bin/beeline -u "jdbc:hive2://*.*.*.*:1/default" means using the "binary" protocol, but SparkSQL connection failed in this situation. > SparkSQL connection fail using shell command "bin/beeline -u > "jdbc:hive2://*.*.*.*:1/default"" > -- > > Key: SPARK-15802 > URL: https://issues.apache.org/jira/browse/SPARK-15802 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: marymwu > > reproduce steps: > 1. execute shell "sbin/start-thriftserver.sh --master yarn"; > 2. execute shell "bin/beeline -u "jdbc:hive2://*.*.*.*:1/default""; > Actually result: > SparkSQL connection failed and the log shows as follows: > 16/06/07 14:49:18 WARN HttpParser: Illegal character 0x1 in state=START for > buffer > HeapByteBuffer@485a5ad9[p=1,l=35,c=16384,r=34]={\x01<<<\x00\x00\x00\x05PLAIN\x05\x00\x00\x00\x14\x00an...ymous\x00anonymous>>>Type: > application...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00} > 16/06/07 14:49:18 WARN HttpParser: badMessage: 400 Illegal character 0x1 for > HttpChannelOverHttp@718db102{r=0,c=false,a=IDLE,uri=} > 16/06/07 14:49:19 WARN HttpParser: Illegal character 0x1 in state=START for > buffer > HeapByteBuffer@485a5ad9[p=1,l=35,c=16384,r=34]={\x01<<<\x00\x00\x00\x05PLAIN\x05\x00\x00\x00\x14\x00an...ymous\x00anonymous>>>Type: > application...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00} > 16/06/07 14:49:19 WARN HttpParser: badMessage: 400 Illegal character 0x1 for > HttpChannelOverHttp@195db217{r=0,c=false,a=IDLE,uri=} > note: > SparkSQL connection succeeded, if using shell command "bin/beeline -u > "jdbc:hive2://*.*.*.*:1/default;transportMode=http;httpPath=cliservice"" > Two parameters(transportMode&httpPath) have been added. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15757) Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed on this orc file
[ https://issues.apache.org/jira/browse/SPARK-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328860#comment-15328860 ] marymwu commented on SPARK-15757: - Any update? > Error occurs when using Spark sql "select" statement on orc file after hive > sql "insert overwrite tb1 select * from sourcTb" has been executed on this > orc file > --- > > Key: SPARK-15757 > URL: https://issues.apache.org/jira/browse/SPARK-15757 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: marymwu > Attachments: Result.png > > > Error occurs when using Spark sql "select" statement on orc file after hive > sql "insert overwrite tb1 select * from sourcTb" has been executed > 0: jdbc:hive2://172.19.200.158:40099/default> select * from inventory; > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 7.0 failed 8 times, most recent failure: Lost task 0.7 in > stage 7.0 (TID 2532, smokeslave5.avatar.lenovomm.com): > java.lang.IllegalArgumentException: Field "inv_date_sk" does not exist. > at > org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252) > at > org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252) > at scala.collection.MapLike$class.getOrElse(MapLike.scala:128) > at scala.collection.AbstractMap.getOrElse(Map.scala:59) > at > org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:251) > at > org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361) > at > org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at org.apache.spark.sql.types.StructType.foreach(StructType.scala:94) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at org.apache.spark.sql.types.StructType.map(StructType.scala:94) > at > org.apache.spark.sql.hive.orc.OrcRelation$.setRequiredColumns(OrcRelation.scala:361) > at > org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:123) > at > org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:112) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:278) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:262) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:114) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:357) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:246) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:240) > at > 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:282) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at
[jira] [Commented] (SPARK-15802) SparkSQL connection fail using shell command "bin/beeline -u "jdbc:hive2://*.*.*.*:10000/default""
[ https://issues.apache.org/jira/browse/SPARK-15802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319908#comment-15319908 ] marymwu commented on SPARK-15802: - looking forward to your reply, thanks > SparkSQL connection fail using shell command "bin/beeline -u > "jdbc:hive2://*.*.*.*:1/default"" > -- > > Key: SPARK-15802 > URL: https://issues.apache.org/jira/browse/SPARK-15802 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: marymwu > > reproduce steps: > 1. execute shell "sbin/start-thriftserver.sh --master yarn"; > 2. execute shell "bin/beeline -u "jdbc:hive2://*.*.*.*:1/default""; > Actually result: > SparkSQL connection failed and the log shows as follows: > 16/06/07 14:49:18 WARN HttpParser: Illegal character 0x1 in state=START for > buffer > HeapByteBuffer@485a5ad9[p=1,l=35,c=16384,r=34]={\x01<<<\x00\x00\x00\x05PLAIN\x05\x00\x00\x00\x14\x00an...ymous\x00anonymous>>>Type: > application...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00} > 16/06/07 14:49:18 WARN HttpParser: badMessage: 400 Illegal character 0x1 for > HttpChannelOverHttp@718db102{r=0,c=false,a=IDLE,uri=} > 16/06/07 14:49:19 WARN HttpParser: Illegal character 0x1 in state=START for > buffer > HeapByteBuffer@485a5ad9[p=1,l=35,c=16384,r=34]={\x01<<<\x00\x00\x00\x05PLAIN\x05\x00\x00\x00\x14\x00an...ymous\x00anonymous>>>Type: > application...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00} > 16/06/07 14:49:19 WARN HttpParser: badMessage: 400 Illegal character 0x1 for > HttpChannelOverHttp@195db217{r=0,c=false,a=IDLE,uri=} > note: > SparkSQL connection succeeded, if using shell command "bin/beeline -u > "jdbc:hive2://*.*.*.*:1/default;transportMode=http;httpPath=cliservice"" > Two parameters(transportMode&httpPath) have been added. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15802) SparkSQL connection fail using shell command "bin/beeline -u "jdbc:hive2://*.*.*.*:10000/default""
[ https://issues.apache.org/jira/browse/SPARK-15802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319906#comment-15319906 ] marymwu commented on SPARK-15802: - what's the right protocol? how to specify it ? > SparkSQL connection fail using shell command "bin/beeline -u > "jdbc:hive2://*.*.*.*:1/default"" > -- > > Key: SPARK-15802 > URL: https://issues.apache.org/jira/browse/SPARK-15802 > Project: Spark > Issue Type: Bug > Affects Versions: 2.0.0 >Reporter: marymwu > > reproduce steps: > 1. execute shell "sbin/start-thriftserver.sh --master yarn"; > 2. execute shell "bin/beeline -u "jdbc:hive2://*.*.*.*:1/default""; > Actually result: > SparkSQL connection failed and the log shows as follows: > 16/06/07 14:49:18 WARN HttpParser: Illegal character 0x1 in state=START for > buffer > HeapByteBuffer@485a5ad9[p=1,l=35,c=16384,r=34]={\x01<<<\x00\x00\x00\x05PLAIN\x05\x00\x00\x00\x14\x00an...ymous\x00anonymous>>>Type: > application...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00} > 16/06/07 14:49:18 WARN HttpParser: badMessage: 400 Illegal character 0x1 for > HttpChannelOverHttp@718db102{r=0,c=false,a=IDLE,uri=} > 16/06/07 14:49:19 WARN HttpParser: Illegal character 0x1 in state=START for > buffer > HeapByteBuffer@485a5ad9[p=1,l=35,c=16384,r=34]={\x01<<<\x00\x00\x00\x05PLAIN\x05\x00\x00\x00\x14\x00an...ymous\x00anonymous>>>Type: > application...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00} > 16/06/07 14:49:19 WARN HttpParser: badMessage: 400 Illegal character 0x1 for > HttpChannelOverHttp@195db217{r=0,c=false,a=IDLE,uri=} > note: > SparkSQL connection succeeded, if using shell command "bin/beeline -u > "jdbc:hive2://*.*.*.*:1/default;transportMode=http;httpPath=cliservice"" > Two parameters(transportMode&httpPath) have been added. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[GitHub] spark pull request #13550: SPARK-15755
GitHub user marymwu opened a pull request: https://github.com/apache/spark/pull/13550 SPARK-15755 JIRA Issue: https://issues.apache.org/jira/browse/SPARK-15755 java.lang.NullPointerException when run spark 2.0 setting spark.serializer=org.apache.spark.serializer.KryoSerializer You can merge this pull request into a Git repository by running: $ git pull https://github.com/marymwu/spark branch-2.0 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13550.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13550 commit a2f43c2f59b461a37947a5696198a4aa7339579d Author: Dongyang DY2 Tang Date: 2016-06-08T01:37:13Z fix bug: java.lang.NullPointerException when run spark 2.0 setting spark.serializer=org.apache.spark.serializer.KryoSerializer --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[jira] [Created] (SPARK-15802) SparkSQL connection fail using shell command "bin/beeline -u "jdbc:hive2://*.*.*.*:10000/default""
marymwu created SPARK-15802: --- Summary: SparkSQL connection fail using shell command "bin/beeline -u "jdbc:hive2://*.*.*.*:1/default"" Key: SPARK-15802 URL: https://issues.apache.org/jira/browse/SPARK-15802 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Reporter: marymwu reproduce steps: 1. execute shell "sbin/start-thriftserver.sh --master yarn"; 2. execute shell "bin/beeline -u "jdbc:hive2://*.*.*.*:1/default""; Actually result: SparkSQL connection failed and the log shows as follows: 16/06/07 14:49:18 WARN HttpParser: Illegal character 0x1 in state=START for buffer HeapByteBuffer@485a5ad9[p=1,l=35,c=16384,r=34]={\x01<<<\x00\x00\x00\x05PLAIN\x05\x00\x00\x00\x14\x00an...ymous\x00anonymous>>>Type: application...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00} 16/06/07 14:49:18 WARN HttpParser: badMessage: 400 Illegal character 0x1 for HttpChannelOverHttp@718db102{r=0,c=false,a=IDLE,uri=} 16/06/07 14:49:19 WARN HttpParser: Illegal character 0x1 in state=START for buffer HeapByteBuffer@485a5ad9[p=1,l=35,c=16384,r=34]={\x01<<<\x00\x00\x00\x05PLAIN\x05\x00\x00\x00\x14\x00an...ymous\x00anonymous>>>Type: application...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00} 16/06/07 14:49:19 WARN HttpParser: badMessage: 400 Illegal character 0x1 for HttpChannelOverHttp@195db217{r=0,c=false,a=IDLE,uri=} note: SparkSQL connection succeeded, if using shell command "bin/beeline -u "jdbc:hive2://*.*.*.*:1/default;transportMode=http;httpPath=cliservice"" Two parameters(transportMode&httpPath) have been added. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15755) java.lang.NullPointerException when run spark 2.0 setting spark.serializer=org.apache.spark.serializer.KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-15755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317782#comment-15317782 ] marymwu commented on SPARK-15755: - Any comments? > java.lang.NullPointerException when run spark 2.0 setting > spark.serializer=org.apache.spark.serializer.KryoSerializer > - > > Key: SPARK-15755 > URL: https://issues.apache.org/jira/browse/SPARK-15755 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: marymwu > > java.lang.NullPointerException when run spark 2.0 setting > spark.serializer=org.apache.spark.serializer.KryoSerializer > 16/05/27 15:15:28 ERROR TaskResultGetter: Exception while getting task result > com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException > Serialization trace: > underlying (org.apache.spark.util.BoundedPriorityQueue) > at > com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) > at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25) > at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) > at > org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:312) > at > org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:87) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:66) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:56) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala:157) > at > org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala:148) > at scala.math.Ordering$$anon$4.compare(Ordering.scala:111) > at java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:649) > at java.util.PriorityQueue.siftUp(PriorityQueue.java:627) > at java.util.PriorityQueue.offer(PriorityQueue.java:329) > at java.util.PriorityQueue.add(PriorityQueue.java:306) > at > com.twitter.chill.java.PriorityQueueSerializer.read(PriorityQueueSerializer.java:78) > at > com.twitter.chill.java.PriorityQueueSerializer.read(PriorityQueueSerializer.java:31) > at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:711) > at > com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) > ... 
15 more > 16/05/27 15:15:28 ERROR TaskResultGetter: Exception while getting task result > com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException > Serialization trace: > underlying (org.apache.spark.util.BoundedPriorityQueue) > at > com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) > at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25) > at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) > at > org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:312) > at > org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:87) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:66) > at > org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) > at > org.apache.spark.scheduler.TaskRes
[jira] [Commented] (SPARK-15757) Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed on this orc file
[ https://issues.apache.org/jira/browse/SPARK-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317778#comment-15317778 ] marymwu commented on SPARK-15757: - The error occurs steps are as follows: hope it helps 1. use hive command to create a table,example, "create table inventory ( inv_date_sk int, inv_item_sk int, inv_warehouse_sk int, inv_quantity_on_hand int ) row format delimited fields terminated by '\|' stored as orc;" 2. use hive command,execute "insert overwrite inventory select * from sourcTb;"--> important step 3. use spark command, execute "select * from inventory;"-->error occurs as in description. === while we tried the following steps,things look fine: 1. use hive command to create a table,example, "create table inventory ( inv_date_sk int, inv_item_sk int, inv_warehouse_sk int, inv_quantity_on_hand int ) row format delimited fields terminated by '\|' stored as orc;" 2. use spark command,execute "insert overwrite inventory select * from sourcTb;"--> important step 3. use spark command, execute "select * from inventory;"-->error occurs as in description.--succeeded > Error occurs when using Spark sql "select" statement on orc file after hive > sql "insert overwrite tb1 select * from sourcTb" has been executed on this > orc file > --- > > Key: SPARK-15757 > URL: https://issues.apache.org/jira/browse/SPARK-15757 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: marymwu > Attachments: Result.png > > > Error occurs when using Spark sql "select" statement on orc file after hive > sql "insert overwrite tb1 select * from sourcTb" has been executed > 0: jdbc:hive2://172.19.200.158:40099/default> select * from inventory; > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 7.0 failed 8 times, most recent failure: Lost task 0.7 in > stage 7.0 (TID 2532, smokeslave5.avatar.lenovomm.com): > java.lang.IllegalArgumentException: Field "inv_date_sk" does not exist. 
> at > org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252) > at > org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252) > at scala.collection.MapLike$class.getOrElse(MapLike.scala:128) > at scala.collection.AbstractMap.getOrElse(Map.scala:59) > at > org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:251) > at > org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361) > at > org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at org.apache.spark.sql.types.StructType.foreach(StructType.scala:94) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at org.apache.spark.sql.types.StructType.map(StructType.scala:94) > at > org.apache.spark.sql.hive.orc.OrcRelation$.setRequiredColumns(OrcRelation.scala:361) > at > org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:123) > at > org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:112) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:278) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:262) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:114) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) >
[jira] [Commented] (SPARK-15757) Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed on this orc file
[ https://issues.apache.org/jira/browse/SPARK-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316170#comment-15316170 ] marymwu commented on SPARK-15757: - Actually, Field "inv_date_sk" does exist! We have executed "desc inventory", the result is as attached. > Error occurs when using Spark sql "select" statement on orc file after hive > sql "insert overwrite tb1 select * from sourcTb" has been executed on this > orc file > --- > > Key: SPARK-15757 > URL: https://issues.apache.org/jira/browse/SPARK-15757 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: marymwu > Attachments: Result.png > > > Error occurs when using Spark sql "select" statement on orc file after hive > sql "insert overwrite tb1 select * from sourcTb" has been executed > 0: jdbc:hive2://172.19.200.158:40099/default> select * from inventory; > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 7.0 failed 8 times, most recent failure: Lost task 0.7 in > stage 7.0 (TID 2532, smokeslave5.avatar.lenovomm.com): > java.lang.IllegalArgumentException: Field "inv_date_sk" does not exist. > at > org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252) > at > org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252) > at scala.collection.MapLike$class.getOrElse(MapLike.scala:128) > at scala.collection.AbstractMap.getOrElse(Map.scala:59) > at > org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:251) > at > org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361) > at > org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at org.apache.spark.sql.types.StructType.foreach(StructType.scala:94) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at org.apache.spark.sql.types.StructType.map(StructType.scala:94) > at > org.apache.spark.sql.hive.orc.OrcRelation$.setRequiredColumns(OrcRelation.scala:361) > at > org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:123) > at > org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:112) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:278) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:262) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:114) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:357) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:246) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:240) > 
at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:282) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
[jira] [Issue Comment Deleted] (SPARK-15757) Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed on this orc
[ https://issues.apache.org/jira/browse/SPARK-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] marymwu updated SPARK-15757: Comment: was deleted (was: Actually, Field "inv_date_sk" does exist! We have executed "desc inventory", the result is as attached. ) > Error occurs when using Spark sql "select" statement on orc file after hive > sql "insert overwrite tb1 select * from sourcTb" has been executed on this > orc file > --- > > Key: SPARK-15757 > URL: https://issues.apache.org/jira/browse/SPARK-15757 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: marymwu > Attachments: Result.png > > > Error occurs when using Spark sql "select" statement on orc file after hive > sql "insert overwrite tb1 select * from sourcTb" has been executed > 0: jdbc:hive2://172.19.200.158:40099/default> select * from inventory; > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 7.0 failed 8 times, most recent failure: Lost task 0.7 in > stage 7.0 (TID 2532, smokeslave5.avatar.lenovomm.com): > java.lang.IllegalArgumentException: Field "inv_date_sk" does not exist. > at > org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252) > at > org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252) > at scala.collection.MapLike$class.getOrElse(MapLike.scala:128) > at scala.collection.AbstractMap.getOrElse(Map.scala:59) > at > org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:251) > at > org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361) > at > org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at org.apache.spark.sql.types.StructType.foreach(StructType.scala:94) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at org.apache.spark.sql.types.StructType.map(StructType.scala:94) > at > org.apache.spark.sql.hive.orc.OrcRelation$.setRequiredColumns(OrcRelation.scala:361) > at > org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:123) > at > org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:112) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:278) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:262) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:114) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:357) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:246) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:240) > at > 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:282) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > ja
[jira] [Updated] (SPARK-15757) Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed on this orc file
[ https://issues.apache.org/jira/browse/SPARK-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] marymwu updated SPARK-15757: Attachment: Result.png > Error occurs when using Spark sql "select" statement on orc file after hive > sql "insert overwrite tb1 select * from sourcTb" has been executed on this > orc file > --- > > Key: SPARK-15757 > URL: https://issues.apache.org/jira/browse/SPARK-15757 > Project: Spark > Issue Type: Bug > Affects Versions: 2.0.0 >Reporter: marymwu > Attachments: Result.png > > > Error occurs when using Spark sql "select" statement on orc file after hive > sql "insert overwrite tb1 select * from sourcTb" has been executed > 0: jdbc:hive2://172.19.200.158:40099/default> select * from inventory; > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 7.0 failed 8 times, most recent failure: Lost task 0.7 in > stage 7.0 (TID 2532, smokeslave5.avatar.lenovomm.com): > java.lang.IllegalArgumentException: Field "inv_date_sk" does not exist. > at > org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252) > at > org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252) > at scala.collection.MapLike$class.getOrElse(MapLike.scala:128) > at scala.collection.AbstractMap.getOrElse(Map.scala:59) > at > org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:251) > at > org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361) > at > org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at org.apache.spark.sql.types.StructType.foreach(StructType.scala:94) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at org.apache.spark.sql.types.StructType.map(StructType.scala:94) > at > org.apache.spark.sql.hive.orc.OrcRelation$.setRequiredColumns(OrcRelation.scala:361) > at > org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:123) > at > org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:112) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:278) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:262) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:114) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:357) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:246) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:240) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774) > at > 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:282) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExe
[jira] [Commented] (SPARK-15757) Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed on this orc file
[ https://issues.apache.org/jira/browse/SPARK-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316168#comment-15316168 ] marymwu commented on SPARK-15757: - Actually, Field "inv_date_sk" does exist! We have executed "desc inventory", the result is as attached. > Error occurs when using Spark sql "select" statement on orc file after hive > sql "insert overwrite tb1 select * from sourcTb" has been executed on this > orc file > --- > > Key: SPARK-15757 > URL: https://issues.apache.org/jira/browse/SPARK-15757 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: marymwu > > Error occurs when using Spark sql "select" statement on orc file after hive > sql "insert overwrite tb1 select * from sourcTb" has been executed > 0: jdbc:hive2://172.19.200.158:40099/default> select * from inventory; > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 7.0 failed 8 times, most recent failure: Lost task 0.7 in > stage 7.0 (TID 2532, smokeslave5.avatar.lenovomm.com): > java.lang.IllegalArgumentException: Field "inv_date_sk" does not exist. > at > org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252) > at > org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252) > at scala.collection.MapLike$class.getOrElse(MapLike.scala:128) > at scala.collection.AbstractMap.getOrElse(Map.scala:59) > at > org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:251) > at > org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361) > at > org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at org.apache.spark.sql.types.StructType.foreach(StructType.scala:94) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at org.apache.spark.sql.types.StructType.map(StructType.scala:94) > at > org.apache.spark.sql.hive.orc.OrcRelation$.setRequiredColumns(OrcRelation.scala:361) > at > org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:123) > at > org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:112) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:278) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:262) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:114) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:357) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:246) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:240) > at > 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:282) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java
[jira] [Updated] (SPARK-15757) Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed on this orc file
[ https://issues.apache.org/jira/browse/SPARK-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] marymwu updated SPARK-15757: Summary: Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed on this orc file (was: Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed) > Error occurs when using Spark sql "select" statement on orc file after hive > sql "insert overwrite tb1 select * from sourcTb" has been executed on this > orc file > --- > > Key: SPARK-15757 > URL: https://issues.apache.org/jira/browse/SPARK-15757 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: marymwu > > Error occurs when using Spark sql "select" statement on orc file after hive > sql "insert overwrite tb1 select * from sourcTb" has been executed > 0: jdbc:hive2://172.19.200.158:40099/default> select * from inventory; > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 7.0 failed 8 times, most recent failure: Lost task 0.7 in > stage 7.0 (TID 2532, smokeslave5.avatar.lenovomm.com): > java.lang.IllegalArgumentException: Field "inv_date_sk" does not exist. > at > org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252) > at > org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252) > at scala.collection.MapLike$class.getOrElse(MapLike.scala:128) > at scala.collection.AbstractMap.getOrElse(Map.scala:59) > at > org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:251) > at > org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361) > at > org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at org.apache.spark.sql.types.StructType.foreach(StructType.scala:94) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at org.apache.spark.sql.types.StructType.map(StructType.scala:94) > at > org.apache.spark.sql.hive.orc.OrcRelation$.setRequiredColumns(OrcRelation.scala:361) > at > org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:123) > at > org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:112) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:278) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:262) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:114) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:357) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:246) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:240) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:282) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) >
[jira] [Created] (SPARK-15757) Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed
marymwu created SPARK-15757: --- Summary: Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed Key: SPARK-15757 URL: https://issues.apache.org/jira/browse/SPARK-15757 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Reporter: marymwu Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed 0: jdbc:hive2://172.19.200.158:40099/default> select * from inventory; Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 8 times, most recent failure: Lost task 0.7 in stage 7.0 (TID 2532, smokeslave5.avatar.lenovomm.com): java.lang.IllegalArgumentException: Field "inv_date_sk" does not exist. at org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252) at org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:252) at scala.collection.MapLike$class.getOrElse(MapLike.scala:128) at scala.collection.AbstractMap.getOrElse(Map.scala:59) at org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:251) at org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361) at org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$10.apply(OrcRelation.scala:361) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at org.apache.spark.sql.types.StructType.foreach(StructType.scala:94) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at org.apache.spark.sql.types.StructType.map(StructType.scala:94) at org.apache.spark.sql.hive.orc.OrcRelation$.setRequiredColumns(OrcRelation.scala:361) at org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:123) at org.apache.spark.sql.hive.orc.DefaultSource$$anonfun$buildReader$2.apply(OrcRelation.scala:112) at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:278) at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:262) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:114) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:357) at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:246) at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:240) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318) at org.apache.spark.rdd.RDD.iterator(RDD.scala:282) at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) at org.apache.spark.scheduler.Task.run(Task.scala:85) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Driver stacktrace: (state=,code=0) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15756) SQL “stored as orcfile” cannot be supported while hive supports both keywords "orc" and "orcfile"
marymwu created SPARK-15756: --- Summary: SQL “stored as orcfile” cannot be supported while hive supports both keywords "orc" and "orcfile" Key: SPARK-15756 URL: https://issues.apache.org/jira/browse/SPARK-15756 Project: Spark Issue Type: Improvement Affects Versions: 2.0.0 Reporter: marymwu Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15755) java.lang.NullPointerException when run spark 2.0 setting spark.serializer=org.apache.spark.serializer.KryoSerializer
marymwu created SPARK-15755: --- Summary: java.lang.NullPointerException when run spark 2.0 setting spark.serializer=org.apache.spark.serializer.KryoSerializer Key: SPARK-15755 URL: https://issues.apache.org/jira/browse/SPARK-15755 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Reporter: marymwu java.lang.NullPointerException when run spark 2.0 setting spark.serializer=org.apache.spark.serializer.KryoSerializer 16/05/27 15:15:28 ERROR TaskResultGetter: Exception while getting task result com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException Serialization trace: underlying (org.apache.spark.util.BoundedPriorityQueue) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25) at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:312) at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:87) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:66) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793) at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:56) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala:157) at org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala:148) at scala.math.Ordering$$anon$4.compare(Ordering.scala:111) at java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:649) at java.util.PriorityQueue.siftUp(PriorityQueue.java:627) at java.util.PriorityQueue.offer(PriorityQueue.java:329) at java.util.PriorityQueue.add(PriorityQueue.java:306) at com.twitter.chill.java.PriorityQueueSerializer.read(PriorityQueueSerializer.java:78) at com.twitter.chill.java.PriorityQueueSerializer.read(PriorityQueueSerializer.java:31) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:711) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) ... 
15 more 16/05/27 15:15:28 ERROR TaskResultGetter: Exception while getting task result com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException Serialization trace: underlying (org.apache.spark.util.BoundedPriorityQueue) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25) at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:793) at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:312) at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:87) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:66) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:57) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793) at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:56) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering.compare(GenerateOrdering.scala