[jira] [Resolved] (SPARK-21579) dropTempView has a critical BUG
[ https://issues.apache.org/jira/browse/SPARK-21579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21579.
--- Resolution: Not A Problem

> dropTempView has a critical BUG
> ---
>
> Key: SPARK-21579
> URL: https://issues.apache.org/jira/browse/SPARK-21579
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Reporter: ant_nebula
> Priority: Critical
>
> Attachments: screenshot-1.png
>
> When I drop only the temp view dwd_table1, the dependent table dwd_table2 also disappears from
> http://127.0.0.1:4040/storage/.
> This affects versions 2.1.1 and 2.2.0; 2.1.0 does not have this problem.
> {code:scala}
> val spark = SparkSession.builder.master("local").appName("sparkTest").getOrCreate()
> val rows = Seq(Row("p1", 30), Row("p2", 20), Row("p3", 25), Row("p4", 10), Row("p5", 40), Row("p6", 15))
> val schema = new StructType().add(StructField("name", StringType)).add(StructField("age", IntegerType))
> val rowRDD = spark.sparkContext.parallelize(rows, 3)
> val df = spark.createDataFrame(rowRDD, schema)
> df.createOrReplaceTempView("ods_table")
> spark.sql("cache table ods_table")
> spark.sql("cache table dwd_table1 as select * from ods_table where age>=25")
> spark.sql("cache table dwd_table2 as select * from dwd_table1 where name='p1'")
> spark.catalog.dropTempView("dwd_table1")
> //spark.catalog.dropTempView("ods_table")
> spark.sql("select * from dwd_table2").show()
> {code}
> It keeps ods_table in memory even though it will never be used again. This wastes memory, especially when the service's dependency graph is much more complex.
> !screenshot-1.png!

-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21591) Implement treeAggregate on Dataset API
[ https://issues.apache.org/jira/browse/SPARK-21591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-21591:
Description: The Tungsten execution engine substantially improved memory and CPU efficiency for Spark applications. However, MLlib has still not migrated its internal computing workload from {{RDD}} to {{DataFrame}}. The main blocking issue is that there is no {{treeAggregate}} on {{DataFrame}}. It is very important for MLlib algorithms, since they aggregate over {{Vector}}s that may have millions of elements. As is well known, {{RDD}}-based {{treeAggregate}} reduces the aggregation time by an order of magnitude for many MLlib algorithms (https://databricks.com/blog/2014/09/22/spark-1-1-mllib-performance-improvements.html). I am opening this JIRA to discuss implementing {{treeAggregate}} on the {{DataFrame}} API and the related performance benchmarking. Other scenarios beyond MLlib would also benefit from this improvement if we get it done.
was: The Tungsten execution engine substantially improved the efficiency of memory and CPU for Spark application. However, in MLlib we still not migrate the internal computing workload from {{RDD}} to {{DataFrame}}. The main block issue is there is no {{treeAggregate}} on {{DataFrame}}. It's very important for MLlib algorithms, since they do aggregate on vector who may has millions of elements. As we all know, {{RDD}} based {{treeAggregate}} reduces the aggregation time by an order of magnitude for lots of MLlib algorithms(https://databricks.com/blog/2014/09/22/spark-1-1-mllib-performance-improvements.html). I open this JIRA to discuss to implement {{treeAggregate}} on {{DataFrame}} API and do the performance benchmark related issues. And I think other scenarios except MLlib will also benefit from this improvement if we get it done.
> Implement treeAggregate on Dataset API
> --
>
> Key: SPARK-21591
> URL: https://issues.apache.org/jira/browse/SPARK-21591
> Project: Spark
> Issue Type: Brainstorming
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Yanbo Liang
>
> The Tungsten execution engine substantially improved memory and CPU efficiency for Spark applications. However, MLlib has still not migrated its internal computing workload from {{RDD}} to {{DataFrame}}.
> The main blocking issue is that there is no {{treeAggregate}} on {{DataFrame}}. It is very important for MLlib algorithms, since they aggregate over {{Vector}}s that may have millions of elements. As is well known, {{RDD}}-based {{treeAggregate}} reduces the aggregation time by an order of magnitude for many MLlib algorithms (https://databricks.com/blog/2014/09/22/spark-1-1-mllib-performance-improvements.html).
> I am opening this JIRA to discuss implementing {{treeAggregate}} on the {{DataFrame}} API and the related performance benchmarking. Other scenarios beyond MLlib would also benefit from this improvement if we get it done.
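The speedup the description refers to comes from combining partial aggregates in rounds rather than pulling every partition result straight to the driver. A minimal, Spark-independent sketch of that idea (in Python for brevity; the function name and `scale` parameter are illustrative, not Spark's API):

```python
import functools

# Sketch of the tree-aggregate idea: partition results are merged in rounds
# (a tree of depth > 1) instead of one linear fold on the driver.
def tree_aggregate(partitions, zero, combine, scale=2):
    # Reduce each partition locally (what executors would do).
    level = []
    for part in partitions:
        acc = zero
        for x in part:
            acc = combine(acc, x)
        level.append(acc)
    # Merge partial results in rounds until few enough remain for the driver.
    while len(level) > scale:
        level = [
            functools.reduce(combine, level[i:i + scale])
            for i in range(0, len(level), scale)
        ]
    return functools.reduce(combine, level, zero)

print(tree_aggregate([[1, 2, 3], [4, 5], [6], [7, 8]], 0, lambda a, b: a + b))  # prints 36
```

With large per-element state (such as a million-element gradient vector in place of the integers here), each merge round spreads the combine cost over many workers instead of serializing it on one node.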
[jira] [Created] (SPARK-21592) Skip maven-compiler-plugin main and test compilations in Maven build
Grzegorz Slowikowski created SPARK-21592: Summary: Skip maven-compiler-plugin main and test compilations in Maven build Key: SPARK-21592 URL: https://issues.apache.org/jira/browse/SPARK-21592 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 2.2.0 Reporter: Grzegorz Slowikowski Priority: Minor `scala-maven-plugin` in incremental mode compiles Scala and Java classes. There is no need to execute `maven-compiler-plugin` goals to compile (in fact recompile) Java. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
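For reference, one common way to achieve this, sketched here as an assumption rather than the exact change proposed, is to rebind `maven-compiler-plugin`'s default executions to the `none` phase so that only `scala-maven-plugin` compiles both languages:

```xml
<!-- Hedged sketch: disable maven-compiler-plugin's default executions so
     scala-maven-plugin (incremental mode) remains the only Java compiler. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <executions>
    <execution>
      <id>default-compile</id>
      <phase>none</phase>
    </execution>
    <execution>
      <id>default-testCompile</id>
      <phase>none</phase>
    </execution>
  </executions>
</plugin>
```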
[jira] [Resolved] (SPARK-21522) Flaky test: LauncherServerSuite.testStreamFiltering
[ https://issues.apache.org/jira/browse/SPARK-21522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-21522. Resolution: Fixed Assignee: Marcelo Vanzin Fix Version/s: 2.3.0 2.2.1 2.1.2 2.0.3 > Flaky test: LauncherServerSuite.testStreamFiltering > --- > > Key: SPARK-21522 > URL: https://issues.apache.org/jira/browse/SPARK-21522 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 2.1.1, 2.2.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Minor > Fix For: 2.0.3, 2.1.2, 2.2.1, 2.3.0 > > > We ran into this in our internal Jenkins servers. Partial stack trace: > {noformat} > java.net.SocketException: Broken pipe > at java.net.SocketOutputStream.socketWrite0(Native Method) > at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113) > at java.net.SocketOutputStream.write(SocketOutputStream.java:159) > at > java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876) > at > java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785) > at > java.io.ObjectOutputStream.writeNonProxyDesc(ObjectOutputStream.java:1285) > at > java.io.ObjectOutputStream.writeClassDesc(ObjectOutputStream.java:1230) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1426) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) > at > java.io.ObjectOutputStream.writeFatalException(ObjectOutputStream.java:1576) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:350) > at > org.apache.spark.launcher.LauncherConnection.send(LauncherConnection.java:82) > at > org.apache.spark.launcher.LauncherServerSuite.testStreamFiltering(LauncherServerSuite.java:174) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor
[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109311#comment-16109311 ] Sean Owen commented on SPARK-650: - I still don't see an argument against my primary suggestion: the singleton. The last comment on it just said, oh, how do you do it? it's quite possible. Nothing to do with the serializer. > Add a "setup hook" API for running initialization code on each executor > --- > > Key: SPARK-650 > URL: https://issues.apache.org/jira/browse/SPARK-650 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Reporter: Matei Zaharia >Priority: Minor > > Would be useful to configure things like reporting libraries -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
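The singleton workaround mentioned above is usually sketched as lazy, once-per-worker initialization: the resource is created the first time any task on that worker touches it. A hedged illustration in Python (all names are invented; in Scala this would be an `object` holding a `lazy val`):

```python
# Hedged sketch of the per-executor singleton workaround: the heavyweight
# resource (e.g. a reporting client) is created lazily, at most once per
# worker process, on the executor rather than on the driver.
class ExecutorResources:
    _client = None  # one instance per worker process

    @classmethod
    def client(cls, config):
        if cls._client is None:
            # Expensive setup happens here, the first time a task needs it.
            cls._client = "connected:" + config
        return cls._client

# Inside an RDD/DataFrame operation, tasks would call something like:
#   rdd.map(lambda row: do_work(ExecutorResources.client(cfg), row))
```

The known limitation, echoed in the comments below, is that any driver-side input has to arrive via serialized closures or broadcast variables; there is no first-class hook that runs when an executor starts.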
[jira] [Resolved] (SPARK-20079) Re-registration of AM hangs Spark cluster in yarn-client mode
[ https://issues.apache.org/jira/browse/SPARK-20079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-20079.
Resolution: Fixed Assignee: Marcelo Vanzin Fix Version/s: 2.3.0

> Re-registration of AM hangs Spark cluster in yarn-client mode
> -
>
> Key: SPARK-20079
> URL: https://issues.apache.org/jira/browse/SPARK-20079
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 2.1.0
> Reporter: Guoqiang Li
> Assignee: Marcelo Vanzin
> Fix For: 2.3.0
>
> The ExecutorAllocationManager.reset method is called when the AM re-registers, which sets the ExecutorAllocationManager.initializing field to true. While this field is true, the driver does not request new executors from the AM.
> Two cases set the field back to false:
> 1. An executor has been idle for some time.
> 2. New stages are submitted.
> After a stage has been submitted, if the AM is killed and restarted, neither case can occur:
> 1. When the AM is killed, YARN kills all running containers, so all executors are lost and none can become idle.
> 2. With no surviving executors, the current stage can never complete, so the DAG scheduler will not submit a new stage.
> Reproduction steps:
> 1. Start the cluster:
> {noformat}
> echo -e "sc.parallelize(1 to 2000).foreach(_ => Thread.sleep(1000))" | ./bin/spark-shell --master yarn-client --executor-cores 1 --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.maxExecutors=2
> {noformat}
> 2. Kill the AM process while a stage is scheduled.
[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor
[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109179#comment-16109179 ] Louis Bergelson commented on SPARK-650: --- I can't understand how people are dismissing this as not an issue. There are many cases where you need to initialize something on an executor, and many of them need input from the driver. All of the given workarounds are terrible hacks and at best force bad design, and at worst introduce confusing and non-deterministic bugs. Any time that the recommended solution to a common problem that many people are having is to abuse the Serializer in order to trick it into executing non-serialization code it seems obvious that there's a missing capability in the system. The fact that executors can come on and offline at any time during the run makes it especially essential that we have a robust way of initializing them. I just really don't understand the opposition to adding an initialization hook, it would solve so many problems in a clean way and doesn't seem like it would be particularly problematic on its own. > Add a "setup hook" API for running initialization code on each executor > --- > > Key: SPARK-650 > URL: https://issues.apache.org/jira/browse/SPARK-650 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Reporter: Matei Zaharia >Priority: Minor > > Would be useful to configure things like reporting libraries -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20657) Speed up Stage page
[ https://issues.apache.org/jira/browse/SPARK-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-20657: --- Description: The Stage page in the UI is very slow when a large number of tasks exist (tens of thousands). The new work being done in SPARK-18085 makes that worse, since it adds potential disk access to the mix. A lot of the slowness is because the code loads all the tasks in memory then sorts a really large list, and does a lot of calculations on all the data; both can be avoided with the new app state store by having smarter indices (so data is read from the store sorted in the desired order) and by keeping statistics about metrics pre-calculated (instead of re-doing that on every page access). Then only the tasks on the current page (100 items by default) need to actually be loaded. This also saves a lot on memory usage, not just CPU time. was: The Stage page in the UI is very slow when a large number of tasks exist (tens of thousands). The new work being done in SPARK-18085 makes that worse, since it adds potential disk access to the mix. A lot of the slowness is because the code loads all the tasks in memory then sorts a really large list, and does a lot of calculations on all the data; both can be avoided with the new app state store by having smarter indices (so data is read from the store sorted in the desired order) and by keeping statistics about metrics pre-calculated (instead of re-doing that on every page access). Then only the tasks on the current page (100 items by default) need to actually be loaded. This also saves a lot on memory usage, no just CPU time. > Speed up Stage page > --- > > Key: SPARK-20657 > URL: https://issues.apache.org/jira/browse/SPARK-20657 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 2.3.0 >Reporter: Marcelo Vanzin > > The Stage page in the UI is very slow when a large number of tasks exist > (tens of thousands). 
The new work being done in SPARK-18085 makes that worse, > since it adds potential disk access to the mix. > A lot of the slowness is because the code loads all the tasks in memory then > sorts a really large list, and does a lot of calculations on all the data; > both can be avoided with the new app state store by having smarter indices > (so data is read from the store sorted in the desired order) and by keeping > statistics about metrics pre-calculated (instead of re-doing that on every > page access). > Then only the tasks on the current page (100 items by default) need to > actually be loaded. This also saves a lot on memory usage, not just CPU time. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
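The indexing idea described above can be sketched generically: keep task rows pre-sorted by the display column, so serving a page slices out only `page_size` entries instead of loading and sorting every task. A hedged illustration (names are invented, not Spark's actual state store):

```python
import bisect

# Hedged sketch: a per-column sorted index lets the UI read exactly one
# page of tasks, instead of materializing and sorting tens of thousands.
class TaskIndex:
    def __init__(self):
        self._by_duration = []  # (duration, task_id), kept sorted on insert

    def add(self, task_id, duration):
        # O(n) insert here; a real store would use an on-disk sorted index.
        bisect.insort(self._by_duration, (duration, task_id))

    def page(self, page, page_size=100):
        # Only page_size entries are touched per request.
        start = page * page_size
        return self._by_duration[start:start + page_size]
```

Pre-computing per-metric statistics on write follows the same principle: pay incrementally at update time so page views stay O(page_size).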
[jira] [Created] (SPARK-21590) Structured Streaming window start time should support negative values to adjust time zone
Kevin Zhang created SPARK-21590:
--- Summary: Structured Streaming window start time should support negative values to adjust time zone
Key: SPARK-21590
URL: https://issues.apache.org/jira/browse/SPARK-21590
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 2.2.0, 2.1.0, 2.0.1, 2.0.0
Environment: spark 2.2.0
Reporter: Kevin Zhang

I want to calculate a (unique) daily access count using Structured Streaming (2.2.0). Currently a Structured Streaming window with a 1-day duration starts at 00:00:00 UTC and ends at 23:59:59 UTC each day, but my local timezone is CST (UTC + 8 hours) and I want the date boundaries to be 00:00:00 CST (that is, 00:00:00 UTC - 8 hours). In Flink I can set the window offset to -8 hours to achieve this, but in Structured Streaming, if I set the start time (the equivalent of Flink's offset) to -8 hours or any other negative value, I get the following error:
{code:shell}
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'timewindow(timestamp, 864, 864, -288)' due to data type mismatch: The start time (-288) must be greater than or equal to 0.;;
{code}
because the time window checks its input parameters to guarantee each value is greater than or equal to 0. So I am wondering whether we can remove the restriction that the start time cannot be negative.
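A workaround worth noting: window boundaries repeat every slide duration, so a negative offset can usually be expressed as its non-negative equivalent modulo the slide; -8 hours on a 1-day window lands on the same boundary as +16 hours. A small arithmetic check (the Spark call in the final comment is illustrative, not taken from this report):

```python
# Window boundaries repeat every slideDuration, so an offset of -8 hours on
# a 1-day window is equivalent to +16 hours: the same 00:00 CST boundary.
DAY = 24 * 3600

def equivalent_offset(offset_seconds, slide_seconds):
    # Normalize any offset, negative or not, into [0, slide).
    return offset_seconds % slide_seconds

print(equivalent_offset(-8 * 3600, DAY) // 3600)  # prints 16
# In Spark one could therefore write something like (illustrative):
#   df.groupBy(window(col("ts"), "1 day", "1 day", "16 hours"))
```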
[jira] [Resolved] (SPARK-21593) Fix broken configuration page
[ https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21593.
--- Resolution: Fixed Fix Version/s: 2.3.0 2.2.1
Issue resolved by pull request 18793 [https://github.com/apache/spark/pull/18793]

> Fix broken configuration page
> -
>
> Key: SPARK-21593
> URL: https://issues.apache.org/jira/browse/SPARK-21593
> Project: Spark
> Issue Type: Bug
> Components: Documentation
> Affects Versions: 2.2.0
> Environment: Chrome/Firefox
> Reporter: Artur Sukhenko
> Priority: Minor
> Fix For: 2.2.1, 2.3.0
>
> Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg
>
> The latest configuration page for Spark 2.2.0 has a broken menu list and named anchors.
> Compare the [2.1.1 docs|https://spark.apache.org/docs/2.1.1/configuration.html] with the [latest docs|https://spark.apache.org/docs/latest/configuration.html].
> Or try this link: [Configuration # Dynamic Allocation|https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation], which should open the Dynamic Allocation part of the page, but doesn't.
> !dyn_latest.jpg!
> !dyn_211.jpg!
[jira] [Commented] (SPARK-21573) Tests failing with run-tests.py SyntaxError occasionally in Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109440#comment-16109440 ] Sean Owen commented on SPARK-21573:
--- FWIW I noticed it fails on 4 and 7 but not on 6. Go ahead with that PR when ready :) it's failing about half the builds right now, unfortunately.

> Tests failing with run-tests.py SyntaxError occasionally in Jenkins
> ---
>
> Key: SPARK-21573
> URL: https://issues.apache.org/jira/browse/SPARK-21573
> Project: Spark
> Issue Type: Test
> Components: Tests
> Affects Versions: 2.3.0
> Reporter: Hyukjin Kwon
> Priority: Minor
>
> It looks like the default {{python}} on the path in a few places, such as {{./dev/run-tests}}, is Python 2.6 in Jenkins, and it fails to execute {{run-tests.py}}:
> {code}
> python2.6 run-tests.py
> File "run-tests.py", line 124
> {m: set(m.dependencies).intersection(modules_to_test) for m in modules_to_test}, sort=True)
> ^
> SyntaxError: invalid syntax
> {code}
> It looks like there are quite a few places to fix to support Python 2.6 in {{run-tests.py}} and related Python scripts.
> We might instead just try to set Python 2.7 in the few other scripts that run this, if it is available.
> Please also see http://apache-spark-developers-list.1001551.n3.nabble.com/Tests-failing-with-run-tests-py-SyntaxError-td22030.html
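For context, the failing line uses a dict comprehension, which Python only added in 2.7; the 2.6 parser raises SyntaxError on it. The 2.6-safe spelling builds the same mapping with `dict()` over a generator. A small self-contained illustration (the module names and dependency map are invented, not Spark's real module graph):

```python
# Dict comprehensions ({k: v for ...}) are a 2.7+ feature; Python 2.6
# rejects them at parse time. dict() over a generator works on both.
modules_to_test = ["sql", "core"]
dependencies = {"sql": ["core", "catalyst"], "core": []}

# 2.7+ form (what run-tests.py uses):
deps_27 = {m: set(dependencies[m]) & set(modules_to_test) for m in modules_to_test}

# 2.6-compatible form:
deps_26 = dict((m, set(dependencies[m]) & set(modules_to_test)) for m in modules_to_test)

print(deps_27 == deps_26)  # prints True
```

Because the error is raised by the parser, even code paths that never execute the line will fail, which is why the whole script dies under 2.6.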
[jira] [Resolved] (SPARK-21592) Skip maven-compiler-plugin main and test compilations in Maven build
[ https://issues.apache.org/jira/browse/SPARK-21592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21592. --- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 18750 [https://github.com/apache/spark/pull/18750] > Skip maven-compiler-plugin main and test compilations in Maven build > > > Key: SPARK-21592 > URL: https://issues.apache.org/jira/browse/SPARK-21592 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.2.0 >Reporter: Grzegorz Slowikowski >Priority: Minor > Labels: maven > Fix For: 2.3.0 > > > `scala-maven-plugin` in incremental mode compiles Scala and Java classes. > There is no need to execute `maven-compiler-plugin` goals to compile (in fact > recompile) Java. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21593) Fix broken configuration page
[ https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-21593:
- Assignee: Sean Owen

> Fix broken configuration page
> -
>
> Key: SPARK-21593
> URL: https://issues.apache.org/jira/browse/SPARK-21593
> Project: Spark
> Issue Type: Bug
> Components: Documentation
> Affects Versions: 2.2.0
> Environment: Chrome/Firefox
> Reporter: Artur Sukhenko
> Assignee: Sean Owen
> Priority: Minor
> Fix For: 2.2.1, 2.3.0
>
> Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg
>
> The latest configuration page for Spark 2.2.0 has a broken menu list and named anchors.
> Compare the [2.1.1 docs|https://spark.apache.org/docs/2.1.1/configuration.html] with the [latest docs|https://spark.apache.org/docs/latest/configuration.html].
> Or try this link: [Configuration # Dynamic Allocation|https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation], which should open the Dynamic Allocation part of the page, but doesn't.
> !dyn_latest.jpg!
> !dyn_211.jpg!
[jira] [Assigned] (SPARK-21592) Skip maven-compiler-plugin main and test compilations in Maven build
[ https://issues.apache.org/jira/browse/SPARK-21592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-21592: - Assignee: Grzegorz Slowikowski > Skip maven-compiler-plugin main and test compilations in Maven build > > > Key: SPARK-21592 > URL: https://issues.apache.org/jira/browse/SPARK-21592 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.2.0 >Reporter: Grzegorz Slowikowski >Assignee: Grzegorz Slowikowski >Priority: Minor > Labels: maven > Fix For: 2.3.0 > > > `scala-maven-plugin` in incremental mode compiles Scala and Java classes. > There is no need to execute `maven-compiler-plugin` goals to compile (in fact > recompile) Java. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21596) Audit the places calling HDFSMetadataLog.get
Shixiong Zhu created SPARK-21596:
--- Summary: Audit the places calling HDFSMetadataLog.get
Key: SPARK-21596
URL: https://issues.apache.org/jira/browse/SPARK-21596
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 2.1.1
Reporter: Shixiong Zhu

While investigating a flaky test, I realized that many places don't check the return value of `HDFSMetadataLog.get(batchId: Long): Option[T]`. When a batch is supposed to be there, the caller just ignores None rather than throwing an error. If some bug causes a query not to generate a batch metadata file, this behavior will hide it and allow the query to continue running.
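The pattern being audited can be sketched generically: a lookup that returns an optional value, where a caller that knows the batch must exist should fail loudly instead of silently skipping a missing entry. A hedged Python illustration (names are invented, not Spark's internals):

```python
# Hedged sketch of the audit's point: when a batch is *expected* to exist,
# a missing metadata entry should raise instead of being silently ignored.
class MetadataLog:
    def __init__(self):
        self._batches = {}

    def add(self, batch_id, metadata):
        self._batches[batch_id] = metadata

    def get(self, batch_id):
        # Like HDFSMetadataLog.get: Option[T], modeled here as value-or-None.
        return self._batches.get(batch_id)

    def get_or_fail(self, batch_id):
        metadata = self.get(batch_id)
        if metadata is None:
            raise RuntimeError("batch %d metadata is missing; the query "
                               "should not continue" % batch_id)
        return metadata
```

The audit's concern is that callers use the equivalent of `get` everywhere, even at call sites where a None can only mean a bug upstream.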
[jira] [Commented] (SPARK-21573) Tests failing with run-tests.py SyntaxError occasionally in Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109465#comment-16109465 ] shane knapp commented on SPARK-21573: - i'm chatting w/@joshrosen @ 130pm to discuss how to proceed. besides this PR, i have a couple of other options i'd like to bounce off of him first. > Tests failing with run-tests.py SyntaxError occasionally in Jenkins > --- > > Key: SPARK-21573 > URL: https://issues.apache.org/jira/browse/SPARK-21573 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 2.3.0 >Reporter: Hyukjin Kwon >Priority: Minor > > It looks default {{python}} in the path at few places such as > {{./dev/run-tests}} use Python 2.6 in Jenkins and it fails to execute > {{run-tests.py}}: > {code} > python2.6 run-tests.py > File "run-tests.py", line 124 > {m: set(m.dependencies).intersection(modules_to_test) for m in > modules_to_test}, sort=True) > ^ > SyntaxError: invalid syntax > {code} > It looks there are quite some places to fix to support Python 2.6 in > {{run-tests.py}} and related Python scripts. > We might just try to set Python 2.7 in few other scripts running this if > available. > Please also see > http://apache-spark-developers-list.1001551.n3.nabble.com/Tests-failing-with-run-tests-py-SyntaxError-td22030.html -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21589) Add documents about unsupported functions in Hive UDF/UDTF/UDAF
[ https://issues.apache.org/jira/browse/SPARK-21589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-21589. - Resolution: Fixed Assignee: Takeshi Yamamuro Fix Version/s: 2.3.0 > Add documents about unsupported functions in Hive UDF/UDTF/UDAF > --- > > Key: SPARK-21589 > URL: https://issues.apache.org/jira/browse/SPARK-21589 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 2.2.0 >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro >Priority: Minor > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21579) dropTempView has a critical BUG
[ https://issues.apache.org/jira/browse/SPARK-21579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-21579:
Issue Type: Improvement (was: Bug)

> dropTempView has a critical BUG
> ---
>
> Key: SPARK-21579
> URL: https://issues.apache.org/jira/browse/SPARK-21579
[jira] [Commented] (SPARK-21579) dropTempView has a critical BUG
[ https://issues.apache.org/jira/browse/SPARK-21579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108440#comment-16108440 ] Xiao Li commented on SPARK-21579:
--- As I said in the PR, correctness is more important for us. The cached plan will be reused when we build any other plans, so users might otherwise see out-of-date results. Achieving what you want requires introducing a new concept, like materialized views, which would not be used by our plan matching during query execution.

> dropTempView has a critical BUG
> ---
>
> Key: SPARK-21579
> URL: https://issues.apache.org/jira/browse/SPARK-21579