[jira] [Resolved] (SPARK-21579) dropTempView has a critical BUG

2017-08-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-21579.
---
Resolution: Not A Problem

> dropTempView has a critical BUG
> ---
>
> Key: SPARK-21579
> URL: https://issues.apache.org/jira/browse/SPARK-21579
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
>Reporter: ant_nebula
>Priority: Critical
> Attachments: screenshot-1.png
>
>
> When I dropTempView dwd_table1 only, the dependent table dwd_table2 also 
> disappears from http://127.0.0.1:4040/storage/.
> This affects versions 2.1.1 and 2.2.0; 2.1.0 does not have this problem.
> {code:java}
> import org.apache.spark.sql.{Row, SparkSession}
> import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
> 
> val spark = SparkSession.builder.master("local").appName("sparkTest").getOrCreate()
> val rows = Seq(Row("p1", 30), Row("p2", 20), Row("p3", 25), Row("p4", 10), Row("p5", 40), Row("p6", 15))
> val schema = new StructType().add(StructField("name", StringType)).add(StructField("age", IntegerType))
> val rowRDD = spark.sparkContext.parallelize(rows, 3)
> val df = spark.createDataFrame(rowRDD, schema)
> df.createOrReplaceTempView("ods_table")
> spark.sql("cache table ods_table")
> spark.sql("cache table dwd_table1 as select * from ods_table where age>=25")
> spark.sql("cache table dwd_table2 as select * from dwd_table1 where name='p1'")
> spark.catalog.dropTempView("dwd_table1")
> //spark.catalog.dropTempView("ods_table")
> spark.sql("select * from dwd_table2").show()
> {code}
> It will keep ods_table in memory, although it will not be used anymore. This 
> wastes memory, especially when my service diagram is much more complex.
> !screenshot-1.png!
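
A quick way to observe the reported behavior from the repro above (editor's 
sketch, reusing the names from the code; the version-specific behavior is the 
one the reporter describes):

{code:scala}
// After dropTempView("dwd_table1"), dwd_table2 is still a registered temp view,
// but on 2.1.1/2.2.0 its cached data is dropped together with dwd_table1's
// (which is why it disappears from the storage page); on 2.1.0 it stays cached.
println(spark.catalog.isCached("dwd_table2"))
{code}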






[jira] [Updated] (SPARK-21591) Implement treeAggregate on Dataset API

2017-08-01 Thread Yanbo Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang updated SPARK-21591:

Description: 
The Tungsten execution engine substantially improved the efficiency of memory 
and CPU for Spark applications. However, MLlib has still not migrated its 
internal computing workloads from {{RDD}} to {{DataFrame}}.
The main blocking issue is that there is no {{treeAggregate}} on {{DataFrame}}. 
It's very important for MLlib algorithms, since they aggregate over {{Vector}}s 
which may have millions of elements. As is well known, {{RDD}}-based 
{{treeAggregate}} reduces the aggregation time by an order of magnitude for 
lots of MLlib algorithms 
(https://databricks.com/blog/2014/09/22/spark-1-1-mllib-performance-improvements.html).
I'm opening this JIRA to discuss implementing {{treeAggregate}} on the 
{{DataFrame}} API and doing the related performance benchmarks. I think 
scenarios other than MLlib will also benefit from this improvement if we get 
it done.

  was:
The Tungsten execution engine substantially improved the efficiency of memory 
and CPU for Spark application. However, in MLlib we still not migrate the 
internal computing workload from {{RDD}} to {{DataFrame}}.
The main block issue is there is no {{treeAggregate}} on {{DataFrame}}. It's 
very important for MLlib algorithms, since they do aggregate on vector who may 
has millions of elements. As we all know, {{RDD}} based {{treeAggregate}} 
reduces the aggregation time by an order of magnitude for  lots of MLlib 
algorithms(https://databricks.com/blog/2014/09/22/spark-1-1-mllib-performance-improvements.html).
I open this JIRA to discuss to implement {{treeAggregate}} on {{DataFrame}} API 
and do the performance benchmark related issues. And I think other scenarios 
except MLlib will also benefit from this improvement if we get it done.


> Implement treeAggregate on Dataset API
> --
>
> Key: SPARK-21591
> URL: https://issues.apache.org/jira/browse/SPARK-21591
> Project: Spark
>  Issue Type: Brainstorming
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Yanbo Liang
>
> The Tungsten execution engine substantially improved the efficiency of memory 
> and CPU for Spark applications. However, MLlib has still not migrated its 
> internal computing workloads from {{RDD}} to {{DataFrame}}.
> The main blocking issue is that there is no {{treeAggregate}} on 
> {{DataFrame}}. It's very important for MLlib algorithms, since they aggregate 
> over {{Vector}}s which may have millions of elements. As is well known, 
> {{RDD}}-based {{treeAggregate}} reduces the aggregation time by an order of 
> magnitude for lots of MLlib algorithms 
> (https://databricks.com/blog/2014/09/22/spark-1-1-mllib-performance-improvements.html).
> I'm opening this JIRA to discuss implementing {{treeAggregate}} on the 
> {{DataFrame}} API and doing the related performance benchmarks. I think 
> scenarios other than MLlib will also benefit from this improvement if we get 
> it done.
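
For context, a minimal sketch (editor's illustration, not from the issue) of 
the existing {{RDD}} treeAggregate pattern that MLlib relies on:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local").appName("treeAggDemo").getOrCreate()

// Sum large per-record arrays (stand-ins for MLlib gradient Vectors).
// treeAggregate merges partial aggregates on executors in intermediate
// rounds (controlled by depth) instead of sending every partition's
// result straight to the driver.
val dim = 4 // millions of elements in real MLlib workloads
val data = spark.sparkContext.parallelize(Seq.fill(1000)(Array.fill(dim)(1.0)), 8)
val total = data.treeAggregate(new Array[Double](dim))(
  seqOp = (acc, v) => { for (i <- 0 until dim) acc(i) += v(i); acc },
  combOp = (a, b) => { for (i <- 0 until dim) a(i) += b(i); a },
  depth = 2
)
{code}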






[jira] [Created] (SPARK-21592) Skip maven-compiler-plugin main and test compilations in Maven build

2017-08-01 Thread Grzegorz Slowikowski (JIRA)
Grzegorz Slowikowski created SPARK-21592:


 Summary: Skip maven-compiler-plugin main and test compilations in 
Maven build
 Key: SPARK-21592
 URL: https://issues.apache.org/jira/browse/SPARK-21592
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 2.2.0
Reporter: Grzegorz Slowikowski
Priority: Minor


`scala-maven-plugin` in incremental mode compiles Scala and Java classes. There 
is no need to execute `maven-compiler-plugin` goals to compile (in fact 
recompile) Java.






[jira] [Resolved] (SPARK-21522) Flaky test: LauncherServerSuite.testStreamFiltering

2017-08-01 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-21522.

   Resolution: Fixed
 Assignee: Marcelo Vanzin
Fix Version/s: 2.3.0
   2.2.1
   2.1.2
   2.0.3

> Flaky test: LauncherServerSuite.testStreamFiltering
> ---
>
> Key: SPARK-21522
> URL: https://issues.apache.org/jira/browse/SPARK-21522
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
> Fix For: 2.0.3, 2.1.2, 2.2.1, 2.3.0
>
>
> We ran into this in our internal Jenkins servers. Partial stack trace:
> {noformat}
> java.net.SocketException: Broken pipe
>   at java.net.SocketOutputStream.socketWrite0(Native Method)
>   at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
>   at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
>   at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
>   at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785)
>   at java.io.ObjectOutputStream.writeNonProxyDesc(ObjectOutputStream.java:1285)
>   at java.io.ObjectOutputStream.writeClassDesc(ObjectOutputStream.java:1230)
>   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1426)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>   at java.io.ObjectOutputStream.writeFatalException(ObjectOutputStream.java:1576)
>   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:350)
>   at org.apache.spark.launcher.LauncherConnection.send(LauncherConnection.java:82)
>   at org.apache.spark.launcher.LauncherServerSuite.testStreamFiltering(LauncherServerSuite.java:174)
> {noformat}






[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2017-08-01 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109311#comment-16109311
 ] 

Sean Owen commented on SPARK-650:
-

I still don't see an argument against my primary suggestion: the singleton. The 
last comment on it just asked how you would do it, and it's quite possible. 
Nothing to do with the serializer.
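
For illustration, a minimal sketch of the singleton approach (an editor's 
sketch, not an official API; assumes a {{SparkSession}} is available):

{code:scala}
import org.apache.spark.sql.SparkSession

// A JVM-wide singleton: the lazy val body runs at most once per executor
// JVM, the first time any task on that executor references it.
object ExecutorSetup {
  lazy val init: Unit = {
    // e.g. configure a reporting library; driver-side input can be
    // captured in the task closure or shipped via a broadcast variable
    println(s"init on ${java.net.InetAddress.getLocalHost.getHostName}")
  }
}

val spark = SparkSession.builder.getOrCreate()
spark.sparkContext.parallelize(1 to 100, 4).mapPartitions { iter =>
  ExecutorSetup.init  // forces the one-time setup on this executor
  iter
}.count()
{code}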

> Add a "setup hook" API for running initialization code on each executor
> ---
>
> Key: SPARK-650
> URL: https://issues.apache.org/jira/browse/SPARK-650
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Matei Zaharia
>Priority: Minor
>
> Would be useful to configure things like reporting libraries






[jira] [Resolved] (SPARK-20079) Re registration of AM hangs spark cluster in yarn-client mode

2017-08-01 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-20079.

   Resolution: Fixed
 Assignee: Marcelo Vanzin
Fix Version/s: 2.3.0

> Re registration of AM hangs spark cluster in yarn-client mode
> -
>
> Key: SPARK-20079
> URL: https://issues.apache.org/jira/browse/SPARK-20079
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.0
>Reporter: Guoqiang Li
>Assignee: Marcelo Vanzin
> Fix For: 2.3.0
>
>
> The ExecutorAllocationManager.reset method is called when the AM re-registers, 
> which sets the ExecutorAllocationManager.initializing field to true. While 
> this field is true, the driver does not request new executors from the AM. 
> The following two cases set the field back to false:
> 1. An executor has been idle for some time.
> 2. There are new stages to be submitted.
> If the AM is killed and restarted after a stage has already been submitted, 
> neither of the above cases can occur:
> 1. When the AM is killed, YARN kills all running containers, so all executors 
> are lost and no executor can become idle.
> 2. With no surviving executors, the current stage can never complete, so the 
> DAG scheduler will not submit a new stage.
> Reproduction steps:
> 1. Start the cluster:
> {noformat}
> echo -e "sc.parallelize(1 to 2000).foreach(_ => Thread.sleep(1000))" | \
>   ./bin/spark-shell --master yarn-client --executor-cores 1 \
>   --conf spark.shuffle.service.enabled=true \
>   --conf spark.dynamicAllocation.enabled=true \
>   --conf spark.dynamicAllocation.maxExecutors=2
> {noformat}
> 2.  Kill the AM process when a stage is scheduled. 






[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2017-08-01 Thread Louis Bergelson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109179#comment-16109179
 ] 

Louis Bergelson commented on SPARK-650:
---

I can't understand how people are dismissing this as a non-issue.  There are 
many cases where you need to initialize something on an executor, and many of 
them need input from the driver.  All of the given workarounds are terrible 
hacks that at best force bad design and at worst introduce confusing, 
non-deterministic bugs.  When the recommended solution to a common problem that 
many people are having is to abuse the serializer in order to trick it into 
executing non-serialization code, it seems obvious that there's a missing 
capability in the system. 

The fact that executors can come online and go offline at any time during a run 
makes it especially essential that we have a robust way of initializing them.  
I just really don't understand the opposition to adding an initialization hook; 
it would solve so many problems in a clean way and doesn't seem like it would 
be particularly problematic on its own.
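
For readers unfamiliar with the workaround being criticized, one form of the 
serializer hack (editor's sketch, not a recommendation):

{code:scala}
import java.io.ObjectInputStream

// Initialization is guarded so it runs once per executor JVM no matter
// how many task closures are deserialized there.
object Setup {
  @volatile private var done = false
  def ensureInitialized(): Unit = synchronized {
    if (!done) { /* e.g. configure a reporting library */ done = true }
  }
}

// Shipping an instance of this class inside a task closure triggers
// readObject on each executor, smuggling setup into deserialization.
class SetupOnDeserialize extends Serializable {
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    Setup.ensureInitialized()
  }
}
{code}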

> Add a "setup hook" API for running initialization code on each executor
> ---
>
> Key: SPARK-650
> URL: https://issues.apache.org/jira/browse/SPARK-650
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Matei Zaharia
>Priority: Minor
>
> Would be useful to configure things like reporting libraries






[jira] [Updated] (SPARK-20657) Speed up Stage page

2017-08-01 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated SPARK-20657:
---
Description: 
The Stage page in the UI is very slow when a large number of tasks exist (tens 
of thousands). The new work being done in SPARK-18085 makes that worse, since 
it adds potential disk access to the mix.

A lot of the slowness is because the code loads all the tasks in memory then 
sorts a really large list, and does a lot of calculations on all the data; both 
can be avoided with the new app state store by having smarter indices (so data 
is read from the store sorted in the desired order) and by keeping statistics 
about metrics pre-calculated (instead of re-doing that on every page access).

Then only the tasks on the current page (100 items by default) need to actually 
be loaded. This also saves a lot on memory usage, not just CPU time.

  was:
The Stage page in the UI is very slow when a large number of tasks exist (tens 
of thousands). The new work being done in SPARK-18085 makes that worse, since 
it adds potential disk access to the mix.

A lot of the slowness is because the code loads all the tasks in memory then 
sorts a really large list, and does a lot of calculations on all the data; both 
can be avoided with the new app state store by having smarter indices (so data 
is read from the store sorted in the desired order) and by keeping statistics 
about metrics pre-calculated (instead of re-doing that on every page access).

Then only the tasks on the current page (100 items by default) need to actually 
be loaded. This also saves a lot on memory usage, no just CPU time.


> Speed up Stage page
> ---
>
> Key: SPARK-20657
> URL: https://issues.apache.org/jira/browse/SPARK-20657
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>
> The Stage page in the UI is very slow when a large number of tasks exist 
> (tens of thousands). The new work being done in SPARK-18085 makes that worse, 
> since it adds potential disk access to the mix.
> A lot of the slowness is because the code loads all the tasks in memory then 
> sorts a really large list, and does a lot of calculations on all the data; 
> both can be avoided with the new app state store by having smarter indices 
> (so data is read from the store sorted in the desired order) and by keeping 
> statistics about metrics pre-calculated (instead of re-doing that on every 
> page access).
> Then only the tasks on the current page (100 items by default) need to 
> actually be loaded. This also saves a lot on memory usage, not just CPU time.






[jira] [Created] (SPARK-21590) Structured Streaming window start time should support negative values to adjust time zone

2017-08-01 Thread Kevin Zhang (JIRA)
Kevin Zhang created SPARK-21590:
---

 Summary: Structured Streaming window start time should support 
negative values to adjust time zone
 Key: SPARK-21590
 URL: https://issues.apache.org/jira/browse/SPARK-21590
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 2.2.0, 2.1.0, 2.0.1, 2.0.0
 Environment: spark 2.2.0
Reporter: Kevin Zhang


I want to calculate (unique) daily access counts using structured streaming 
(2.2.0).
Currently a structured streaming window with a 1-day duration starts at 
00:00:00 UTC and ends at 23:59:59 UTC each day, but my local timezone is CST 
(UTC + 8 hours) and I want the date boundaries to be 00:00:00 CST (that is, 
00:00:00 UTC minus 8 hours).

In Flink I can set the window offset to -8 hours to achieve this, but in 
structured streaming, if I set the start time (the equivalent of Flink's 
offset) to -8 hours or any other negative value, I get the following error:

{code}
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot 
resolve 'timewindow(timestamp, 86400000000, 86400000000, -28800000000)' due to 
data type mismatch: The start time (-28800000000) must be greater than or 
equal to 0.;;
{code}

because the time window checks its input parameters to guarantee that each 
value is greater than or equal to 0.

So I'm wondering whether we can remove the restriction that the start time 
cannot be negative.
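
For reference, a possible workaround sketch (editor's illustration, assuming 
an input DataFrame named {{events}} with a {{timestamp}} column): since window 
boundaries are aligned to the epoch plus {{startTime}}, a positive start time 
of 16 hours produces the same daily boundaries as the desired -8 hours.

{code:scala}
import org.apache.spark.sql.functions.{col, window}

// 00:00 CST == 16:00 UTC, and 16 == 24 - 8, so startTime = "16 hours"
// shifts the 1-day windows to CST midnight without a negative offset.
val daily = events
  .groupBy(window(col("timestamp"), "1 day", "1 day", "16 hours"))
  .count()
{code}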






[jira] [Resolved] (SPARK-21593) Fix broken configuration page

2017-08-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-21593.
---
   Resolution: Fixed
Fix Version/s: 2.3.0
   2.2.1

Issue resolved by pull request 18793
[https://github.com/apache/spark/pull/18793]

> Fix broken configuration page
> -
>
> Key: SPARK-21593
> URL: https://issues.apache.org/jira/browse/SPARK-21593
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.2.0
> Environment: Chrome/Firefox
>Reporter: Artur Sukhenko
>Priority: Minor
> Fix For: 2.2.1, 2.3.0
>
> Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg
>
>
> The latest configuration page for Spark 2.2.0 has a broken menu list and 
> broken named anchors.
> Compare the [2.1.1 docs|https://spark.apache.org/docs/2.1.1/configuration.html] 
> with the [latest docs|https://spark.apache.org/docs/latest/configuration.html], 
> or try the link [Configuration # Dynamic 
> Allocation|https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation], 
> which should open the Dynamic Allocation part of the page, but doesn't.
> !dyn_latest.jpg!
> 
> !dyn_211.jpg!






[jira] [Commented] (SPARK-21573) Tests failing with run-tests.py SyntaxError occasionally in Jenkins

2017-08-01 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109440#comment-16109440
 ] 

Sean Owen commented on SPARK-21573:
---

FWIW I noticed it fails on 4 and 7 but not on 6. Go ahead with that PR when 
ready :) It's failing about half the builds right now, unfortunately.

> Tests failing with run-tests.py SyntaxError occasionally in Jenkins
> ---
>
> Key: SPARK-21573
> URL: https://issues.apache.org/jira/browse/SPARK-21573
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.3.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> It looks like the default {{python}} on the path in a few places, such as 
> {{./dev/run-tests}}, is Python 2.6 in Jenkins, and it fails to execute 
> {{run-tests.py}}:
> {code}
> python2.6 run-tests.py
>   File "run-tests.py", line 124
>     {m: set(m.dependencies).intersection(modules_to_test) for m in modules_to_test}, sort=True)
>     ^
> SyntaxError: invalid syntax
> {code}
> It looks like there are quite a few places that would need fixing to support 
> Python 2.6 in {{run-tests.py}} and the related Python scripts.
> We might instead just point the few scripts that run this at Python 2.7, if 
> available.
> Please also see 
> http://apache-spark-developers-list.1001551.n3.nabble.com/Tests-failing-with-run-tests-py-SyntaxError-td22030.html






[jira] [Resolved] (SPARK-21592) Skip maven-compiler-plugin main and test compilations in Maven build

2017-08-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-21592.
---
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 18750
[https://github.com/apache/spark/pull/18750]

> Skip maven-compiler-plugin main and test compilations in Maven build
> 
>
> Key: SPARK-21592
> URL: https://issues.apache.org/jira/browse/SPARK-21592
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.2.0
>Reporter: Grzegorz Slowikowski
>Priority: Minor
>  Labels: maven
> Fix For: 2.3.0
>
>
> `scala-maven-plugin` in incremental mode compiles Scala and Java classes. 
> There is no need to execute `maven-compiler-plugin` goals to compile (in fact 
> recompile) Java.






[jira] [Assigned] (SPARK-21593) Fix broken configuration page

2017-08-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-21593:
-

Assignee: Sean Owen

> Fix broken configuration page
> -
>
> Key: SPARK-21593
> URL: https://issues.apache.org/jira/browse/SPARK-21593
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.2.0
> Environment: Chrome/Firefox
>Reporter: Artur Sukhenko
>Assignee: Sean Owen
>Priority: Minor
> Fix For: 2.2.1, 2.3.0
>
> Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg
>
>
> The latest configuration page for Spark 2.2.0 has a broken menu list and 
> broken named anchors.
> Compare the [2.1.1 docs|https://spark.apache.org/docs/2.1.1/configuration.html] 
> with the [latest docs|https://spark.apache.org/docs/latest/configuration.html], 
> or try the link [Configuration # Dynamic 
> Allocation|https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation], 
> which should open the Dynamic Allocation part of the page, but doesn't.
> !dyn_latest.jpg!
> 
> !dyn_211.jpg!






[jira] [Assigned] (SPARK-21592) Skip maven-compiler-plugin main and test compilations in Maven build

2017-08-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-21592:
-

Assignee: Grzegorz Slowikowski

> Skip maven-compiler-plugin main and test compilations in Maven build
> 
>
> Key: SPARK-21592
> URL: https://issues.apache.org/jira/browse/SPARK-21592
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.2.0
>Reporter: Grzegorz Slowikowski
>Assignee: Grzegorz Slowikowski
>Priority: Minor
>  Labels: maven
> Fix For: 2.3.0
>
>
> `scala-maven-plugin` in incremental mode compiles Scala and Java classes. 
> There is no need to execute `maven-compiler-plugin` goals to compile (in fact 
> recompile) Java.






[jira] [Created] (SPARK-21596) Audit the places calling HDFSMetadataLog.get

2017-08-01 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-21596:


 Summary: Audit the places calling HDFSMetadataLog.get
 Key: SPARK-21596
 URL: https://issues.apache.org/jira/browse/SPARK-21596
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 2.1.1
Reporter: Shixiong Zhu


When I was investigating a flaky test, I realized that many places don't check 
the return value of `HDFSMetadataLog.get(batchId: Long): Option[T]`. When a 
batch is supposed to be there, the caller just ignores None rather than 
throwing an error. If some bug causes a query not to generate a batch metadata 
file, this behavior will hide it and allow the query to continue running.
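
For illustration, the kind of check such an audit would add (editor's sketch; 
`metadataLog` and `batchId` are assumed to be in scope):

{code:scala}
// Fail loudly when a batch that must exist is missing, instead of
// silently ignoring None and letting the query keep running.
val batchMetadata = metadataLog.get(batchId).getOrElse {
  throw new IllegalStateException(s"batch $batchId doesn't exist")
}
{code}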






[jira] [Commented] (SPARK-21573) Tests failing with run-tests.py SyntaxError occasionally in Jenkins

2017-08-01 Thread shane knapp (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109465#comment-16109465
 ] 

shane knapp commented on SPARK-21573:
-

i'm chatting w/@joshrosen @ 130pm to discuss how to proceed.  besides
this PR, i have a couple of other options i'd like to bounce off of
him first.



> Tests failing with run-tests.py SyntaxError occasionally in Jenkins
> ---
>
> Key: SPARK-21573
> URL: https://issues.apache.org/jira/browse/SPARK-21573
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.3.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> It looks like the default {{python}} on the path in a few places, such as 
> {{./dev/run-tests}}, is Python 2.6 in Jenkins, and it fails to execute 
> {{run-tests.py}}:
> {code}
> python2.6 run-tests.py
>   File "run-tests.py", line 124
>     {m: set(m.dependencies).intersection(modules_to_test) for m in modules_to_test}, sort=True)
>     ^
> SyntaxError: invalid syntax
> {code}
> It looks like there are quite a few places that would need fixing to support 
> Python 2.6 in {{run-tests.py}} and the related Python scripts.
> We might instead just point the few scripts that run this at Python 2.7, if 
> available.
> Please also see 
> http://apache-spark-developers-list.1001551.n3.nabble.com/Tests-failing-with-run-tests-py-SyntaxError-td22030.html






[jira] [Resolved] (SPARK-21589) Add documents about unsupported functions in Hive UDF/UDTF/UDAF

2017-08-01 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21589.
-
   Resolution: Fixed
 Assignee: Takeshi Yamamuro
Fix Version/s: 2.3.0

> Add documents about unsupported functions in Hive UDF/UDTF/UDAF
> ---
>
> Key: SPARK-21589
> URL: https://issues.apache.org/jira/browse/SPARK-21589
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Affects Versions: 2.2.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Minor
> Fix For: 2.3.0
>
>







[jira] [Updated] (SPARK-21579) dropTempView has a critical BUG

2017-08-01 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21579:

Issue Type: Improvement  (was: Bug)

> dropTempView has a critical BUG
> ---
>
> Key: SPARK-21579
> URL: https://issues.apache.org/jira/browse/SPARK-21579
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
>Reporter: ant_nebula
>Priority: Critical
> Attachments: screenshot-1.png
>
>
> When I dropTempView dwd_table1 only, the dependent table dwd_table2 also 
> disappears from http://127.0.0.1:4040/storage/.
> This affects versions 2.1.1 and 2.2.0; 2.1.0 does not have this problem.
> {code:java}
> import org.apache.spark.sql.{Row, SparkSession}
> import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
> 
> val spark = SparkSession.builder.master("local").appName("sparkTest").getOrCreate()
> val rows = Seq(Row("p1", 30), Row("p2", 20), Row("p3", 25), Row("p4", 10), Row("p5", 40), Row("p6", 15))
> val schema = new StructType().add(StructField("name", StringType)).add(StructField("age", IntegerType))
> val rowRDD = spark.sparkContext.parallelize(rows, 3)
> val df = spark.createDataFrame(rowRDD, schema)
> df.createOrReplaceTempView("ods_table")
> spark.sql("cache table ods_table")
> spark.sql("cache table dwd_table1 as select * from ods_table where age>=25")
> spark.sql("cache table dwd_table2 as select * from dwd_table1 where name='p1'")
> spark.catalog.dropTempView("dwd_table1")
> //spark.catalog.dropTempView("ods_table")
> spark.sql("select * from dwd_table2").show()
> {code}
> It will keep ods_table in memory, although it will not be used anymore. This 
> wastes memory, especially when my service diagram is much more complex.
> !screenshot-1.png!






[jira] [Commented] (SPARK-21579) dropTempView has a critical BUG

2017-08-01 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108440#comment-16108440
 ] 

Xiao Li commented on SPARK-21579:
-

As I said in the PR, correctness is more important for us. The cached plan will 
be reused when we build any other plans, so users might see out-of-date 
results.

To achieve what you want, we would need to introduce a new concept, like 
materialized views, which would not be used by plan matching during query 
execution.
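
A hedged illustration of the plan-matching point (editor's sketch, reusing the 
repro's names): the cache manager matches logical plans, not table names, so 
even a query that never mentions dwd_table1 can be answered from its cache.

{code:scala}
// With dwd_table1 cached as "select * from ods_table where age>=25", this
// semantically identical query may be served from that cache entry via
// plan matching, which is why stale cached data would break correctness.
spark.sql("select * from ods_table where age >= 25").show()
{code}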

> dropTempView has a critical BUG
> ---
>
> Key: SPARK-21579
> URL: https://issues.apache.org/jira/browse/SPARK-21579
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
>Reporter: ant_nebula
>Priority: Critical
> Attachments: screenshot-1.png
>
>
> When I dropTempView dwd_table1 only, the dependent table dwd_table2 also 
> disappears from http://127.0.0.1:4040/storage/.
> This affects versions 2.1.1 and 2.2.0; 2.1.0 does not have this problem.
> {code:java}
> import org.apache.spark.sql.{Row, SparkSession}
> import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
> 
> val spark = SparkSession.builder.master("local").appName("sparkTest").getOrCreate()
> val rows = Seq(Row("p1", 30), Row("p2", 20), Row("p3", 25), Row("p4", 10), Row("p5", 40), Row("p6", 15))
> val schema = new StructType().add(StructField("name", StringType)).add(StructField("age", IntegerType))
> val rowRDD = spark.sparkContext.parallelize(rows, 3)
> val df = spark.createDataFrame(rowRDD, schema)
> df.createOrReplaceTempView("ods_table")
> spark.sql("cache table ods_table")
> spark.sql("cache table dwd_table1 as select * from ods_table where age>=25")
> spark.sql("cache table dwd_table2 as select * from dwd_table1 where name='p1'")
> spark.catalog.dropTempView("dwd_table1")
> //spark.catalog.dropTempView("ods_table")
> spark.sql("select * from dwd_table2").show()
> {code}
> It will keep ods_table in memory, although it will not be used anymore. This 
> wastes memory, especially when my service diagram is much more complex.
> !screenshot-1.png!


