[jira] [Commented] (SPARK-18976) in standalone mode, an executor expired by HeartbeatReceiver still takes up cores but is assigned no tasks

2016-12-25 Thread liujianhui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15777821#comment-15777821
 ] 

liujianhui commented on SPARK-18976:


thanks for your attention. I found the root cause; it is the same as in the 
issue https://issues.apache.org/jira/browse/SPARK-18994. The master finds that 
the worker's heartbeat has expired and removes the worker, but the executor on 
that worker stays alive. When the standby Master becomes the active one, this 
executor is reported to the new master along with the 
WorkerSchedulerStateResponse and is added back to the corresponding app's 
executor list.

> in standalone mode, an executor expired by HeartbeatReceiver still takes up 
> cores but is assigned no tasks 
> --
>
> Key: SPARK-18976
> URL: https://issues.apache.org/jira/browse/SPARK-18976
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.6.1
> Environment: jdk1.8.0_77 Red Hat 4.4.7-11
>Reporter: liujianhui
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> h2. scene
> when an executor is expired by HeartbeatReceiver in the driver, the driver 
> marks that executor as not alive and the task scheduler no longer assigns 
> tasks to it, but the executor's status stays running and it keeps taking up 
> cores. Executor 18 was expired with no tasks running and its task time is far 
> less than that of the normal executor 142, yet the app page shows it as running
> !screenshot-1.png!
> !screenshot-2.png!
> !screenshot-3.png!
> h2. process
> # the executor is expired by HeartbeatReceiver because its last heartbeat 
> exceeded the executor timeout
> # the executor is removed in CoarseGrainedSchedulerBackend.killExecutors, so 
> it is marked as dead; it will no longer be offered tasks because it is in 
> executorsPendingToRemove
> # the status of that executor stays running because the 
> CoarseGrainedExecutorBackend process still exists and re-registers its block 
> manager with the driver every 10s; log as 
> {code}
> 16/12/22 17:04:26 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:26 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:26 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:26 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:26 INFO BlockManager: Reporting 0 blocks to the master.
> 16/12/22 17:04:36 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:36 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:36 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:36 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:36 INFO BlockManager: Reporting 0 blocks to the master.
> 16/12/22 17:04:46 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:46 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:46 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:46 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:46 INFO BlockManager: Reporting 0 blocks to the master.
> 16/12/22 17:04:56 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:56 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:56 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:56 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:56 INFO BlockManager: Reporting 0 blocks to the master. 
> {code}
> h2. resolve 
> when the number of re-register attempts exceeds some threshold (e.g. 10), the 
> executor should exit with status zero 
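
A minimal sketch of the proposed fix (illustrative only, not actual Spark code): count consecutive "Told to re-register" responses on the executor's heartbeat path and exit cleanly once an assumed threshold is exceeded.

{code}
// Hypothetical guard; the names and the threshold of 10 are assumptions.
object ReRegisterGuard {
  private val MaxConsecutiveReRegisters = 10
  private var consecutive = 0

  def onHeartbeatResponse(reRegisterRequested: Boolean): Unit = {
    if (reRegisterRequested) {
      consecutive += 1
      // Exit with status zero so the cluster manager does not treat this as a
      // crash, and the dead executor stops holding cores with no tasks.
      if (consecutive >= MaxConsecutiveReRegisters) sys.exit(0)
    } else {
      consecutive = 0 // a normal heartbeat resets the counter
    }
  }
}
{code}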



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18997) Recommended upgrade libthrift to 0.9.3

2016-12-25 Thread Liang-Chi Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15777815#comment-15777815
 ] 

Liang-Chi Hsieh commented on SPARK-18997:
-

I've checked the dependencies and it seems there is no conflict.

{code}
+-org.apache.thrift:libfb303:0.9.3
| +-org.apache.thrift:libthrift:0.9.3
|   +-org.apache.httpcomponents:httpclient:4.4.1 (evicted by: 4.5.2)
|   +-org.apache.httpcomponents:httpclient:4.5.2
|   | +-commons-codec:commons-codec:1.10
|   | +-commons-codec:commons-codec:1.9 (evicted by: 1.10)
|   | +-commons-logging:commons-logging:1.2
|   | +-org.apache.httpcomponents:httpcore:4.4.4
|   |
|   +-org.apache.httpcomponents:httpcore:4.4.1 (evicted by: 4.4.4)
|   +-org.apache.httpcomponents:httpcore:4.4.4
|
+-org.apache.thrift:libthrift:0.9.3
| +-org.apache.httpcomponents:httpclient:4.4.1 (evicted by: 4.5.2)
| +-org.apache.httpcomponents:httpclient:4.5.2
| | +-commons-codec:commons-codec:1.10
| | +-commons-codec:commons-codec:1.9 (evicted by: 1.10)
| | +-commons-logging:commons-logging:1.2
| | +-org.apache.httpcomponents:httpcore:4.4.4
| |
| +-org.apache.httpcomponents:httpcore:4.4.1 (evicted by: 4.4.4)
| +-org.apache.httpcomponents:httpcore:4.4.4
|
{code}
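
For anyone wanting to re-check, Maven's dependency plugin gives a comparable view (its output format differs from the sbt-style tree above):

{code}
./build/mvn dependency:tree -Dincludes=org.apache.thrift
{code}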

[~srowen] What do you think about this? Do we want to upgrade this?

> Recommended upgrade libthrift to 0.9.3
> ---
>
> Key: SPARK-18997
> URL: https://issues.apache.org/jira/browse/SPARK-18997
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: meiyoula
>Priority: Critical
>
> libthrift 0.9.2 has a serious security vulnerability: CVE-2015-3254



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-17755) Master may ask a worker to launch an executor before the worker actually got the response of registration

2016-12-25 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-17755.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

> Master may ask a worker to launch an executor before the worker actually got 
> the response of registration
> -
>
> Key: SPARK-17755
> URL: https://issues.apache.org/jira/browse/SPARK-17755
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Yin Huai
>Assignee: Shixiong Zhu
> Fix For: 2.2.0
>
>
> I somehow saw a failed test {{org.apache.spark.DistributedSuite.caching in 
> memory, serialized, replicated}}. Its log shows that the Spark master asked the 
> worker to launch an executor before the worker actually got the response to its 
> registration. So, the master knew that the worker had been registered, but the 
> worker did not yet know that it itself had been registered. 
> {code}
> 16/09/30 14:53:53.681 dispatcher-event-loop-0 INFO Master: Registering worker 
> localhost:38262 with 1 cores, 1024.0 MB RAM
> 16/09/30 14:53:53.681 dispatcher-event-loop-0 INFO Master: Launching executor 
> app-20160930145353-/1 on worker worker-20160930145353-localhost-38262
> 16/09/30 14:53:53.682 dispatcher-event-loop-3 INFO 
> StandaloneAppClient$ClientEndpoint: Executor added: app-20160930145353-/1 
> on worker-20160930145353-localhost-38262 (localhost:38262) with 1 cores
> 16/09/30 14:53:53.683 dispatcher-event-loop-3 INFO 
> StandaloneSchedulerBackend: Granted executor ID app-20160930145353-/1 on 
> hostPort localhost:38262 with 1 cores, 1024.0 MB RAM
> 16/09/30 14:53:53.683 dispatcher-event-loop-0 WARN Worker: Invalid Master 
> (spark://localhost:46460) attempted to launch executor.
> 16/09/30 14:53:53.687 worker-register-master-threadpool-0 INFO Worker: 
> Successfully registered with master spark://localhost:46460
> {code}
> Then, it seems the worker did not launch any executor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10872) Derby error (XSDB6) when creating new HiveContext after restarting SparkContext

2016-12-25 Thread karthik G S (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15777808#comment-15777808
 ] 

karthik G S  commented on SPARK-10872:
--

So, what is the possible fix for this issue? 


> Derby error (XSDB6) when creating new HiveContext after restarting 
> SparkContext
> ---
>
> Key: SPARK-10872
> URL: https://issues.apache.org/jira/browse/SPARK-10872
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.4.0, 1.4.1, 1.5.0
>Reporter: Dmytro Bielievtsov
>
> Starting from Spark 1.4.0 (this works well on 1.3.1), the following code fails 
> with "XSDB6: Another instance of Derby may have already booted the database 
> ~/metastore_db":
> {code:python}
> from pyspark import SparkContext
> from pyspark.sql import HiveContext
> sc = SparkContext("local[*]", "app1")
> sql = HiveContext(sc)
> sql.createDataFrame([[1]]).collect()
> sc.stop()
> sc = SparkContext("local[*]", "app2")
> sql = HiveContext(sc)
> sql.createDataFrame([[1]]).collect()  # Py4J error
> {code}
> This is related to [#SPARK-9539], and I intend to restart the spark context 
> several times for isolated jobs to prevent cache clutter and GC errors.
> Here's a larger part of the full error trace:
> {noformat}
> Failed to start database 'metastore_db' with class loader 
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@13015ec0, see 
> the next exception for details.
> org.datanucleus.exceptions.NucleusDataStoreException: Failed to start 
> database 'metastore_db' with class loader 
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@13015ec0, see 
> the next exception for details.
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:516)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:298)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
>   at 
> org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
>   at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
>   at 
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
>   at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
>   at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
>   at 
> org.apache.h

[jira] [Commented] (SPARK-18996) Spark SQL support for post hooks

2016-12-25 Thread Atul Payapilly (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1599#comment-1599
 ] 

Atul Payapilly commented on SPARK-18996:


Yep, that's right; that's exactly what I'm looking for.

> Spark SQL support for post hooks
> 
>
> Key: SPARK-18996
> URL: https://issues.apache.org/jira/browse/SPARK-18996
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Atul Payapilly
>
> Spark SQL used to support Hive execution hooks (incidentally, as a side effect 
> of using the Hive Driver in earlier versions) but no longer does so. More 
> details at: https://issues.apache.org/jira/browse/SPARK-18879.
> The post-hook functionality, where it is possible to determine which 
> partitions were written to, is extremely useful. E.g., suppose the data is 
> written and then exported to an external system; without post hooks, there is 
> no way to determine which partitions to export.
> This feature request is to provide this capability, ideally using the Hive 
> exec hooks API if possible, so users don't need to rewrite their hooks.
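
For context, a rough sketch of what such a post hook looks like against Hive's existing hook API (this assumes Hive's org.apache.hadoop.hive.ql.hooks interfaces; whether Spark should re-expose exactly this API is the open question of this ticket):

{code}
import org.apache.hadoop.hive.ql.hooks.{Entity, ExecuteWithHookContext, HookContext}
import scala.collection.JavaConverters._

// Hypothetical hook that reports which partitions a query wrote to.
class PartitionExportHook extends ExecuteWithHookContext {
  override def run(hookContext: HookContext): Unit = {
    // getOutputs() lists the entities (tables/partitions) the query wrote.
    val writtenPartitions = hookContext.getOutputs.asScala
      .filter(_.getTyp == Entity.Type.PARTITION)
    writtenPartitions.foreach(e => println(s"Written partition: ${e.getName}"))
  }
}
{code}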



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18996) Spark SQL support for post hooks

2016-12-25 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1587#comment-1587
 ] 

Xiao Li commented on SPARK-18996:
-

Are you requesting a Hive feature like 
https://issues.apache.org/jira/browse/HIVE-854?

> Spark SQL support for post hooks
> 
>
> Key: SPARK-18996
> URL: https://issues.apache.org/jira/browse/SPARK-18996
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Atul Payapilly
>
> Spark SQL used to support Hive execution hooks (incidentally, as a side effect 
> of using the Hive Driver in earlier versions) but no longer does so. More 
> details at: https://issues.apache.org/jira/browse/SPARK-18879.
> The post-hook functionality, where it is possible to determine which 
> partitions were written to, is extremely useful. E.g., suppose the data is 
> written and then exported to an external system; without post hooks, there is 
> no way to determine which partitions to export.
> This feature request is to provide this capability, ideally using the Hive 
> exec hooks API if possible, so users don't need to rewrite their hooks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18999) simplify Literal codegen

2016-12-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18999:


Assignee: Wenchen Fan  (was: Apache Spark)

> simplify Literal codegen
> 
>
> Key: SPARK-18999
> URL: https://issues.apache.org/jira/browse/SPARK-18999
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18999) simplify Literal codegen

2016-12-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1579#comment-1579
 ] 

Apache Spark commented on SPARK-18999:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/16402

> simplify Literal codegen
> 
>
> Key: SPARK-18999
> URL: https://issues.apache.org/jira/browse/SPARK-18999
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18999) simplify Literal codegen

2016-12-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18999:


Assignee: Apache Spark  (was: Wenchen Fan)

> simplify Literal codegen
> 
>
> Key: SPARK-18999
> URL: https://issues.apache.org/jira/browse/SPARK-18999
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18999) simplify Literal codegen

2016-12-25 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-18999:
---

 Summary: simplify Literal codegen
 Key: SPARK-18999
 URL: https://issues.apache.org/jira/browse/SPARK-18999
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Wenchen Fan
Assignee: Wenchen Fan
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18675) CTAS for hive serde table should work for all hive versions

2016-12-25 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-18675:

Fix Version/s: 2.0.3

> CTAS for hive serde table should work for all hive versions
> ---
>
> Key: SPARK-18675
> URL: https://issues.apache.org/jira/browse/SPARK-18675
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.0.3, 2.1.1, 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18237) hive.exec.stagingdir has no effect in Spark 2.0.1

2016-12-25 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-18237:

Fix Version/s: 2.0.3

> hive.exec.stagingdir has no effect in Spark 2.0.1
> -
>
> Key: SPARK-18237
> URL: https://issues.apache.org/jira/browse/SPARK-18237
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: ClassNotFoundExp
>Assignee: ClassNotFoundExp
> Fix For: 2.0.3, 2.1.0
>
>
> hive.exec.stagingdir has no effect in Spark 2.0.1;
> this is related to https://issues.apache.org/jira/browse/SPARK-11021



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18703) Insertion/CTAS against Hive Tables: Staging Directories and Data Files Not Dropped Until Normal Termination of JVM

2016-12-25 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-18703:

Fix Version/s: 2.0.3

> Insertion/CTAS against Hive Tables: Staging Directories and Data Files Not 
> Dropped Until Normal Termination of JVM
> --
>
> Key: SPARK-18703
> URL: https://issues.apache.org/jira/browse/SPARK-18703
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Critical
> Fix For: 2.0.3, 2.1.1, 2.2.0
>
>
> Below are the files/directories generated for three inserts against a Hive 
> table:
> {noformat}
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/._SUCCESS.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/.part-0.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/_SUCCESS
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/part-0
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/._SUCCESS.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/.part-0.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/_SUCCESS
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/part-0
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1/-ext-1
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1/-ext-1/._SUCCESS.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1/-ext-1/.part-0.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1/-ext-1/_SUCCESS
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1/-ext-1/part-0
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.part-0.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/part-0
> {noformat}
> The first 18 files are temporary. We do not drop them until the JVM 
> terminates. If the JVM does not terminate normally, these temporary 
> files/directories will not be dropped.
> Only the last two files are needed, as shown below.
> {noformat}
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.part-0.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/part-0
> {noformat}
> Ideally, 

[jira] [Assigned] (SPARK-18998) Add a cbo conf to switch between default statistics and cbo estimated statistics

2016-12-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18998:


Assignee: Apache Spark

> Add a cbo conf to switch between default statistics and cbo estimated 
> statistics
> 
>
> Key: SPARK-18998
> URL: https://issues.apache.org/jira/browse/SPARK-18998
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Zhenhua Wang
>Assignee: Apache Spark
>
> We need a cbo configuration to switch between default stats and estimated 
> stats. We also need a new statistics method in LogicalPlan with conf as its 
> parameter, in order to pass the cbo switch and other estimation related 
> configurations in the future.
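
A minimal sketch of the shape this could take (all names here are placeholders, not the eventual Spark API):

{code}
// Hypothetical: a cbo conf flag selects size-only stats vs. estimated stats.
case class Statistics(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

trait CboConf { def cboEnabled: Boolean }

abstract class LogicalPlanSketch {
  protected def defaultStats: Statistics // cheap, size-only estimate
  protected def cboStats: Statistics     // estimate based on collected statistics

  // The proposed statistics method takes the conf as a parameter so the cbo
  // switch (and future estimation-related confs) can be threaded through.
  def statistics(conf: CboConf): Statistics =
    if (conf.cboEnabled) cboStats else defaultStats
}
{code}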



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18998) Add a cbo conf to switch between default statistics and cbo estimated statistics

2016-12-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18998:


Assignee: (was: Apache Spark)

> Add a cbo conf to switch between default statistics and cbo estimated 
> statistics
> 
>
> Key: SPARK-18998
> URL: https://issues.apache.org/jira/browse/SPARK-18998
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Zhenhua Wang
>
> We need a cbo configuration to switch between default stats and estimated 
> stats. We also need a new statistics method in LogicalPlan with conf as its 
> parameter, in order to pass the cbo switch and other estimation related 
> configurations in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18998) Add a cbo conf to switch between default statistics and cbo estimated statistics

2016-12-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15777679#comment-15777679
 ] 

Apache Spark commented on SPARK-18998:
--

User 'wzhfy' has created a pull request for this issue:
https://github.com/apache/spark/pull/16401

> Add a cbo conf to switch between default statistics and cbo estimated 
> statistics
> 
>
> Key: SPARK-18998
> URL: https://issues.apache.org/jira/browse/SPARK-18998
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Zhenhua Wang
>
> We need a cbo configuration to switch between default stats and estimated 
> stats. We also need a new statistics method in LogicalPlan with conf as its 
> parameter, in order to pass the cbo switch and other estimation related 
> configurations in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18998) Add a cbo conf to switch between default statistics and cbo estimated statistics

2016-12-25 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-18998:


 Summary: Add a cbo conf to switch between default statistics and 
cbo estimated statistics
 Key: SPARK-18998
 URL: https://issues.apache.org/jira/browse/SPARK-18998
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.2.0
Reporter: Zhenhua Wang


We need a cbo configuration to switch between default stats and estimated 
stats. We also need a new statistics method in LogicalPlan with conf as its 
parameter, in order to pass the cbo switch and other estimation related 
configurations in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13857) Feature parity for ALS ML with MLLIB

2016-12-25 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15777650#comment-15777650
 ] 

Debasish Das edited comment on SPARK-13857 at 12/26/16 5:57 AM:


item->item and user->user was done in an old PR I had...if there is interest I 
can resend it...it would be nice to see how it compares with the approximate 
nearest neighbor work from Uber:
https://github.com/apache/spark/pull/6213


was (Author: debasish83):
item->item and user->user was done in an old PR I had...if there is interested 
I can resend it...nice to see how it compares with approximate nearest neighbor 
work from uber:
https://github.com/apache/spark/pull/6213

> Feature parity for ALS ML with MLLIB
> 
>
> Key: SPARK-13857
> URL: https://issues.apache.org/jira/browse/SPARK-13857
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Nick Pentreath
>Assignee: Nick Pentreath
>
> Currently {{mllib.recommendation.MatrixFactorizationModel}} has methods 
> {{recommendProducts/recommendUsers}} for recommending top K to a given user / 
> item, as well as {{recommendProductsForUsers/recommendUsersForProducts}} to 
> recommend top K across all users/items.
> Additionally, SPARK-10802 is for adding the ability to do 
> {{recommendProductsForUsers}} for a subset of users (or vice versa).
> Look at exposing or porting (as appropriate) these methods to ALS in ML. 
> Investigate if efficiency can be improved at the same time (see SPARK-11968).
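
For reference, the existing mllib calls under discussion look like this in use (a small sketch; the ratings data is made up):

{code}
import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.{ALS, Rating}

def demo(sc: SparkContext): Unit = {
  val ratings = sc.parallelize(Seq(Rating(1, 10, 4.0), Rating(1, 20, 1.0), Rating(2, 10, 5.0)))
  val model = ALS.train(ratings, 8, 5)                 // rank = 8, 5 iterations
  val topForUser1 = model.recommendProducts(1, 5)      // top 5 items for user 1
  val topPerUser = model.recommendProductsForUsers(5)  // top 5 items for every user
}
{code}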



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13857) Feature parity for ALS ML with MLLIB

2016-12-25 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15777650#comment-15777650
 ] 

Debasish Das commented on SPARK-13857:
--

item->item and user->user was done in an old PR I had...if there is interest I 
can resend it...it would be nice to see how it compares with the approximate 
nearest neighbor work from Uber:
https://github.com/apache/spark/pull/6213

> Feature parity for ALS ML with MLLIB
> 
>
> Key: SPARK-13857
> URL: https://issues.apache.org/jira/browse/SPARK-13857
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Nick Pentreath
>Assignee: Nick Pentreath
>
> Currently {{mllib.recommendation.MatrixFactorizationModel}} has methods 
> {{recommendProducts/recommendUsers}} for recommending top K to a given user / 
> item, as well as {{recommendProductsForUsers/recommendUsersForProducts}} to 
> recommend top K across all users/items.
> Additionally, SPARK-10802 is for adding the ability to do 
> {{recommendProductsForUsers}} for a subset of users (or vice versa).
> Look at exposing or porting (as appropriate) these methods to ALS in ML. 
> Investigate if efficiency can be improved at the same time (see SPARK-11968).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18941) "Drop Table" command doesn't delete the directory of the managed Hive table when users specify locations

2016-12-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15777600#comment-15777600
 ] 

Apache Spark commented on SPARK-18941:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/16400

> "Drop Table" command doesn't delete the directory of the managed Hive table 
> when users specifying locations
> ---
>
> Key: SPARK-18941
> URL: https://issues.apache.org/jira/browse/SPARK-18941
> Project: Spark
>  Issue Type: Documentation
>  Components: Java API
>Affects Versions: 2.0.2
>Reporter: luat
>
> On the Spark Thrift Server (Spark 2.0.2), the "drop table" command doesn't 
> delete the directory associated with the Hive table (not an EXTERNAL table) 
> from the HDFS file system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18941) "Drop Table" command doesn't delete the directory of the managed Hive table when users specify locations

2016-12-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18941:


Assignee: Apache Spark

> "Drop Table" command doesn't delete the directory of the managed Hive table 
> when users specifying locations
> ---
>
> Key: SPARK-18941
> URL: https://issues.apache.org/jira/browse/SPARK-18941
> Project: Spark
>  Issue Type: Documentation
>  Components: Java API
>Affects Versions: 2.0.2
>Reporter: luat
>Assignee: Apache Spark
>
> On the Spark Thrift Server (Spark 2.0.2), the "drop table" command doesn't 
> delete the directory associated with the Hive table (not an EXTERNAL table) 
> from the HDFS file system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18941) "Drop Table" command doesn't delete the directory of the managed Hive table when users specify locations

2016-12-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18941:


Assignee: (was: Apache Spark)

> "Drop Table" command doesn't delete the directory of the managed Hive table 
> when users specifying locations
> ---
>
> Key: SPARK-18941
> URL: https://issues.apache.org/jira/browse/SPARK-18941
> Project: Spark
>  Issue Type: Documentation
>  Components: Java API
>Affects Versions: 2.0.2
>Reporter: luat
>
> On the Spark Thrift Server (Spark 2.0.2), the "drop table" command doesn't 
> delete the directory associated with the Hive table (not an EXTERNAL table) 
> from the HDFS file system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12613) Elimination of Outer Join by Parent Join Condition

2016-12-25 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-12613.
-
Resolution: Duplicate

> Elimination of Outer Join by Parent Join Condition
> --
>
> Key: SPARK-12613
> URL: https://issues.apache.org/jira/browse/SPARK-12613
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer, SQL
>Affects Versions: 1.6.0
>Reporter: Xiao Li
>Priority: Critical
>
> Given an outer join that is involved in another join (called the parent join), 
> when the join type of the parent join is inner, left-semi, left-outer, or 
> right-outer, check whether the join condition of the parent join satisfies the 
> following two conditions:
>  1) there exist null-filtering predicates against the columns in the 
> null-supplying side of the parent join.
>  2) these columns come from the child join.
>  If there are such join predicates, apply the elimination rules:
>  - full outer -> inner if both sides of the child join have such predicates
>  - left outer -> inner if the right side of the child join has such predicates
>  - right outer -> inner if the left side of the child join has such predicates
>  - full outer -> left outer if only the left side of the child join has such 
> predicates
>  - full outer -> right outer if only the right side of the child join has 
> such predicates
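
As a concrete illustration of one rule (a hypothetical example; a, b, and c are arbitrary DataFrames with columns k and x): the parent join condition below null-filters b's columns, and b is the right side of the child join, so the full outer join can be downgraded to a right outer join without changing the result.

{code}
import org.apache.spark.sql.DataFrame

def rewriteExample(a: DataFrame, b: DataFrame, c: DataFrame): (DataFrame, DataFrame) = {
  // The parent inner join on b("x") === c("x") drops every row where b's
  // columns are NULL, i.e. exactly the left-only rows of the full outer join.
  val original  = a.join(b, a("k") === b("k"), "full_outer").join(c, b("x") === c("x"))
  val optimized = a.join(b, a("k") === b("k"), "right_outer").join(c, b("x") === c("x"))
  (original, optimized)
}
{code}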



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18941) "Drop Table" command doesn't delete the directory of the managed Hive table when users specify locations

2016-12-25 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15777553#comment-15777553
 ] 

Dongjoon Hyun commented on SPARK-18941:
---

Yep. I'll make a PR soon.

> "Drop Table" command doesn't delete the directory of the managed Hive table 
> when users specifying locations
> ---
>
> Key: SPARK-18941
> URL: https://issues.apache.org/jira/browse/SPARK-18941
> Project: Spark
>  Issue Type: Documentation
>  Components: Java API
>Affects Versions: 2.0.2
>Reporter: luat
>
> On the Spark Thrift Server (Spark 2.0.2), the "drop table" command doesn't 
> delete the directory associated with the Hive table (not an EXTERNAL table) 
> from the HDFS file system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18931) Create empty staging directory in partitioned table on insert

2016-12-25 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15777543#comment-15777543
 ] 

Xiao Li commented on SPARK-18931:
-

Yeah. The PR https://github.com/apache/spark/pull/16399 backports the fix to 
Spark 2.0. 

> Create empty staging directory in partitioned table on insert
> -
>
> Key: SPARK-18931
> URL: https://issues.apache.org/jira/browse/SPARK-18931
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Egor Pahomov
>
> {code}
> CREATE TABLE temp.test_partitioning_4 (
>   num string
> )
> PARTITIONED BY (
>   day string)
> STORED AS parquet
> {code}
> On every
> {code}
> INSERT INTO TABLE temp.test_partitioning_4 PARTITION (day)
> SELECT day, count(*) AS num FROM
> hss.session WHERE year=2016 AND month=4
> GROUP BY day
> {code}
> a new directory 
> ".hive-staging_hive_2016-12-19_15-55-11_298_3412488541559534475-4" is created 
> on HDFS. It's a big issue, because I insert every day, and a bunch of empty 
> dirs on HDFS is very bad for HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18941) "Drop Table" command doesn't delete the directory of the managed Hive table when users specify locations

2016-12-25 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-18941:

Summary: "Drop Table" command doesn't delete the directory of the managed 
Hive table when users specifying locations  (was: Spark thrift server, Spark 
2.0.2, The "drop table" command doesn't delete the directory associated with 
the Hive table (not EXTERNAL table) from the HDFS file system)

> "Drop Table" command doesn't delete the directory of the managed Hive table 
> when users specifying locations
> ---
>
> Key: SPARK-18941
> URL: https://issues.apache.org/jira/browse/SPARK-18941
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.0.2
>Reporter: luat
>
> On the Spark Thrift Server (Spark 2.0.2), the "drop table" command doesn't 
> delete the directory associated with the Hive table (not an EXTERNAL table) 
> from the HDFS file system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18941) "Drop Table" command doesn't delete the directory of the managed Hive table when users specify locations

2016-12-25 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-18941:

Issue Type: Documentation  (was: Bug)

> "Drop Table" command doesn't delete the directory of the managed Hive table 
> when users specifying locations
> ---
>
> Key: SPARK-18941
> URL: https://issues.apache.org/jira/browse/SPARK-18941
> Project: Spark
>  Issue Type: Documentation
>  Components: Java API
>Affects Versions: 2.0.2
>Reporter: luat
>
> On the Spark Thrift Server (Spark 2.0.2), the "drop table" command doesn't 
> delete the directory associated with the Hive table (not an EXTERNAL table) 
> from the HDFS file system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18941) Spark thrift server, Spark 2.0.2, The "drop table" command doesn't delete the directory associated with the Hive table (not EXTERNAL table) from the HDFS file system

2016-12-25 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15777493#comment-15777493
 ] 

Xiao Li commented on SPARK-18941:
-

We should document the behavior change. Maybe [~dongjoon] can submit a PR? 

> Spark thrift server, Spark 2.0.2, The "drop table" command doesn't delete the 
> directory associated with the Hive table (not EXTERNAL table) from the HDFS 
> file system
> -
>
> Key: SPARK-18941
> URL: https://issues.apache.org/jira/browse/SPARK-18941
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.0.2
>Reporter: luat
>
> On the Spark Thrift Server (Spark 2.0.2), the "drop table" command doesn't 
> delete the directory associated with the Hive table (not an EXTERNAL table) 
> from the HDFS file system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18997) Recommended upgrade libthrift to 0.9.3

2016-12-25 Thread meiyoula (JIRA)
meiyoula created SPARK-18997:


 Summary: Recommended upgrade libthrift to 0.9.3
 Key: SPARK-18997
 URL: https://issues.apache.org/jira/browse/SPARK-18997
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: meiyoula
Priority: Critical


libthrift 0.9.2 has a serious security vulnerability: CVE-2015-3254



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18703) Insertion/CTAS against Hive Tables: Staging Directories and Data Files Not Dropped Until Normal Termination of JVM

2016-12-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15777332#comment-15777332
 ] 

Apache Spark commented on SPARK-18703:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/16399

> Insertion/CTAS against Hive Tables: Staging Directories and Data Files Not 
> Dropped Until Normal Termination of JVM
> --
>
> Key: SPARK-18703
> URL: https://issues.apache.org/jira/browse/SPARK-18703
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Critical
> Fix For: 2.1.1, 2.2.0
>
>
> Below are the files/directories generated for three inserts against a Hive 
> table:
> {noformat}
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/._SUCCESS.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/.part-0.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/_SUCCESS
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/part-0
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/._SUCCESS.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/.part-0.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/_SUCCESS
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/part-0
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1/-ext-1
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1/-ext-1/._SUCCESS.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1/-ext-1/.part-0.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1/-ext-1/_SUCCESS
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1/-ext-1/part-0
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.part-0.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/part-0
> {noformat}
> The first 18 files are temporary. We do not drop them until the JVM 
> terminates. If the JVM does not terminate normally, these temporary 
> files/directories will not be dropped.
> Only the last two files are needed, as shown below.
> {noformat}
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.part-0.crc
> /pr

[jira] [Commented] (SPARK-18237) hive.exec.stagingdir has no effect in Spark 2.0.1

2016-12-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15777331#comment-15777331
 ] 

Apache Spark commented on SPARK-18237:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/16399

> hive.exec.stagingdir has no effect in Spark 2.0.1
> -
>
> Key: SPARK-18237
> URL: https://issues.apache.org/jira/browse/SPARK-18237
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: ClassNotFoundExp
>Assignee: ClassNotFoundExp
> Fix For: 2.1.0
>
>
> hive.exec.stagingdir has no effect in Spark 2.0.1;
> this is related to https://issues.apache.org/jira/browse/SPARK-11021



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18675) CTAS for hive serde table should work for all hive versions

2016-12-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15777333#comment-15777333
 ] 

Apache Spark commented on SPARK-18675:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/16399

> CTAS for hive serde table should work for all hive versions
> ---
>
> Key: SPARK-18675
> URL: https://issues.apache.org/jira/browse/SPARK-18675
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.1.1, 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18842) De-duplicate paths in classpaths in processes for local-cluster mode to work around the length limitation on Windows

2016-12-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18842:


Assignee: Apache Spark  (was: Hyukjin Kwon)

> De-duplicate paths in classpaths in processes for local-cluster mode to work 
> around the length limitation on Windows
> 
>
> Key: SPARK-18842
> URL: https://issues.apache.org/jira/browse/SPARK-18842
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
> Fix For: 2.2.0
>
>
> Currently, some tests are failing and hanging on Windows due to this 
> problem. For the reason described in SPARK-18718, some tests using 
> {{local-cluster}} mode were disabled on Windows due to the length limitation 
> on the classpaths given to the launched processes.
> The limit seems to be roughly 32K characters (see 
> https://blogs.msdn.microsoft.com/oldnewthing/20031210-00/?p=41553/ and 
> https://support.thoughtworks.com/hc/en-us/articles/213248526-Getting-around-maximum-command-line-length-is-32767-characters-on-Windows)
>  but executors were being launched with a command such as 
> https://gist.github.com/HyukjinKwon/5bc81061c250d4af5a180869b59d42ea in 
> (only) tests.
> That command is roughly 40K characters long due to the class paths; however, 
> more than half of the paths appear to be duplicates, so de-duplicating them 
> reduces the length to roughly 20K.
> We may need to revisit this as more paths are added in the future, but for 
> now it seems better than disabling all the tests, and the changes are minimal.
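
One way to de-duplicate the classpath while preserving order looks like this (a sketch of the idea, not the actual patch):

{code}
// Split the classpath on the platform separator, keep only the first
// occurrence of each entry (preserving order), and rejoin.
def dedupClasspath(classpath: String): String = {
  val sep = java.io.File.pathSeparator
  classpath.split(sep).distinct.mkString(sep)
}
{code}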



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18842) De-duplicate paths in classpaths in processes for local-cluster mode to work around the length limitation on Windows

2016-12-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18842:


Assignee: Hyukjin Kwon  (was: Apache Spark)

> De-duplicate paths in classpaths in processes for local-cluster mode to work 
> around the length limitation on Windows
> 
>
> Key: SPARK-18842
> URL: https://issues.apache.org/jira/browse/SPARK-18842
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
> Fix For: 2.2.0
>
>
> Currently, some tests are failing and hanging on Windows due to this 
> problem. For the reason described in SPARK-18718, some tests using 
> {{local-cluster}} mode were disabled on Windows due to the length limitation 
> on the classpaths given to the launched processes.
> The limit seems to be roughly 32K characters (see 
> https://blogs.msdn.microsoft.com/oldnewthing/20031210-00/?p=41553/ and 
> https://support.thoughtworks.com/hc/en-us/articles/213248526-Getting-around-maximum-command-line-length-is-32767-characters-on-Windows)
>  but executors were being launched with a command such as 
> https://gist.github.com/HyukjinKwon/5bc81061c250d4af5a180869b59d42ea in 
> (only) tests.
> That command is roughly 40K characters long due to the class paths; however, 
> more than half of the paths appear to be duplicates, so de-duplicating them 
> reduces the length to roughly 20K.
> We may need to revisit this as more paths are added in the future, but for 
> now it seems better than disabling all the tests, and the changes are minimal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18842) De-duplicate paths in classpaths in processes for local-cluster mode to work around the length limitation on Windows

2016-12-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15776477#comment-15776477
 ] 

Apache Spark commented on SPARK-18842:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/16398

> De-duplicate paths in classpaths in processes for local-cluster mode to work 
> around the length limitation on Windows
> 
>
> Key: SPARK-18842
> URL: https://issues.apache.org/jira/browse/SPARK-18842
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
> Fix For: 2.2.0
>
>
> Currently, some tests are failing and hanging on Windows due to this 
> problem. For the reason described in SPARK-18718, some tests using 
> {{local-cluster}} mode were disabled on Windows due to the length limitation 
> on the classpaths given to the launched processes.
> The limit seems to be roughly 32K characters (see 
> https://blogs.msdn.microsoft.com/oldnewthing/20031210-00/?p=41553/ and 
> https://support.thoughtworks.com/hc/en-us/articles/213248526-Getting-around-maximum-command-line-length-is-32767-characters-on-Windows)
>  but executors were being launched with a command such as 
> https://gist.github.com/HyukjinKwon/5bc81061c250d4af5a180869b59d42ea in 
> (only) tests.
> That command is roughly 40K characters long due to the class paths; however, 
> more than half of the paths appear to be duplicates, so de-duplicating them 
> reduces the length to roughly 20K.
> We may need to revisit this as more paths are added in the future, but for 
> now it seems better than disabling all the tests, and the changes are minimal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18842) De-duplicate paths in classpaths in processes for local-cluster mode to work around the length limitation on Windows

2016-12-25 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15776461#comment-15776461
 ] 

Hyukjin Kwon edited comment on SPARK-18842 at 12/25/16 1:19 PM:


{{ReplSuite}} hangs on Windows because of this problem. The reason is that it 
uses the paths as URLs in the tests, whereas some paths added afterwards are 
plain local paths. Because plain local paths and URLs are mixed, many entries 
are duplicated, and the resulting length reaches roughly 40K, which hits the 
command-line length limit on Windows.

Please refer to the tests here - 
https://ci.appveyor.com/project/spark-test/spark/build/395-find-path-issues

and the command line here - 
https://gist.github.com/HyukjinKwon/46af7946c9a5fd4c6fc70a8a0aba1beb
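
To illustrate why the mixture defeats naive de-duplication (a minimal sketch 
with illustrative names, not Spark's actual code): the same directory can 
appear both as a plain local path and as a {{file:}} URL, so entries have to 
be normalized to one canonical form before duplicates can be collapsed.

{code}
import java.io.File
import java.net.URI

// Illustrative helper: map both plain local paths and file: URLs to a
// canonical local path so that duplicate entries become detectable.
def canonicalize(entry: String): String =
  if (entry.startsWith("file:")) new File(new URI(entry)).getCanonicalPath
  else new File(entry).getCanonicalPath

// On Windows, both forms below would normalize to the same string:
//   canonicalize("C:\\projects\\spark\\core\\target")
//   canonicalize("file:/C:/projects/spark/core/target")
{code}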


was (Author: hyukjin.kwon):
{{ReplSuite}} hangs on Windows because of this problem. The reason is that it 
converts the paths into URLs in the tests, so many entries are duplicated 
because plain local paths and URLs are mixed.

Please refer to the tests here - 
https://ci.appveyor.com/project/spark-test/spark/build/395-find-path-issues

and the command line here - 
https://gist.github.com/HyukjinKwon/46af7946c9a5fd4c6fc70a8a0aba1beb

> De-duplicate paths in classpaths in processes for local-cluster mode to work 
> around the length limitation on Windows
> 
>
> Key: SPARK-18842
> URL: https://issues.apache.org/jira/browse/SPARK-18842
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
> Fix For: 2.2.0
>
>
> Currently, some tests fail or hang on Windows because of this problem. For 
> the reason described in SPARK-18718, some tests using {{local-cluster}} mode 
> were disabled on Windows because of the limit on the length of the paths 
> given as classpaths.
> The limit seems to be roughly 32K characters (see 
> https://blogs.msdn.microsoft.com/oldnewthing/20031210-00/?p=41553/ and 
> https://support.thoughtworks.com/hc/en-us/articles/213248526-Getting-around-maximum-command-line-length-is-32767-characters-on-Windows),
> but executors were being launched, in tests only, with commands such as 
> https://gist.github.com/HyukjinKwon/5bc81061c250d4af5a180869b59d42ea.
> Such a command is roughly 40K characters long, mostly because of the 
> classpaths, yet more than half of the entries appear to be duplicates. 
> De-duplicating the paths reduces the length to roughly 20K.
> We may need to revisit this as more paths are added in the future, but for 
> now this seems better than disabling all of those tests, and the change is 
> minimal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-18842) De-duplicate paths in classpaths in processes for local-cluster mode to work around the length limitation on Windows

2016-12-25 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-18842:
--

{{ReplSuite}} hangs on Windows because of this problem. The reason is that it 
converts the paths into URLs in the tests, so many entries are duplicated 
because plain local paths and URLs are mixed.

Please refer to the tests here - 
https://ci.appveyor.com/project/spark-test/spark/build/395-find-path-issues

and the command line here - 
https://gist.github.com/HyukjinKwon/46af7946c9a5fd4c6fc70a8a0aba1beb

> De-duplicate paths in classpaths in processes for local-cluster mode to work 
> around the length limitation on Windows
> 
>
> Key: SPARK-18842
> URL: https://issues.apache.org/jira/browse/SPARK-18842
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
> Fix For: 2.2.0
>
>
> Currently, some tests fail or hang on Windows because of this problem. For 
> the reason described in SPARK-18718, some tests using {{local-cluster}} mode 
> were disabled on Windows because of the limit on the length of the paths 
> given as classpaths.
> The limit seems to be roughly 32K characters (see 
> https://blogs.msdn.microsoft.com/oldnewthing/20031210-00/?p=41553/ and 
> https://support.thoughtworks.com/hc/en-us/articles/213248526-Getting-around-maximum-command-line-length-is-32767-characters-on-Windows),
> but executors were being launched, in tests only, with commands such as 
> https://gist.github.com/HyukjinKwon/5bc81061c250d4af5a180869b59d42ea.
> Such a command is roughly 40K characters long, mostly because of the 
> classpaths, yet more than half of the entries appear to be duplicates. 
> De-duplicating the paths reduces the length to roughly 20K.
> We may need to revisit this as more paths are added in the future, but for 
> now this seems better than disabling all of those tests, and the change is 
> minimal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18922) Fix more resource-closing-related and path-related test failures in identified ones on Windows

2016-12-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18922:


Assignee: Hyukjin Kwon  (was: Apache Spark)

> Fix more resource-closing-related and path-related test failures in 
> identified ones on Windows
> --
>
> Key: SPARK-18922
> URL: https://issues.apache.org/jira/browse/SPARK-18922
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.2.0
>
>
> There are more instances that fail on Windows, as below:
> - {{LauncherBackendSuite}}:
> {code}
> - local: launcher handle *** FAILED *** (30 seconds, 120 milliseconds)
>   The code passed to eventually never returned normally. Attempted 283 times 
> over 30.0960053 seconds. Last failure message: The reference was null. 
> (LauncherBackendSuite.scala:56)
>   org.scalatest.exceptions.TestFailedDueToTimeoutException:
>   at 
> org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
> - standalone/client: launcher handle *** FAILED *** (30 seconds, 47 
> milliseconds)
>   The code passed to eventually never returned normally. Attempted 282 times 
> over 30.03798710002 seconds. Last failure message: The reference was 
> null. (LauncherBackendSuite.scala:56)
>   org.scalatest.exceptions.TestFailedDueToTimeoutException:
>   at 
> org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
> {code}
> - {{SQLQuerySuite}}:
> {code}
> - specifying database name for a temporary table is not allowed *** FAILED 
> *** (125 milliseconds)
>   org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/C:projectsspark  arget mpspark-1f4471ab-aac0-4239-ae35-833d54b37e52;
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:382)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
> {code}
> - {{JsonSuite}}:
> {code}
> - Loading a JSON dataset from a text file with SQL *** FAILED *** (94 
> milliseconds)
>   org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/C:projectsspark  arget mpspark-c918a8b7-fc09-433c-b9d0-36c0f78ae918;
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:382)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
> {code}
> - {{StateStoreSuite}}:
> {code}
> - SPARK-18342: commit fails when rename fails *** FAILED *** (16 milliseconds)
>   java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: 
> StateStoreSuite29777261fs://C:%5Cprojects%5Cspark%5Ctarget%5Ctmp%5Cspark-ef349862-7281-4963-aaf3-add0d670a4ad%5C?-2218c2f8-2cf6-4f80-9cdf-96354e8246a77685899733421033312/0
>   at org.apache.hadoop.fs.Path.initialize(Path.java:206)
>   at org.apache.hadoop.fs.Path.(Path.java:116)
>   at org.apache.hadoop.fs.Path.(Path.java:89)
>   ...
>   Cause: java.net.URISyntaxException: Relative path in absolute URI: 
> StateStoreSuite29777261fs://C:%5Cprojects%5Cspark%5Ctarget%5Ctmp%5Cspark-ef349862-7281-4963-aaf3-add0d670a4ad%5C?-2218c2f8-2cf6-4f80-9cdf-96354e8246a77685899733421033312/0
>   at java.net.URI.checkPath(URI.java:1823)
>   at java.net.URI.(URI.java:745)
>   at org.apache.hadoop.fs.Path.initialize(Path.java:203)
> {code}
> - {{HDFSMetadataLogSuite}}:
> {code}
> - FileManager: FileContextManager *** FAILED *** (94 milliseconds)
>   java.io.IOException: Failed to delete: 
> C:\projects\spark\target\tmp\spark-415bb0bd-396b-444d-be82-04599e025f21
>   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
>   at 
> org.apache.spark.sql.test.SQLTestUtils$class.withTempDir(SQLTestUtils.scala:127)
>   at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLogSuite.withTempDir(HDFSMetadataLogSuite.scala:38)
> - FileManager: FileSystemManager *** FAILED *** (78 milliseconds)
>   java.io.IOException: Failed to delete: 
> C:\projects\spark\target\tmp\spark-ef8222cd-85aa-47c0-a396-bc7979e15088
>   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
>   at 
> org.apache.spark.sql.test.SQLTestUtils$class.withTempDir(SQLTestUtils.scala:127)
>   at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLogSuite.withTempDir(HDFSMetadataLogSuite.scala:38)
> {code}
> For full logs, please refer to 
> https://ci.appveyor.com/project/spark-test/spark/build/283-tmp-test-base
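
Most of the path-related failures above share one symptom: a Windows path 
such as {{C:\projects\spark\target\tmp\spark-...}} reaches the SQL layer as 
{{file:/C:projectsspark  arget mpspark-...}}, i.e. the backslashes have been 
swallowed as escape characters. A hedged sketch of the usual remedy 
(illustrative, not the actual fix): build the location through 
{{java.io.File#toURI}} rather than concatenating raw strings.

{code}
import java.io.File

// Illustrative sketch: File#toURI escapes the path and uses forward
// slashes, so the Windows backslashes survive the round trip.
val dir = new File("C:\\projects\\spark\\target\\tmp\\spark-1f4471ab")
val location = dir.toURI.toString
// On Windows, location is "file:/C:/projects/spark/target/tmp/spark-1f4471ab"
// rather than the broken form seen in the logs above.
{code}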



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-

[jira] [Commented] (SPARK-18922) Fix more resource-closing-related and path-related test failures in identified ones on Windows

2016-12-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15776431#comment-15776431
 ] 

Apache Spark commented on SPARK-18922:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/16397

> Fix more resource-closing-related and path-related test failures in 
> identified ones on Windows
> --
>
> Key: SPARK-18922
> URL: https://issues.apache.org/jira/browse/SPARK-18922
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.2.0
>
>
> There are more instances that fail on Windows, as below:
> - {{LauncherBackendSuite}}:
> {code}
> - local: launcher handle *** FAILED *** (30 seconds, 120 milliseconds)
>   The code passed to eventually never returned normally. Attempted 283 times 
> over 30.0960053 seconds. Last failure message: The reference was null. 
> (LauncherBackendSuite.scala:56)
>   org.scalatest.exceptions.TestFailedDueToTimeoutException:
>   at 
> org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
> - standalone/client: launcher handle *** FAILED *** (30 seconds, 47 
> milliseconds)
>   The code passed to eventually never returned normally. Attempted 282 times 
> over 30.03798710002 seconds. Last failure message: The reference was 
> null. (LauncherBackendSuite.scala:56)
>   org.scalatest.exceptions.TestFailedDueToTimeoutException:
>   at 
> org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
> {code}
> - {{SQLQuerySuite}}:
> {code}
> - specifying database name for a temporary table is not allowed *** FAILED 
> *** (125 milliseconds)
>   org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/C:projectsspark  arget mpspark-1f4471ab-aac0-4239-ae35-833d54b37e52;
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:382)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
> {code}
> - {{JsonSuite}}:
> {code}
> - Loading a JSON dataset from a text file with SQL *** FAILED *** (94 
> milliseconds)
>   org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/C:projectsspark  arget mpspark-c918a8b7-fc09-433c-b9d0-36c0f78ae918;
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:382)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
> {code}
> - {{StateStoreSuite}}:
> {code}
> - SPARK-18342: commit fails when rename fails *** FAILED *** (16 milliseconds)
>   java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: 
> StateStoreSuite29777261fs://C:%5Cprojects%5Cspark%5Ctarget%5Ctmp%5Cspark-ef349862-7281-4963-aaf3-add0d670a4ad%5C?-2218c2f8-2cf6-4f80-9cdf-96354e8246a77685899733421033312/0
>   at org.apache.hadoop.fs.Path.initialize(Path.java:206)
>   at org.apache.hadoop.fs.Path.(Path.java:116)
>   at org.apache.hadoop.fs.Path.(Path.java:89)
>   ...
>   Cause: java.net.URISyntaxException: Relative path in absolute URI: 
> StateStoreSuite29777261fs://C:%5Cprojects%5Cspark%5Ctarget%5Ctmp%5Cspark-ef349862-7281-4963-aaf3-add0d670a4ad%5C?-2218c2f8-2cf6-4f80-9cdf-96354e8246a77685899733421033312/0
>   at java.net.URI.checkPath(URI.java:1823)
>   at java.net.URI.(URI.java:745)
>   at org.apache.hadoop.fs.Path.initialize(Path.java:203)
> {code}
> - {{HDFSMetadataLogSuite}}:
> {code}
> - FileManager: FileContextManager *** FAILED *** (94 milliseconds)
>   java.io.IOException: Failed to delete: 
> C:\projects\spark\target\tmp\spark-415bb0bd-396b-444d-be82-04599e025f21
>   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
>   at 
> org.apache.spark.sql.test.SQLTestUtils$class.withTempDir(SQLTestUtils.scala:127)
>   at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLogSuite.withTempDir(HDFSMetadataLogSuite.scala:38)
> - FileManager: FileSystemManager *** FAILED *** (78 milliseconds)
>   java.io.IOException: Failed to delete: 
> C:\projects\spark\target\tmp\spark-ef8222cd-85aa-47c0-a396-bc7979e15088
>   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
>   at 
> org.apache.spark.sql.test.SQLTestUtils$class.withTempDir(SQLTestUtils.scala:127)
>   at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLogSuite.withTempDir(HDFSMetadataLogSuite.scala:38)
> {code}
> For full logs, please refer to 
> https://ci.appveyor.com/project/spark-test/spark/build/283-tmp-test-base

[jira] [Assigned] (SPARK-18922) Fix more resource-closing-related and path-related test failures in identified ones on Windows

2016-12-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18922:


Assignee: Apache Spark  (was: Hyukjin Kwon)

> Fix more resource-closing-related and path-related test failures in 
> identified ones on Windows
> --
>
> Key: SPARK-18922
> URL: https://issues.apache.org/jira/browse/SPARK-18922
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 2.2.0
>
>
> There are more instances that fail on Windows, as below:
> - {{LauncherBackendSuite}}:
> {code}
> - local: launcher handle *** FAILED *** (30 seconds, 120 milliseconds)
>   The code passed to eventually never returned normally. Attempted 283 times 
> over 30.0960053 seconds. Last failure message: The reference was null. 
> (LauncherBackendSuite.scala:56)
>   org.scalatest.exceptions.TestFailedDueToTimeoutException:
>   at 
> org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
> - standalone/client: launcher handle *** FAILED *** (30 seconds, 47 
> milliseconds)
>   The code passed to eventually never returned normally. Attempted 282 times 
> over 30.03798710002 seconds. Last failure message: The reference was 
> null. (LauncherBackendSuite.scala:56)
>   org.scalatest.exceptions.TestFailedDueToTimeoutException:
>   at 
> org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
> {code}
> - {{SQLQuerySuite}}:
> {code}
> - specifying database name for a temporary table is not allowed *** FAILED 
> *** (125 milliseconds)
>   org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/C:projectsspark  arget mpspark-1f4471ab-aac0-4239-ae35-833d54b37e52;
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:382)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
> {code}
> - {{JsonSuite}}:
> {code}
> - Loading a JSON dataset from a text file with SQL *** FAILED *** (94 
> milliseconds)
>   org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/C:projectsspark  arget mpspark-c918a8b7-fc09-433c-b9d0-36c0f78ae918;
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:382)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
> {code}
> - {{StateStoreSuite}}:
> {code}
> - SPARK-18342: commit fails when rename fails *** FAILED *** (16 milliseconds)
>   java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: 
> StateStoreSuite29777261fs://C:%5Cprojects%5Cspark%5Ctarget%5Ctmp%5Cspark-ef349862-7281-4963-aaf3-add0d670a4ad%5C?-2218c2f8-2cf6-4f80-9cdf-96354e8246a77685899733421033312/0
>   at org.apache.hadoop.fs.Path.initialize(Path.java:206)
>   at org.apache.hadoop.fs.Path.(Path.java:116)
>   at org.apache.hadoop.fs.Path.(Path.java:89)
>   ...
>   Cause: java.net.URISyntaxException: Relative path in absolute URI: 
> StateStoreSuite29777261fs://C:%5Cprojects%5Cspark%5Ctarget%5Ctmp%5Cspark-ef349862-7281-4963-aaf3-add0d670a4ad%5C?-2218c2f8-2cf6-4f80-9cdf-96354e8246a77685899733421033312/0
>   at java.net.URI.checkPath(URI.java:1823)
>   at java.net.URI.(URI.java:745)
>   at org.apache.hadoop.fs.Path.initialize(Path.java:203)
> {code}
> - {{HDFSMetadataLogSuite}}:
> {code}
> - FileManager: FileContextManager *** FAILED *** (94 milliseconds)
>   java.io.IOException: Failed to delete: 
> C:\projects\spark\target\tmp\spark-415bb0bd-396b-444d-be82-04599e025f21
>   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
>   at 
> org.apache.spark.sql.test.SQLTestUtils$class.withTempDir(SQLTestUtils.scala:127)
>   at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLogSuite.withTempDir(HDFSMetadataLogSuite.scala:38)
> - FileManager: FileSystemManager *** FAILED *** (78 milliseconds)
>   java.io.IOException: Failed to delete: 
> C:\projects\spark\target\tmp\spark-ef8222cd-85aa-47c0-a396-bc7979e15088
>   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
>   at 
> org.apache.spark.sql.test.SQLTestUtils$class.withTempDir(SQLTestUtils.scala:127)
>   at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLogSuite.withTempDir(HDFSMetadataLogSuite.scala:38)
> {code}
> For full logs, please refer to 
> https://ci.appveyor.com/project/spark-test/spark/build/283-tmp-test-base



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-

[jira] [Reopened] (SPARK-18922) Fix more resource-closing-related and path-related test failures in identified ones on Windows

2016-12-25 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-18922:
--

I am reopening this as I found some more errors, shown below:

{code}
ColumnExpressionSuite:
- input_file_name, input_file_block_start, input_file_block_length - 
FileScanRDD *** FAILED *** (187 milliseconds)
  
"file:///C:/projects/spark/target/tmp/spark-0b21b963-6cfa-411c-8d6f-e6a5e1e73bce/part-1-c083a03a-e55e-4b05-9073-451de352d006.snappy.parquet"
 did not contain 
"C:\projects\spark\target\tmp\spark-0b21b963-6cfa-411c-8d6f-e6a5e1e73bce" 
(ColumnExpressionSuite.scala:545)
  
- input_file_name, input_file_block_start, input_file_block_length - HadoopRDD 
*** FAILED *** (172 milliseconds)
  
"file:/C:/projects/spark/target/tmp/spark-5d0afa94-7c2f-463b-9db9-2e8403e2bc5f/part-0-f6530138-9ad3-466d-ab46-0eeb6f85ed0b.txt"
 did not contain 
"C:\projects\spark\target\tmp\spark-5d0afa94-7c2f-463b-9db9-2e8403e2bc5f" 
(ColumnExpressionSuite.scala:569)

- input_file_name, input_file_block_start, input_file_block_length - 
NewHadoopRDD *** FAILED *** (156 milliseconds)
  
"file:/C:/projects/spark/target/tmp/spark-a894c7df-c74d-4d19-82a2-a04744cb3766/part-0-29674e3f-3fcf-4327-9b04-4dab1d46338d.txt"
 did not contain 
"C:\projects\spark\target\tmp\spark-a894c7df-c74d-4d19-82a2-a04744cb3766" 
(ColumnExpressionSuite.scala:598)

DataStreamReaderWriterSuite:
- source metadataPath *** FAILED *** (62 milliseconds)
  org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: Argument(s) 
are different! Wanted:
streamSourceProvider.createSource(
org.apache.spark.sql.SQLContext@3b04133b,

"C:\projects\spark\target\tmp\streaming.metadata-b05db6ae-c8dc-4ce4-b0d9-1eb8c84876c0/sources/0",
None,
"org.apache.spark.sql.streaming.test",
Map()
);
-> at 
org.apache.spark.sql.streaming.test.DataStreamReaderWriterSuite$$anonfun$12.apply$mcV$sp(DataStreamReaderWriterSuite.scala:374)
Actual invocation has different arguments:
streamSourceProvider.createSource(
org.apache.spark.sql.SQLContext@3b04133b,

"/C:/projects/spark/target/tmp/streaming.metadata-b05db6ae-c8dc-4ce4-b0d9-1eb8c84876c0/sources/0",
None,
"org.apache.spark.sql.streaming.test",
Map()
);


GlobalTempViewSuite:
- CREATE GLOBAL TEMP VIEW USING *** FAILED *** (110 milliseconds)
  org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/C:projectsspark  arget mpspark-960398ba-a0a1-45f6-a59a-d98533f9f519;


CreateTableAsSelectSuite:
- CREATE TABLE USING AS SELECT *** FAILED *** (0 milliseconds)
  java.lang.IllegalArgumentException: Can not create a Path from an empty string

- create a table, drop it and create another one with the same name *** FAILED 
*** (16 milliseconds)
  java.lang.IllegalArgumentException: Can not create a Path from an empty string

- create table using as select - with partitioned by *** FAILED *** (0 
milliseconds)
  java.lang.IllegalArgumentException: Can not create a Path from an empty string

- create table using as select - with non-zero buckets *** FAILED *** (0 
milliseconds)
  java.lang.IllegalArgumentException: Can not create a Path from an empty string


HiveMetadataCacheSuite:
- partitioned table is cached when partition pruning is true *** FAILED *** 
(532 milliseconds)
  org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: 
MetaException(message:java.lang.IllegalArgumentException: Can not create a Path 
from an empty string);

- partitioned table is cached when partition pruning is false *** FAILED *** 
(297 milliseconds)
  org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: 
MetaException(message:java.lang.IllegalArgumentException: Can not create a Path 
from an empty string);


MultiDatabaseSuite:
- createExternalTable() to non-default database - with USE *** FAILED *** (954 
milliseconds)
  org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/C:projectsspark  arget mpspark-0839d9a7-5e29-467a-9e3e-3e4cd618ee09;

- createExternalTable() to non-default database - without USE *** FAILED *** 
(500 milliseconds)
  org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/C:projectsspark  arget mpspark-c7e24d73-1d8f-45e8-ab7d-53a83087aec3;

 - invalid database name and table names *** FAILED *** (31 milliseconds)
   "Path does not exist: file:/C:projectsspark  arget 
mpspark-15a2a494-3483-4876-80e5-ec396e704b77;" did not contain "`t:a` is not a 
valid name for tables/databases. Valid names only contain alphabet characters, 
numbers and _." (MultiDatabaseSuite.scala:296)
   

OrcQuerySuite:
 - SPARK-8501: Avoids discovery schema from empty ORC files *** FAILED *** (15 
milliseconds)
   org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: 
MetaException(message:java.lang.IllegalArgumentException: Can not create a Path 
from an empty string);

 - Verify th