[jira] [Updated] (SPARK-46516) autoBroadcastJoinThreshold compared to project of a plan not a relation size

2023-12-29 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-46516:
--
Description: 
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares the plan.statistics.sizeInBytes of a Project (the
columns selected in the join), not the relation size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]
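
For illustration, a minimal self-contained Scala sketch of that comparison
(an assumption-laden toy model, not the verbatim Spark source: the
Statistics/Relation/Project classes and the width-fraction scaling here just
mimic what column pruning does to a Project's statistics):

{code:scala}
// Toy model of the broadcast decision; not Spark's actual classes.
case class Statistics(sizeInBytes: BigInt)

sealed trait LogicalPlan { def stats: Statistics }

// Leaf relation: statistics reflect the full table size.
case class Relation(totalSizeInBytes: BigInt) extends LogicalPlan {
  def stats: Statistics = Statistics(totalSizeInBytes)
}

// Project: after column pruning, sizeInBytes is scaled down to the
// estimated width of the selected columns only.
case class Project(child: LogicalPlan, selectedWidthFraction: Double)
    extends LogicalPlan {
  def stats: Statistics =
    Statistics((BigDecimal(child.stats.sizeInBytes) * selectedWidthFraction).toBigInt)
}

// Simplified form of the check at the linked line: the threshold is
// compared to the stats of the plan under the join, e.g. the Project.
def canBroadcastBySize(plan: LogicalPlan, threshold: Long): Boolean =
  threshold > 0 && plan.stats.sizeInBytes <= threshold

val threshold = 10L * 1024 * 1024                      // 10 MB default
val table = Relation(BigInt(2L) * 1024 * 1024 * 1024)  // a 2 GB relation
val pruned = Project(table, 0.004)                     // join keeps ~0.4% of the row width

println(canBroadcastBySize(table, threshold))   // false: the relation is too big
println(canBroadcastBySize(pruned, threshold))  // true: the Project slips under
{code}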

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and it
is loaded entirely into the driver's memory, which can lead to OOM.
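
A hedged reproduction sketch of this scenario (the paths, table sizes and
column names are hypothetical; whether the estimate actually dips under the
threshold depends on the statistics available to the optimizer):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.autoBroadcastJoinThreshold", 10L * 1024 * 1024) // 10 MB
  .getOrCreate()

// Hypothetical inputs: a very wide ~2 GB table and a fact table.
val wide  = spark.read.parquet("/data/wide_table")
val facts = spark.read.parquet("/data/facts")

// Only two columns of the wide table survive column pruning, so the
// Project's sizeInBytes estimate can fall below the threshold even
// though the underlying relation is huge.
val joined = facts.join(wide.select("id", "flag"), "id")
joined.explain() // may show BroadcastHashJoin; the build side is then
                 // collected to the driver, which is where the OOM risk sits
{code}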

Comparing the spark.sql.autoBroadcastJoinThreshold parameter to the projection
instead of the broadcast table size seems like quite a risky feature: more
relations get broadcast, but there are also more chances to hit OOM on the
driver.

The workaround is to disable spark.sql.autoBroadcastJoinThreshold and set
broadcast hints on genuinely small relations, but in that case
autoBroadcastJoinThreshold seems useless. It would be more useful to have an
autoBroadcastJoinThreshold that is compared to the relation size and therefore
gives predictable memory usage on the driver.
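
As a sketch of that workaround (facts and smallDim are hypothetical
DataFrames; the point is that broadcasting becomes an explicit opt-in with
bounded driver memory):

{code:scala}
import org.apache.spark.sql.functions.broadcast

// Turn the automatic size-based broadcast off entirely...
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1L)

// ...and opt in explicitly only where the relation is known to be small.
val joined = facts.join(broadcast(smallDim), "id")
{code}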

 

Original task and test, from when auto-broadcast was compared to the relation
totalSize:

https://issues.apache.org/jira/browse/SPARK-2393

[https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]

 

Task and PR where autoBroadcastJoinThreshold started to be compared to the
Project of a plan instead of the relation:

https://issues.apache.org/jira/browse/SPARK-13329

[https://github.com/apache/spark/pull/11210]

 

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]

  was:
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares the plan.statistics.sizeInBytes of a Project (the
columns selected in the join), not the relation size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and it
is loaded entirely into the driver's memory, which can lead to OOM.

Comparing the spark.sql.autoBroadcastJoinThreshold parameter to the projection
instead of the broadcast table size seems like quite a risky feature: more
relations get broadcast, but there are also more chances to hit OOM on the
driver.

 

Original task and test, from when auto-broadcast was compared to the relation
totalSize:

https://issues.apache.org/jira/browse/SPARK-2393

[https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]

 

Task and PR where autoBroadcastJoinThreshold started to be compared to the
Project of a plan instead of the relation:

https://issues.apache.org/jira/browse/SPARK-13329

[https://github.com/apache/spark/pull/11210]

 

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]


> autoBroadcastJoinThreshold compared to project of a plan not a relation size
> 
>
> Key: SPARK-46516
> URL: https://issues.apache.org/jira/browse/SPARK-46516
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Guram Savinov
>Priority: Major
>
> From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum 
> size in bytes for a table that will be broadcasted to all worker nodes when 
> performing a join.
> [https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]
> In fact, Spark compares the plan.statistics.sizeInBytes of a Project (the
> columns selected in the join), not the relation size.
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]
> A join may select only a few columns, so sizeInBytes can be less than
> autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and
> it is loaded entirely into the driver's memory, which can lead to OOM.
> Comparing the spark.sql.autoBroadcastJoinThreshold parameter to the
> projection instead of the broadcast table size seems like quite a risky
> feature: more relations get broadcast, but there are also more chances to
> hit OOM on the driver.

[jira] [Updated] (SPARK-46516) autoBroadcastJoinThreshold compared to project of a plan not a relation size

2023-12-29 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-46516:
--
Description: 
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares the plan.statistics.sizeInBytes of a Project (the
columns selected in the join), not the relation size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and it
is loaded entirely into the driver's memory, which can lead to OOM.

Comparing the spark.sql.autoBroadcastJoinThreshold parameter to the projection
instead of the broadcast table size seems like quite a risky feature: more
relations get broadcast, but there are also more chances to hit OOM on the
driver.

 

Original task and test, from when auto-broadcast was compared to the relation
totalSize:

https://issues.apache.org/jira/browse/SPARK-2393

[https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]

 

Task and PR where autoBroadcastJoinThreshold started to be compared to the
Project of a plan instead of the relation:

https://issues.apache.org/jira/browse/SPARK-13329

[https://github.com/apache/spark/pull/11210]

 

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]

  was:
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares the plan.statistics.sizeInBytes of a Project (the
columns selected in the join), not the relation size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and it
is loaded entirely into the driver's memory, which can lead to OOM.

The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
not compared to the broadcast table size.

 

Original task and test, from when auto-broadcast was compared to the relation
totalSize:

https://issues.apache.org/jira/browse/SPARK-2393

[https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]

 

Task and PR where autoBroadcastJoinThreshold started to be compared to the
Project of a plan instead of the relation:

https://issues.apache.org/jira/browse/SPARK-13329

[https://github.com/apache/spark/pull/11210]

 

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]


> autoBroadcastJoinThreshold compared to project of a plan not a relation size
> 
>
> Key: SPARK-46516
> URL: https://issues.apache.org/jira/browse/SPARK-46516
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Guram Savinov
>Priority: Major
>
> From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum 
> size in bytes for a table that will be broadcasted to all worker nodes when 
> performing a join.
> [https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]
> In fact, Spark compares the plan.statistics.sizeInBytes of a Project (the
> columns selected in the join), not the relation size.
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]
> A join may select only a few columns, so sizeInBytes can be less than
> autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and
> it is loaded entirely into the driver's memory, which can lead to OOM.
> Comparing the spark.sql.autoBroadcastJoinThreshold parameter to the
> projection instead of the broadcast table size seems like quite a risky
> feature: more relations get broadcast, but there are also more chances to
> hit OOM on the driver.
>  
> Original task and test, from when auto-broadcast was compared to the
> relation totalSize:
> https://issues.apache.org/jira/browse/SPARK-2393
> [https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]
>  
> Task and PR where autoBroadcastJoinThreshold started to be compared to the
> Project of a plan instead of the relation:

[jira] [Updated] (SPARK-46516) autoBroadcastJoinThreshold compared to project of a plan not a relation size

2023-12-27 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-46516:
--
Description: 
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares the plan.statistics.sizeInBytes of a Project (the
columns selected in the join), not the relation size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and it
is loaded entirely into the driver's memory, which can lead to OOM.

The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
not compared to the broadcast table size.

 

Original task and test, from when auto-broadcast was compared to the relation
totalSize:

https://issues.apache.org/jira/browse/SPARK-2393

[https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]

 

Task and PR where autoBroadcastJoinThreshold started to be compared to the
Project of a plan instead of the relation:

https://issues.apache.org/jira/browse/SPARK-13329

[https://github.com/apache/spark/pull/11210]

 

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]

  was:
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares the plan.statistics.sizeInBytes of a Project (the
columns selected in the join), not the relation size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and it
is loaded entirely into the driver's memory, which can lead to OOM.

The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
not compared to the broadcast table size.

 

Original task and test, from when auto-broadcast was compared to the relation
totalSize:

https://issues.apache.org/jira/browse/SPARK-2393

[https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]

 

Task and PR where autoBroadcastJoinThreshold started to be compared to the
Project of a plan instead of the relation:

https://issues.apache.org/jira/browse/SPARK-13329

[https://github.com/apache/spark/pull/11210]

 

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]


> autoBroadcastJoinThreshold compared to project of a plan not a relation size
> 
>
> Key: SPARK-46516
> URL: https://issues.apache.org/jira/browse/SPARK-46516
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Guram Savinov
>Priority: Major
>
> From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum 
> size in bytes for a table that will be broadcasted to all worker nodes when 
> performing a join.
> [https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]
> In fact, Spark compares the plan.statistics.sizeInBytes of a Project (the
> columns selected in the join), not the relation size.
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]
> A join may select only a few columns, so sizeInBytes can be less than
> autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and
> it is loaded entirely into the driver's memory, which can lead to OOM.
> The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
> not compared to the broadcast table size.
>  
> Original task and test, from when auto-broadcast was compared to the
> relation totalSize:
> https://issues.apache.org/jira/browse/SPARK-2393
> [https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]
>  
> Task and PR where autoBroadcastJoinThreshold started to be compared to the
> Project of a plan instead of the relation:
> https://issues.apache.org/jira/browse/SPARK-13329
> [https://github.com/apache/spark/pull/11210]
>  
> Related topic on SO: 
> 

[jira] [Updated] (SPARK-46516) autoBroadcastJoinThreshold compared to project of a plan not a relation size

2023-12-27 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-46516:
--
Description: 
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares the plan.statistics.sizeInBytes of a Project (the
columns selected in the join), not the relation size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and it
is loaded entirely into the driver's memory, which can lead to OOM.

The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
not compared to the broadcast table size.

 

Original task and test, from when auto-broadcast was compared to the relation
totalSize:

https://issues.apache.org/jira/browse/SPARK-2393

[https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]

 

Task and PR where autoBroadcastJoinThreshold started to be compared to the
Project of a plan instead of the relation:

https://issues.apache.org/jira/browse/SPARK-13329

[https://github.com/apache/spark/pull/11210]

 

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]

  was:
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the relation size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and it
is loaded entirely into the driver's memory, which can lead to OOM.

The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
not compared to the broadcast table size.

 

Original task and test, from when auto-broadcast was compared to the relation
totalSize:

https://issues.apache.org/jira/browse/SPARK-2393

[https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]

 

Task and PR where autoBroadcastJoinThreshold started to be compared to the
Project of a plan instead of the relation:

https://issues.apache.org/jira/browse/SPARK-13329

[https://github.com/apache/spark/pull/11210]

 

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]


> autoBroadcastJoinThreshold compared to project of a plan not a relation size
> 
>
> Key: SPARK-46516
> URL: https://issues.apache.org/jira/browse/SPARK-46516
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Guram Savinov
>Priority: Major
>
> From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum 
> size in bytes for a table that will be broadcasted to all worker nodes when 
> performing a join.
> [https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]
> In fact, Spark compares the plan.statistics.sizeInBytes of a Project (the
> columns selected in the join), not the relation size.
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]
> A join may select only a few columns, so sizeInBytes can be less than
> autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and
> it is loaded entirely into the driver's memory, which can lead to OOM.
> The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
> not compared to the broadcast table size.
>  
> Original task and test, from when auto-broadcast was compared to the
> relation totalSize:
> https://issues.apache.org/jira/browse/SPARK-2393
> [https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]
>  
> Task and PR where autoBroadcastJoinThreshold started to be compared to the
> Project of a plan instead of the relation:
> https://issues.apache.org/jira/browse/SPARK-13329
> [https://github.com/apache/spark/pull/11210]
>  
> Related topic on SO: 
> 

[jira] [Updated] (SPARK-46516) autoBroadcastJoinThreshold compared to project of a plan not a relation size

2023-12-27 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-46516:
--
Summary: autoBroadcastJoinThreshold compared to project of a plan not a 
relation size  (was: autoBroadcastJoinThreshold compared to plan.statistics not 
a table size)

> autoBroadcastJoinThreshold compared to project of a plan not a relation size
> 
>
> Key: SPARK-46516
> URL: https://issues.apache.org/jira/browse/SPARK-46516
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Guram Savinov
>Priority: Major
>
> From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum 
> size in bytes for a table that will be broadcasted to all worker nodes when 
> performing a join.
> [https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]
> In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
> in the join, not the table size.
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]
> A join may select only a few columns, so sizeInBytes can be less than
> autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and
> it is loaded entirely into the driver's memory, which can lead to OOM.
> The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
> not compared to the broadcast table size.
>  
> Original task and test, from when auto-broadcast was compared to the
> relation totalSize:
> https://issues.apache.org/jira/browse/SPARK-2393
> [https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]
>  
> Task and PR where autoBroadcastJoinThreshold started to be compared to the
> Project of a plan instead of the relation:
> https://issues.apache.org/jira/browse/SPARK-13329
> [https://github.com/apache/spark/pull/11210]
>  
> Related topic on SO: 
> [https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]






[jira] [Updated] (SPARK-46516) autoBroadcastJoinThreshold compared to project of a plan not a relation size

2023-12-27 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-46516:
--
Description: 
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the relation size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and it
is loaded entirely into the driver's memory, which can lead to OOM.

The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
not compared to the broadcast table size.

 

Original task and test, from when auto-broadcast was compared to the relation
totalSize:

https://issues.apache.org/jira/browse/SPARK-2393

[https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]

 

Task and PR where autoBroadcastJoinThreshold started to be compared to the
Project of a plan instead of the relation:

https://issues.apache.org/jira/browse/SPARK-13329

[https://github.com/apache/spark/pull/11210]

 

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]

  was:
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the table size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and it
is loaded entirely into the driver's memory, which can lead to OOM.

The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
not compared to the broadcast table size.

 

Original task and test, from when auto-broadcast was compared to the relation
totalSize:

https://issues.apache.org/jira/browse/SPARK-2393

[https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]

 

Task and PR where autoBroadcastJoinThreshold started to be compared to the
Project of a plan instead of the relation:

https://issues.apache.org/jira/browse/SPARK-13329

[https://github.com/apache/spark/pull/11210]

 

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]


> autoBroadcastJoinThreshold compared to project of a plan not a relation size
> 
>
> Key: SPARK-46516
> URL: https://issues.apache.org/jira/browse/SPARK-46516
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Guram Savinov
>Priority: Major
>
> From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum 
> size in bytes for a table that will be broadcasted to all worker nodes when 
> performing a join.
> [https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]
> In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
> in the join, not the relation size.
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]
> A join may select only a few columns, so sizeInBytes can be less than
> autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and
> it is loaded entirely into the driver's memory, which can lead to OOM.
> The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
> not compared to the broadcast table size.
>  
> Original task and test, from when auto-broadcast was compared to the
> relation totalSize:
> https://issues.apache.org/jira/browse/SPARK-2393
> [https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]
>  
> Task and PR where autoBroadcastJoinThreshold started to be compared to the
> Project of a plan instead of the relation:
> https://issues.apache.org/jira/browse/SPARK-13329
> [https://github.com/apache/spark/pull/11210]
>  
> Related topic on SO: 
> [https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]




[jira] [Updated] (SPARK-46516) autoBroadcastJoinThreshold compared to plan.statistics not a table size

2023-12-27 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-46516:
--
Description: 
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the table size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and it
is loaded entirely into the driver's memory, which can lead to OOM.

The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
not compared to the broadcast table size.

 

Original task and test, from when auto-broadcast was compared to the relation
totalSize:

https://issues.apache.org/jira/browse/SPARK-2393

[https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]

 

Task and PR where autoBroadcastJoinThreshold started to be compared to the
Project of a plan instead of the relation:

https://issues.apache.org/jira/browse/SPARK-13329

[https://github.com/apache/spark/pull/11210]

 

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]

  was:
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the table size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and it
is loaded entirely into the driver's memory, which can lead to OOM.

The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
not compared to the broadcast table size.

 

Original task and test, from when auto-broadcast was compared to the relation
totalSize:

https://issues.apache.org/jira/browse/SPARK-2393

[https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]

 

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]

 


> autoBroadcastJoinThreshold compared to plan.statistics not a table size
> ---
>
> Key: SPARK-46516
> URL: https://issues.apache.org/jira/browse/SPARK-46516
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Guram Savinov
>Priority: Major
>
> From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum 
> size in bytes for a table that will be broadcasted to all worker nodes when 
> performing a join.
> [https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]
> In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
> in the join, not the table size.
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]
> A join may select only a few columns, so sizeInBytes can be less than
> autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and
> it is loaded entirely into the driver's memory, which can lead to OOM.
> The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
> not compared to the broadcast table size.
>  
> Original task and test, from when auto-broadcast was compared to the
> relation totalSize:
> https://issues.apache.org/jira/browse/SPARK-2393
> [https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]
>  
> Task and PR where autoBroadcastJoinThreshold started to be compared to the
> Project of a plan instead of the relation:
> https://issues.apache.org/jira/browse/SPARK-13329
> [https://github.com/apache/spark/pull/11210]
>  
> Related topic on SO: 
> [https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]




[jira] [Updated] (SPARK-46516) autoBroadcastJoinThreshold compared to plan.statistics not a table size

2023-12-27 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-46516:
--
Description: 
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the table size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and it
is loaded entirely into the driver's memory, which can lead to OOM.

The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
not compared to the broadcast table size.

 

Original task and test, from when auto-broadcast was compared to the relation
totalSize:

https://issues.apache.org/jira/browse/SPARK-2393

[https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]

 

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]

 

  was:
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the table size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and it
is loaded entirely into the driver's memory, which can lead to OOM.

The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
not compared to the broadcast table size.

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]


> autoBroadcastJoinThreshold compared to plan.statistics not a table size
> ---
>
> Key: SPARK-46516
> URL: https://issues.apache.org/jira/browse/SPARK-46516
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Guram Savinov
>Priority: Major
>
> From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum 
> size in bytes for a table that will be broadcasted to all worker nodes when 
> performing a join.
> [https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]
> In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
> in the join, not the table size.
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]
> A join may select only a few columns, so sizeInBytes can be less than
> autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and
> it is loaded entirely into the driver's memory, which can lead to OOM.
> The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
> not compared to the broadcast table size.
>  
> Original task and test, from when auto-broadcast was compared to the
> relation totalSize:
> https://issues.apache.org/jira/browse/SPARK-2393
> [https://github.com/apache/spark/pull/1238/files#diff-00485e6cae519f81adca5ceee66227c6eae35db709619d505468f8765175ac18R39]
>  
> Related topic on SO: 
> [https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]
>  






[jira] [Updated] (SPARK-46516) autoBroadcastJoinThreshold compared to plan.statistics not a table size

2023-12-26 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-46516:
--
Description: 
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the table size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and it
is loaded entirely into the driver's memory, which can lead to OOM.

The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
not compared to the broadcast table size.

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]

  was:
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the table size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table can be huge and lead to
OOM on the driver.

The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
not compared to the broadcast table size.

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]


> autoBroadcastJoinThreshold compared to plan.statistics not a table size
> ---
>
> Key: SPARK-46516
> URL: https://issues.apache.org/jira/browse/SPARK-46516
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Guram Savinov
>Priority: Major
>
> From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum 
> size in bytes for a table that will be broadcasted to all worker nodes when 
> performing a join.
> [https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]
> In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
> in the join, not the table size.
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]
> A join may select only a few columns, so sizeInBytes can be less than
> autoBroadcastJoinThreshold, but the broadcast table itself can be huge, and
> it is loaded entirely into the driver's memory, which can lead to OOM.
> The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
> not compared to the broadcast table size.
> Related topic on SO: 
> [https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]






[jira] [Updated] (SPARK-46516) autoBroadcastJoinThreshold compared to plan.statistics not a table size

2023-12-26 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-46516:
--
Description: 
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the table size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table can be huge and lead to
OOM on the driver.

The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
not compared to the broadcast table size.

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]

  was:
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the table size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table can be huge and lead to
OOM on the driver.

The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
not compared to the broadcast table size.

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]


> autoBroadcastJoinThreshold compared to plan.statistics not a table size
> ---
>
> Key: SPARK-46516
> URL: https://issues.apache.org/jira/browse/SPARK-46516
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Guram Savinov
>Priority: Major
>
> From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum 
> size in bytes for a table that will be broadcasted to all worker nodes when 
> performing a join.
> [https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]
> In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
> in the join, not the table size.
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]
> A join may select only a few columns, so sizeInBytes can be less than
> autoBroadcastJoinThreshold, but the broadcast table can be huge and lead to
> OOM on the driver.
> The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
> not compared to the broadcast table size.
> Related topic on SO: 
> [https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]






[jira] [Updated] (SPARK-46516) autoBroadcastJoinThreshold compared to plan.statistics not a table size

2023-12-26 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-46516:
--
Description: 
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the table size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

A join may select only a few columns, so sizeInBytes can be less than
autoBroadcastJoinThreshold, but the broadcast table can be huge and lead to
OOM on the driver.

The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
not compared to the broadcast table size.

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]

  was:
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the table size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

The broadcast table can be huge and lead to OOM on the driver, so the
spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is not
compared to the broadcast table size.

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]


> autoBroadcastJoinThreshold compared to plan.statistics not a table size
> ---
>
> Key: SPARK-46516
> URL: https://issues.apache.org/jira/browse/SPARK-46516
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Guram Savinov
>Priority: Major
>
> From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum 
> size in bytes for a table that will be broadcasted to all worker nodes when 
> performing a join.
> [https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]
> In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
> in the join, not the table size.
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]
> A join may select only a few columns, so sizeInBytes can be less than
> autoBroadcastJoinThreshold, but the broadcast table can be huge and lead to
> OOM on the driver.
> The spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is
> not compared to the broadcast table size.
> Related topic on SO: 
> [https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]






[jira] [Updated] (SPARK-46516) autoBroadcastJoinThreshold compared to plan.statistics not a table size

2023-12-26 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-46516:
--
Description: 
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the table size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

The broadcast table can be huge and lead to OOM on the driver, so the
spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is not
compared to the broadcast table size.

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]

  was:
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the table size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

The broadcast table can be huge and lead to OOM on the driver, so the
spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is not
compared to the broadcast table size.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]


> autoBroadcastJoinThreshold compared to plan.statistics not a table size
> ---
>
> Key: SPARK-46516
> URL: https://issues.apache.org/jira/browse/SPARK-46516
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Guram Savinov
>Priority: Major
>
> From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum 
> size in bytes for a table that will be broadcasted to all worker nodes when 
> performing a join.
> [https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]
> In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
> in the join, not the table size.
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]
> The broadcast table can be huge and lead to OOM on the driver, so the
> spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is not
> compared to the broadcast table size.
> Related topic on SO: 
> [https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]






[jira] [Updated] (SPARK-46516) autoBroadcastJoinThreshold compared to plan.statistics not a table size

2023-12-26 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-46516:
--
Description: 
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.

In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the table size.

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]

The broadcast table can be huge and lead to OOM on the driver, so the
spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is not
compared to the broadcast table size.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]

  was:
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.
In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the table size.
The broadcast table can be huge and lead to OOM on the driver, so the
spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is not
compared to the broadcast table size.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]


> autoBroadcastJoinThreshold compared to plan.statistics not a table size
> ---
>
> Key: SPARK-46516
> URL: https://issues.apache.org/jira/browse/SPARK-46516
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Guram Savinov
>Priority: Major
>
> From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum 
> size in bytes for a table that will be broadcasted to all worker nodes when 
> performing a join.
> In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
> in the join, not the table size.
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368]
> The broadcast table can be huge and lead to OOM on the driver, so the
> spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is not
> compared to the broadcast table size.
> [https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]
> Related topic on SO: 
> [https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]






[jira] [Updated] (SPARK-46516) autoBroadcastJoinThreshold compared to plan.statistics not a table size

2023-12-26 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-46516:
--
Description: 
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.
In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the table size.
The broadcast table can be huge and lead to OOM on the driver, so the
spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is not
compared to the broadcast table size.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

Related topic on SO: 
[https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]

  was:
From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum
size in bytes for a table that will be broadcasted to all worker nodes when
performing a join.
In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
in the join, not the table size.
The broadcast table can be huge and lead to OOM on the driver, so the
spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is not
compared to the broadcast table size.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

Related topic on SO: 
https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s


> autoBroadcastJoinThreshold compared to plan.statistics not a table size
> ---
>
> Key: SPARK-46516
> URL: https://issues.apache.org/jira/browse/SPARK-46516
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Guram Savinov
>Priority: Major
>
> From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum 
> size in bytes for a table that will be broadcasted to all worker nodes when 
> performing a join.
> In fact, Spark compares plan.statistics.sizeInBytes for the columns selected
> in the join, not the table size.
> The broadcast table can be huge and lead to OOM on the driver, so the
> spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is not
> compared to the broadcast table size.
> [https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]
> Related topic on SO: 
> [https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]






[jira] [Updated] (SPARK-46516) autoBroadcastJoinThreshold compared to plan.statistics not a table size

2023-12-26 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-46516:
--
Issue Type: Bug  (was: Documentation)

> autoBroadcastJoinThreshold compared to plan.statistics not a table size
> ---
>
> Key: SPARK-46516
> URL: https://issues.apache.org/jira/browse/SPARK-46516
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Guram Savinov
>Priority: Major
>
> From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum 
> size in bytes for a table that will be broadcasted to all worker nodes when 
> performing a join.
> In fact Spark compares plan.statistics.sizeInBytes of the columns selected 
> in the join, not the table size.
> The broadcasted table can be huge and lead to OOM on the driver, so the 
> spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is not 
> compared to the broadcasted table size.
> [https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]
> Related topic on SO: 
> [https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s]






[jira] [Created] (SPARK-46516) autoBroadcastJoinThreshold compared to plan.statistics not a table size

2023-12-26 Thread Guram Savinov (Jira)
Guram Savinov created SPARK-46516:
-

 Summary: autoBroadcastJoinThreshold compared to plan.statistics 
not a table size
 Key: SPARK-46516
 URL: https://issues.apache.org/jira/browse/SPARK-46516
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Affects Versions: 3.1.1
Reporter: Guram Savinov


From the docs: spark.sql.autoBroadcastJoinThreshold - Configures the maximum 
size in bytes for a table that will be broadcasted to all worker nodes when 
performing a join.
In fact Spark compares plan.statistics.sizeInBytes of the columns selected in 
the join, not the table size.
The broadcasted table can be huge and lead to OOM on the driver, so the 
spark.sql.autoBroadcastJoinThreshold parameter seems useless when it is not 
compared to the broadcasted table sizes.

[https://spark.apache.org/docs/3.5.0/configuration.html#runtime-sql-configuration]

Related topic on SO: 
https://stackoverflow.com/questions/74435020/how-dataframe-count-selects-broadcasthashjoin-while-dataframe-show-selects-s






[jira] [Commented] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings

2020-02-03 Thread Guram Savinov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029619#comment-17029619
 ] 

Guram Savinov commented on SPARK-30701:
---

OK, let's take this to the Hadoop project: 
https://issues.apache.org/jira/browse/HADOOP-16837

> SQL test running on Windows: hadoop chgrp warnings
> --
>
> Key: SPARK-30701
> URL: https://issues.apache.org/jira/browse/SPARK-30701
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
> Environment: Windows 10
> Winutils 2.7.1: 
> [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1]
> Oracle JavaSE 8
> SparkSQL 2.4.4 / Hadoop 2.6.5
> Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive
> Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive
>Reporter: Guram Savinov
>Priority: Major
>  Labels: WIndows, hive, unit-test
> Attachments: HadoopGroupTest.java
>
>
> Running SparkSQL local embedded unit tests on Win10, using winutils.
> Got warnings about 'hadoop chgrp'.
> See environment info.
> {code:bash}
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> {code}
> Related info on SO: 
> https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive
> Seems like the problem is here: 
> hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsShellPermissions.java:210






[jira] [Comment Edited] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings

2020-02-01 Thread Guram Savinov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028026#comment-17028026
 ] 

Guram Savinov edited comment on SPARK-30701 at 2/1/20 9:38 AM:
---

So the problem is that the backslash character isn't included in allowedChars; 
see the attached HadoopGroupTest.java.
This is a Hadoop issue, not a Spark one.
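
For reference, a tiny sketch of the failing check (Scala). The character 
class is my reading of hadoop-common's FsShellPermissions.java and should be 
treated as an assumption if your Hadoop version differs:
{code:scala}
object ChgrpPatternCheck {
  // Assumed Windows group pattern from FsShellPermissions.java:
  // spaces are allowed, backslashes are not.
  val allowedChars = "[-_./@a-zA-Z0-9 ]"

  def main(args: Array[String]): Unit = {
    val group = """TEST\Domain users"""
    // Prints false: '\' is outside allowedChars, which is exactly what
    // triggers the "does not match expected pattern for group" warning.
    println(group.matches(allowedChars + "+"))
  }
}
{code}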


was (Author: gsavinov):
So the problem is that the backslash character isn't included in allowedChars; 
see the attached HadoopGroupTest.java.

> SQL test running on Windows: hadoop chgrp warnings
> --
>
> Key: SPARK-30701
> URL: https://issues.apache.org/jira/browse/SPARK-30701
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
> Environment: Windows 10
> Winutils 2.7.1: 
> [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1]
> Oracle JavaSE 8
> SparkSQL 2.4.4 / Hadoop 2.6.5
> Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive
> Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive
>Reporter: Guram Savinov
>Assignee: Felix Cheung
>Priority: Major
>  Labels: WIndows, hive, unit-test
> Attachments: HadoopGroupTest.java
>
>
> Running SparkSQL local embedded unit tests on Win10, using winutils.
> Got warnings about 'hadoop chgrp'.
> See environment info.
> {code:bash}
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> {code}
> Related info on SO: 
> https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive
> Seems like the problem is here: 
> hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsShellPermissions.java:210






[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings

2020-02-01 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-30701:
--
Environment: 
Windows 10

Winutils 2.7.1: 
[https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1]

Oracle JavaSE 8

SparkSQL 2.4.4 / Hadoop 2.6.5

Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive

Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive

  was:
Windows 10

Winutils 2.7.1: 
[https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1]

Oracle JavaSE 8

SparkSQL 2.4.4

Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive

Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive


> SQL test running on Windows: hadoop chgrp warnings
> --
>
> Key: SPARK-30701
> URL: https://issues.apache.org/jira/browse/SPARK-30701
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
> Environment: Windows 10
> Winutils 2.7.1: 
> [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1]
> Oracle JavaSE 8
> SparkSQL 2.4.4 / Hadoop 2.6.5
> Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive
> Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive
>Reporter: Guram Savinov
>Assignee: Felix Cheung
>Priority: Major
>  Labels: WIndows, hive, unit-test
> Attachments: HadoopGroupTest.java
>
>
> Running SparkSQL local embedded unit tests on Win10, using winutils.
> Got warnings about 'hadoop chgrp'.
> See environment info.
> {code:bash}
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> {code}
> Related info on SO: 
> https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive
> Seems like the problem is here: 
> hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsShellPermissions.java:210






[jira] [Commented] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings

2020-02-01 Thread Guram Savinov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028026#comment-17028026
 ] 

Guram Savinov commented on SPARK-30701:
---

So the problem is that the backslash character isn't included in allowedChars; 
see the attached HadoopGroupTest.java.

> SQL test running on Windows: hadoop chgrp warnings
> --
>
> Key: SPARK-30701
> URL: https://issues.apache.org/jira/browse/SPARK-30701
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
> Environment: Windows 10
> Winutils 2.7.1: 
> [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1]
> Oracle JavaSE 8
> SparkSQL 2.4.4
> Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive
> Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive
>Reporter: Guram Savinov
>Assignee: Felix Cheung
>Priority: Major
>  Labels: WIndows, hive, unit-test
> Attachments: HadoopGroupTest.java
>
>
> Running SparkSQL local embedded unit tests on Win10, using winutils.
> Got warnings about 'hadoop chgrp'.
> See environment info.
> {code:bash}
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> {code}
> Related info on SO: 
> https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive
> Seems like the problem is here: 
> hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsShellPermissions.java:210






[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings

2020-02-01 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-30701:
--
Attachment: HadoopGroupTest.java

> SQL test running on Windows: hadoop chgrp warnings
> --
>
> Key: SPARK-30701
> URL: https://issues.apache.org/jira/browse/SPARK-30701
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
> Environment: Windows 10
> Winutils 2.7.1: 
> [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1]
> Oracle JavaSE 8
> SparkSQL 2.4.4
> Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive
> Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive
>Reporter: Guram Savinov
>Assignee: Felix Cheung
>Priority: Major
>  Labels: WIndows, hive, unit-test
> Attachments: HadoopGroupTest.java
>
>
> Running SparkSQL local embedded unit tests on Win10, using winutils.
> Got warnings about 'hadoop chgrp'.
> See environment info.
> {code:bash}
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> {code}
> Related info on SO: 
> https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive
> Seems like the problem is here: 
> hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsShellPermissions.java:210






[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings

2020-02-01 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-30701:
--
Description: 
Running SparkSQL local embedded unit tests on Win10, using winutils.

Got warnings about 'hadoop chgrp'.

See environment info.
{code:bash}
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
{code}

Related info on SO: 
https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive
Seems like the problem is here: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsShellPermissions.java:210

  was:
Running SparkSQL local embedded unit tests on Win10, using winutils.

Got warnings about 'hadoop chgrp'.

See environment info.
{code:bash}
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
{code}

Related info on SO: 
https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive


> SQL test running on Windows: hadoop chgrp warnings
> --
>
> Key: SPARK-30701
> URL: https://issues.apache.org/jira/browse/SPARK-30701
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
> Environment: Windows 10
> Winutils 2.7.1: 
> [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1]
> Oracle JavaSE 8
> SparkSQL 2.4.4
> Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive
> Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive
>Reporter: Guram Savinov
>Assignee: Felix Cheung
>Priority: Major
>  Labels: WIndows, hive, unit-test
>
> Running SparkSQL local embedded unit tests on Win10, using winutils.
> Got warnings about 'hadoop chgrp'.
> See environment info.
> {code:bash}
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> {code}
> Related info on SO: 
> https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive
> Seems like the problem is here: 
> hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsShellPermissions.java:210






[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings

2020-02-01 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-30701:
--
Description: 
Running SparkSQL local embedded unit tests on Win10, using winutils.

Got warnings about 'hadoop chgrp'.

See environment info.
{code:bash}
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
{code}

Related info on SO: 
https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive

  was:
Running SparkSQL local embedded unit tests on Win10, using winutils.

Got warnings about 'hadoop chgrp'.

See environment info.
{code:java}
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
{code}

Related info on SO: 
https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive


> SQL test running on Windows: hadoop chgrp warnings
> --
>
> Key: SPARK-30701
> URL: https://issues.apache.org/jira/browse/SPARK-30701
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
> Environment: Windows 10
> Winutils 2.7.1: 
> [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1]
> Oracle JavaSE 8
> SparkSQL 2.4.4
> Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive
> Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive
>Reporter: Guram Savinov
>Assignee: Felix Cheung
>Priority: Major
>  Labels: WIndows, hive, unit-test
>
> Running SparkSQL local embedded unit tests on Win10, using winutils.
> Got warnings about 'hadoop chgrp'.
> See environment info.
> {code:bash}
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> {code}
> Related info on SO: 
> https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive






[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings

2020-02-01 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-30701:
--
Description: 
Running SparkSQL local embedded unit tests on Win10, using winutils.

Got warnings about 'hadoop chgrp'.

See environment info.
{code:java}
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
{code}

Related info on SO: 
https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive

  was:
Running SparkSQL local embedded unit tests on Win10, using winutils.

Got warnings about 'hadoop chgrp'.

See environment info.
{code:java}
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
{code}


> SQL test running on Windows: hadoop chgrp warnings
> --
>
> Key: SPARK-30701
> URL: https://issues.apache.org/jira/browse/SPARK-30701
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
> Environment: Windows 10
> Winutils 2.7.1: 
> [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1]
> Oracle JavaSE 8
> SparkSQL 2.4.4
> Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive
> Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive
>Reporter: Guram Savinov
>Assignee: Felix Cheung
>Priority: Major
>  Labels: WIndows, hive, unit-test
>
> Running SparkSQL local embedded unit tests on Win10, using winutils.
> Got warnings about 'hadoop chgrp'.
> See environment info.
> {code:java}
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> {code}
> Related info on SO: 
> https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive






[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings

2020-02-01 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-30701:
--
Description: 
Running SparkSQL local embedded unit tests on Win10, using winutils.

Got warnings about 'hadoop chgrp'.

See environment info.
{code:java}
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
{code}

  was:
Running SparkSQL embedded unit tests on Win10, using winutils.

Got warnings about 'hadoop chgrp'.

See environment info.
{code:java}
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
{code}


> SQL test running on Windows: hadoop chgrp warnings
> --
>
> Key: SPARK-30701
> URL: https://issues.apache.org/jira/browse/SPARK-30701
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
> Environment: Windows 10
> Winutils 2.7.1: 
> [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1]
> Oracle JavaSE 8
> SparkSQL 2.4.4
> Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive
> Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive
>Reporter: Guram Savinov
>Assignee: Felix Cheung
>Priority: Major
>  Labels: WIndows, hive, unit-test
>
> Running SparkSQL local embedded unit tests on Win10, using winutils.
> Got warnings about 'hadoop chgrp'.
> See environment info.
> {code:java}
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> {code}






[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings

2020-02-01 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-30701:
--
Environment: 
Windows 10

Winutils 2.7.1: 
[https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1]

Oracle JavaSE 8

SparkSQL 2.4.4

Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive

Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive

  was:
Windows 10

Winutils 2.7.1: 
[https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1]

Oracle JavaSE 8

SparkSQL 2.4.4

Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive


> SQL test running on Windows: hadoop chgrp warnings
> --
>
> Key: SPARK-30701
> URL: https://issues.apache.org/jira/browse/SPARK-30701
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
> Environment: Windows 10
> Winutils 2.7.1: 
> [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1]
> Oracle JavaSE 8
> SparkSQL 2.4.4
> Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive
> Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive
>Reporter: Guram Savinov
>Assignee: Felix Cheung
>Priority: Major
>  Labels: WIndows, hive, unit-test
>
> Running SparkSQL unit tests on Win10, using winutils.
> Got warnings about 'hadoop chgrp'.
> See environment info.
> {code:java}
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> {code}






[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings

2020-02-01 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-30701:
--
Description: 
Running SparkSQL embedded unit tests on Win10, using winutils.

Got warnings about 'hadoop chgrp'.

See environment info.
{code:java}
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
{code}

  was:
Running SparkSQL unit tests on Win10, using winutils.

Got warnings about 'hadoop chgrp'.

See environment info.
{code:java}
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
{code}


> SQL test running on Windows: hadoop chgrp warnings
> --
>
> Key: SPARK-30701
> URL: https://issues.apache.org/jira/browse/SPARK-30701
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
> Environment: Windows 10
> Winutils 2.7.1: 
> [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1]
> Oracle JavaSE 8
> SparkSQL 2.4.4
> Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive
> Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive
>Reporter: Guram Savinov
>Assignee: Felix Cheung
>Priority: Major
>  Labels: WIndows, hive, unit-test
>
> Running SparkSQL embedded unit tests on Win10, using winutils.
> Got warnings about 'hadoop chgrp'.
> See environment info.
> {code:java}
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> {code}






[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings

2020-02-01 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-30701:
--
Description: 
Running SparkSQL unit tests on Win10, using winutils.

Got warnings about 'hadoop chgrp'.

See environment info.
{code:java}
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
{code}

  was:
{code:java}
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
{code}


> SQL test running on Windows: hadoop chgrp warnings
> --
>
> Key: SPARK-30701
> URL: https://issues.apache.org/jira/browse/SPARK-30701
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
> Environment: Windows 10
> Winutils 2.7.1: 
> [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1]
> Oracle JavaSE 8
> SparkSQL 2.4.4
> Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive
>Reporter: Guram Savinov
>Assignee: Felix Cheung
>Priority: Major
>  Labels: WIndows, hive, unit-test
>
> Running SparkSQL unit tests on Win10, using winutils.
> Got warnings about 'hadoop chgrp'.
> See environment info.
> {code:java}
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> {code}






[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings

2020-02-01 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-30701:
--
Description: 
{code:java}
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'TEST\Domain users' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
{code}

  was:
{code}

-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
.-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
.-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
{code}

Environment: 
Windows 10

Winutils 2.7.1: 
[https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1]

Oracle JavaSE 8

SparkSQL 2.4.4

Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive

> SQL test running on Windows: hadoop chgrp warnings
> --
>
> Key: SPARK-30701
> URL: https://issues.apache.org/jira/browse/SPARK-30701
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
> Environment: Windows 10
> Winutils 2.7.1: 
> [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1]
> Oracle JavaSE 8
> SparkSQL 2.4.4
> Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive
>Reporter: Guram Savinov
>Assignee: Felix Cheung
>Priority: Major
>  Labels: WIndows, hive, unit-test
>
> {code:java}
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'TEST\Domain users' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> {code}






[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings

2020-02-01 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-30701:
--
Affects Version/s: (was: 2.3.0)
   2.4.4

> SQL test running on Windows: hadoop chgrp warnings
> --
>
> Key: SPARK-30701
> URL: https://issues.apache.org/jira/browse/SPARK-30701
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Guram Savinov
>Assignee: Felix Cheung
>Priority: Major
>  Labels: WIndows, hive, unit-test
>
> {code}
> -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> .-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> .-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> {code}






[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings

2020-02-01 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-30701:
--
Labels: WIndows hive unit-test  (was: bulk-closed)

> SQL test running on Windows: hadoop chgrp warnings
> --
>
> Key: SPARK-30701
> URL: https://issues.apache.org/jira/browse/SPARK-30701
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Guram Savinov
>Assignee: Felix Cheung
>Priority: Major
>  Labels: WIndows, hive, unit-test
>
> {code}
> -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> .-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> .-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> {code}






[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings

2020-02-01 Thread Guram Savinov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guram Savinov updated SPARK-30701:
--
Component/s: (was: SparkR)

> SQL test running on Windows: hadoop chgrp warnings
> --
>
> Key: SPARK-30701
> URL: https://issues.apache.org/jira/browse/SPARK-30701
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Guram Savinov
>Assignee: Felix Cheung
>Priority: Major
>  Labels: bulk-closed
>
> {code}
> -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> .-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> .-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
> Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
> {code}






[jira] [Created] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings

2020-02-01 Thread Guram Savinov (Jira)
Guram Savinov created SPARK-30701:
-

 Summary: SQL test running on Windows: hadoop chgrp warnings
 Key: SPARK-30701
 URL: https://issues.apache.org/jira/browse/SPARK-30701
 Project: Spark
  Issue Type: Bug
  Components: SparkR, SQL
Affects Versions: 2.3.0
Reporter: Guram Savinov
Assignee: Felix Cheung


{code}

-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
.-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
.-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
{code}






[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2016-03-14 Thread Guram Savinov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192905#comment-15192905
 ] 

Guram Savinov commented on SPARK-12216:
---

http://spark.apache.org/docs/latest/#launching-on-a-cluster

>> Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). It’s 
>> easy to run locally on one machine — all you need is to have java installed 
>> on your system PATH, or the JAVA_HOME environment variable pointing to a 
>> Java installation.

Sadly, it isn't true.

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
> at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at scala.util.Try$.apply(Try.scala:161)
> at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
> at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)






[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2016-03-14 Thread Guram Savinov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192877#comment-15192877
 ] 

Guram Savinov commented on SPARK-12216:
---

Why did you close this issue?
Don't you care about Spark on Windows?

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
> at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at scala.util.Try$.apply(Try.scala:161)
> at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
> at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)






[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2016-03-13 Thread Guram Savinov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192281#comment-15192281
 ] 

Guram Savinov edited comment on SPARK-12216 at 3/13/16 10:49 AM:
-

I have the same problem when exiting spark-shell on Windows 7.
It doesn't seem to be a permission problem, because I start the console as 
admin and have no trouble removing these directories manually.
Maybe the directory is locked by some thread when the shutdown hook executes.

Take a look at this post; it has details about a possible directory lock:
http://jakzaprogramowac.pl/pytanie/12478,how-to-find-which-java-scala-thread-has-locked-a-file
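
A hedged reproduction sketch of that theory (plain Scala, no Spark involved), 
assuming an open file handle is what holds the lock; on Windows the delete in 
the shutdown hook then fails much like in the stack trace quoted below:
{code:scala}
import java.io.{File, FileOutputStream}
import java.nio.file.Files

object LockedTempDirDemo {
  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("spark-demo-").toFile
    // Simulate a thread that never closes its handle before shutdown.
    val out = new FileOutputStream(new File(dir, "open.log"))
    out.write(42)

    sys.addShutdownHook {
      // On Windows deleting the open file typically fails while `out` is
      // still open; on Linux it usually succeeds, because open files can
      // be unlinked there.
      val deleted = new File(dir, "open.log").delete() && dir.delete()
      System.err.println(s"temp dir deleted: $deleted")
    }
    // main returns here; the hook runs with the handle still open.
  }
}
{code}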


was (Author: gsavinov):
I have the same problem when exiting spark-shell on Windows 7.
It doesn't seem to be a permission problem, because I start the console as 
admin and have no trouble removing these directories manually.
Maybe the directory is locked by some thread when the shutdown hook executes.

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
> at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at scala.util.Try$.apply(Try.scala:161)
> at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
> at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)






[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2016-03-13 Thread Guram Savinov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192281#comment-15192281
 ] 

Guram Savinov commented on SPARK-12216:
---

I have the same problem when exiting spark-shell on Windows 7.
It doesn't seem to be a permission problem, because I start the console as 
admin and have no trouble removing these directories manually.
Maybe the directory is locked by some thread when the shutdown hook executes.

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
> at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at scala.util.Try$.apply(Try.scala:161)
> at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
> at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)


