[jira] [Commented] (SPARK-20780) Spark Kafka10 Consumer Hangs

2018-12-06 Thread shangmin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712449#comment-16712449
 ] 

shangmin commented on SPARK-20780:
--

The root cause is the Kafka consumer library version used in your program. Check 
your library dependencies with Maven or another build tool, and ensure that you 
use a consumer library version greater than 0.10.0 (0.10.2.x works fine for me).

If you use Maven, exclude the transitive Kafka libraries and add them back 
explicitly with version 0.10.2.x:

1:

{code:xml}
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
  <version>2.2.0</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka_2.11</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}

2:

{code:xml}
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.11</artifactId>
  <version>0.10.2.1</version>
</dependency>

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>0.10.2.1</version>
</dependency>
{code}
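As a quick sanity check, here is a minimal sketch (the StreamingContext {{ssc}}, broker list, group id, topic and timeout value are placeholders) of a direct stream that passes the consumer settings explicitly, so the effective client version and {{request.timeout.ms}} are easy to verify:

{code:scala}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

// Illustrative values only; replace the brokers, group id, topic and timeout.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "my-consumer-group",
  "request.timeout.ms" -> "60000",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

// ssc is an existing StreamingContext.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("my-topic"), kafkaParams)
)
{code}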


> Spark Kafka10 Consumer Hangs
> 
>
> Key: SPARK-20780
> URL: https://issues.apache.org/jira/browse/SPARK-20780
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.1.0
> Environment: Spark 2.1.0
> Spark Streaming Kafka 010
> Yarn - Cluster Mode
> CDH 5.8.4
> CentOS Linux release 7.2
>Reporter: jayadeepj
>Priority: Major
> Attachments: streaming_1.png, streaming_2.png, tasks_timing_out_3.png
>
>
> We have recently upgraded our Streaming App with Direct Stream to Spark 2 
> (spark-streaming-kafka-0-10 - 2.1.0) with Kafka version 0.10.0.0 & Consumer 
> 10. We find abnormal delays after the application has run for a couple of 
> hours or completed consumption of approx. 5 million records.
> See screenshot 1 & 2
> There is a sudden jump in the processing time from ~15 seconds (usual for 
> this app) to ~3 minutes, and from then on the processing time keeps degrading 
> throughout.
> We have seen that the delay is due to certain tasks taking exactly the 
> duration of the configured Kafka consumer 'request.timeout.ms'. We have 
> tested this by varying the timeout property to different values.
> See screenshot 3.
> I think the get(offset: Long, timeout: Long): ConsumerRecord[K, V] method & 
> the subsequent poll(timeout) call in CachedKafkaConsumer.scala are actually 
> timing out on some of the partitions without reading data. But the executor 
> logs the task as successfully completed after the exact timeout duration. Note that 
> most other tasks are completing successfully with millisecond duration. The 
> timeout is most likely from the 
> org.apache.kafka.clients.consumer.KafkaConsumer & we did not observe any 
> network latency difference.
> We have observed this across multiple clusters & multiple apps with & without 
> TLS/SSL. Spark 1.6 with 0-8 consumer seems to be fine with consistent 
> performance
> 17/05/17 10:30:06 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
> task 446288
> 17/05/17 10:30:06 INFO executor.Executor: Running task 11.0 in stage 5663.0 
> (TID 446288)
> 17/05/17 10:30:06 INFO kafka010.KafkaRDD: Computing topic XX-XXX-XX, 
> partition 0 offsets 776843 -> 779591
> 17/05/17 10:30:06 INFO kafka010.CachedKafkaConsumer: Initial fetch for 
> spark-executor-default1 XX-XXX-XX 0 776843
> 17/05/17 10:30:56 INFO executor.Executor: Finished task 11.0 in stage 5663.0 
> (TID 446288). 1699 bytes result sent to driver
> 17/05/17 10:30:56 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
> task 446329
> 17/05/17 10:30:56 INFO executor.Executor: Running task 0.0 in stage 5667.0 
> (TID 446329)
> 17/05/17 10:30:56 INFO spark.MapOutputTrackerWorker: Updating epoch to 3116 
> and clearing cache
> 17/05/17 10:30:56 INFO broadcast.TorrentBroadcast: Started reading broadcast 
> variable 6807
> 17/05/17 10:30:56 INFO memory.MemoryStore: Block broadcast_6807_piece0 stored 
> as bytes in memory (estimated size 13.1 KB, free 4.1 GB)
> 17/05/17 10:30:56 INFO broadcast.TorrentBroadcast: Reading broadcast variable 
> 6807 took 4 ms
> 17/05/17 10:30:56 INFO memory.MemoryStore: Block broadcast_6807 stored as 
> values in m
> We can see that the log timestamps differ by the exact timeout duration.
> Our consumer config is below.
> 17/05/17 12:33:13 INFO dstream.ForEachDStream: Initialized and validated 
> org.apache.spark.streaming.dstream.ForEachDStream@1171dde4
> 17/05/17 12:33:13 INFO consumer.ConsumerConfig: ConsumerConfig values:
>   metric.reporters = []
>   metadata.max.age.ms = 30
>   partition.assignment.strategy = 
> [org.apache.kafka.clients.consumer.RangeAssignor]
>   reconnect.backoff.ms = 50
>   sasl.kerberos.ticket.renew.window.factor = 0.8
>   max.partition.fetch.bytes = 1048576
>   bootstrap.servers = [x.xxx.xxx:9092]
>   ssl.keystore.type = JKS
>   enable.auto.commit = true
>   sasl.mechanism = GSSAPI
>   interceptor.classes = null
>   exclude.internal.topics = true
>   ssl.truststore.password = null
>   client.id =
>   ssl.endpoint.identification.algorithm = null
>   max.poll.records = 2147483647
>   check.crcs = 

[jira] [Updated] (SPARK-23580) Interpreted mode fallback should be implemented for all expressions & projections

2018-12-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-23580:
--
Target Version/s:   (was: 2.5.0)

> Interpreted mode fallback should be implemented for all expressions & 
> projections
> -
>
> Key: SPARK-23580
> URL: https://issues.apache.org/jira/browse/SPARK-23580
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Priority: Major
>  Labels: release-notes
>
> Spark SQL currently does not support interpreted mode for all expressions and 
> projections. This is a problem for scenarios where code generation does not 
> work or blows past the JVM class limits; we currently cannot fall back 
> gracefully.
> This ticket is an umbrella to fix this class of problem in Spark SQL. The 
> work can be divided into two main areas:
> - Add interpreted versions for all dataset-related expressions.
> - Add an interpreted version of {{GenerateUnsafeProjection}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24360) Support Hive 3.1 metastore

2018-12-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-24360:
--
Target Version/s:   (was: 2.5.0)

> Support Hive 3.1 metastore
> --
>
> Key: SPARK-24360
> URL: https://issues.apache.org/jira/browse/SPARK-24360
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Hive 3.1.0 is released. This issue aims to support Hive Metastore 3.1.
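For context, the metastore client version is normally selected through the existing configs; a sketch with illustrative values (a 3.1.x value only becomes meaningful once this issue is resolved):

{code:scala}
import org.apache.spark.sql.SparkSession

// Illustrative only: "3.1.0" is not an accepted value until this support lands.
val spark = SparkSession.builder()
  .config("spark.sql.hive.metastore.version", "3.1.0")
  .config("spark.sql.hive.metastore.jars", "maven") // or a path to the matching client jars
  .enableHiveSupport()
  .getOrCreate()
{code}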






[jira] [Updated] (SPARK-12978) Skip unnecessary final group-by when input data already clustered with group-by keys

2018-12-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-12978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-12978:
--
Target Version/s:   (was: 2.5.0)

> Skip unnecessary final group-by when input data already clustered with 
> group-by keys
> 
>
> Key: SPARK-12978
> URL: https://issues.apache.org/jira/browse/SPARK-12978
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Major
>
> This ticket targets the optimization to skip an unnecessary final group-by 
> operation, as shown below:
> Without opt.:
> {code}
> == Physical Plan ==
> TungstenAggregate(key=[col0#159], 
> functions=[(sum(col1#160),mode=Final,isDistinct=false),(avg(col2#161),mode=Final,isDistinct=false)],
>  output=[col0#159,sum(col1)#177,avg(col2)#178])
> +- TungstenAggregate(key=[col0#159], 
> functions=[(sum(col1#160),mode=Partial,isDistinct=false),(avg(col2#161),mode=Partial,isDistinct=false)],
>  output=[col0#159,sum#200,sum#201,count#202L])
>+- TungstenExchange hashpartitioning(col0#159,200), None
>   +- InMemoryColumnarTableScan [col0#159,col1#160,col2#161], 
> InMemoryRelation [col0#159,col1#160,col2#161], true, 1, 
> StorageLevel(true, true, false, true, 1), ConvertToUnsafe, None
> {code}
> With opt.:
> {code}
> == Physical Plan ==
> TungstenAggregate(key=[col0#159], 
> functions=[(sum(col1#160),mode=Complete,isDistinct=false),(avg(col2#161),mode=Final,isDistinct=false)],
>  output=[col0#159,sum(col1)#177,avg(col2)#178])
> +- TungstenExchange hashpartitioning(col0#159,200), None
>   +- InMemoryColumnarTableScan [col0#159,col1#160,col2#161], InMemoryRelation 
> [col0#159,col1#160,col2#161], true, 1, StorageLevel(true, true, false, 
> true, 1), ConvertToUnsafe, None
> {code}
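For illustration, a sketch (assuming an active SparkSession named {{spark}}; table and column names are hypothetical) of a query shape where the input is already clustered by the group-by key, which is the case this optimization targets:

{code:scala}
import org.apache.spark.sql.functions.{avg, col, sum}

// Hypothetical table "t"; the repartition clusters the input by the group-by key,
// so the partial aggregate adds little over a single complete aggregate.
val df = spark.table("t")
  .repartition(col("col0"))
  .groupBy(col("col0"))
  .agg(sum(col("col1")), avg(col("col2")))
{code}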






[jira] [Updated] (SPARK-15117) Generate code that get a value in each compressed column from CachedBatch when DataFrame.cache() is called

2018-12-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-15117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-15117:
--
Target Version/s:   (was: 2.5.0)

> Generate code that get a value in each compressed column from CachedBatch 
> when DataFrame.cache() is called
> --
>
> Key: SPARK-15117
> URL: https://issues.apache.org/jira/browse/SPARK-15117
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Kazuaki Ishizaki
>Priority: Major
>
> Once SPARK-14098 is merged, we will migrate a feature in this JIRA entry.
> When DataFrame.cache() is called, data is stored in column-oriented storage 
> in CachedBatch. The current Catalyst generates a Java program to get a value 
> of a column from an InternalRow that is translated from CachedBatch. This 
> issue generates Java code to get a value of a column directly from 
> CachedBatch. This JIRA entry supports the other primitive types 
> (boolean/byte/short/int/long) whose columns may be compressed.
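For illustration, a minimal sketch (assuming an active SparkSession named {{spark}}) of the cache-then-scan pattern whose generated code this issue targets:

{code:scala}
// The cached data is stored column-oriented in CachedBatch; later scans read
// (and, for these primitive types, possibly decompress) those columns.
val df = spark.range(0, 1000000L)
  .selectExpr("id", "cast(id % 2 = 0 as boolean) as flag", "cast(id as int) as i")
df.cache()
df.count()                 // materializes the CachedBatch
df.filter("flag").count()  // scans the cached columns
{code}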






[jira] [Commented] (SPARK-20184) performance regression for complex/long sql when enable whole stage codegen

2018-12-06 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712414#comment-16712414
 ] 

Dongjoon Hyun commented on SPARK-20184:
---

If this exists up to 2.4.0, could you update the `Affects Versions`, [~kiszk]?

> performance regression for complex/long sql when enable whole stage codegen
> ---
>
> Key: SPARK-20184
> URL: https://issues.apache.org/jira/browse/SPARK-20184
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0, 2.1.0
>Reporter: Fei Wang
>Priority: Major
>
> The performance of the following SQL gets much worse in Spark 2.x compared 
> with codegen off.
> SELECT
>sum(COUNTER_57) 
> ,sum(COUNTER_71) 
> ,sum(COUNTER_3)  
> ,sum(COUNTER_70) 
> ,sum(COUNTER_66) 
> ,sum(COUNTER_75) 
> ,sum(COUNTER_69) 
> ,sum(COUNTER_55) 
> ,sum(COUNTER_63) 
> ,sum(COUNTER_68) 
> ,sum(COUNTER_56) 
> ,sum(COUNTER_37) 
> ,sum(COUNTER_51) 
> ,sum(COUNTER_42) 
> ,sum(COUNTER_43) 
> ,sum(COUNTER_1)  
> ,sum(COUNTER_76) 
> ,sum(COUNTER_54) 
> ,sum(COUNTER_44) 
> ,sum(COUNTER_46) 
> ,DIM_1 
> ,DIM_2 
>   ,DIM_3
> FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100;
> The number of rows in aggtable is about 3500.
> whole stage codegen on (spark.sql.codegen.wholeStage = true): 40s
> whole stage codegen off (spark.sql.codegen.wholeStage = false): 6s
> After some analysis I think this is related to the huge Java method (a method 
> thousands of lines long) generated by codegen.
> If I set -XX:-DontCompileHugeMethods the performance gets much better 
> (about 7s).
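As a side note, a sketch of passing that JVM flag through the standard extraJavaOptions settings (in practice these are usually set via spark-defaults.conf or spark-submit --conf rather than in code, since the driver JVM is already running by then):

{code:scala}
import org.apache.spark.SparkConf

// Illustrative: apply -XX:-DontCompileHugeMethods to driver and executors.
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions", "-XX:-DontCompileHugeMethods")
  .set("spark.executor.extraJavaOptions", "-XX:-DontCompileHugeMethods")
{code}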






[jira] [Updated] (SPARK-16196) Optimize in-memory scan performance using ColumnarBatches

2018-12-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-16196:
--
Target Version/s:   (was: 2.5.0)

> Optimize in-memory scan performance using ColumnarBatches
> -
>
> Key: SPARK-16196
> URL: https://issues.apache.org/jira/browse/SPARK-16196
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Major
>
> A simple benchmark such as the following reveals inefficiencies in the 
> existing in-memory scan implementation:
> {code}
> spark.range(N)
>   .selectExpr("id", "floor(rand() * 1) as k")
>   .createOrReplaceTempView("test")
> val ds = spark.sql("select count(k), count(id) from test").cache()
> ds.collect()
> ds.collect()
> {code}
> There are many reasons why caching is slow. The biggest is that compression 
> takes a long time. The second is that there are a lot of virtual function 
> calls in this hot code path since the rows are processed using iterators. 
> Further, the rows are converted to and from ByteBuffers, which are slow to 
> read in general.






[jira] [Updated] (SPARK-20184) performance regression for complex/long sql when enable whole stage codegen

2018-12-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-20184:
--
Target Version/s:   (was: 2.5.0)

> performance regression for complex/long sql when enable whole stage codegen
> ---
>
> Key: SPARK-20184
> URL: https://issues.apache.org/jira/browse/SPARK-20184
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0, 2.1.0
>Reporter: Fei Wang
>Priority: Major
>
> The performance of the following SQL gets much worse in Spark 2.x compared 
> with codegen off.
> SELECT
>sum(COUNTER_57) 
> ,sum(COUNTER_71) 
> ,sum(COUNTER_3)  
> ,sum(COUNTER_70) 
> ,sum(COUNTER_66) 
> ,sum(COUNTER_75) 
> ,sum(COUNTER_69) 
> ,sum(COUNTER_55) 
> ,sum(COUNTER_63) 
> ,sum(COUNTER_68) 
> ,sum(COUNTER_56) 
> ,sum(COUNTER_37) 
> ,sum(COUNTER_51) 
> ,sum(COUNTER_42) 
> ,sum(COUNTER_43) 
> ,sum(COUNTER_1)  
> ,sum(COUNTER_76) 
> ,sum(COUNTER_54) 
> ,sum(COUNTER_44) 
> ,sum(COUNTER_46) 
> ,DIM_1 
> ,DIM_2 
>   ,DIM_3
> FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100;
> The number of rows in aggtable is about 3500.
> whole stage codegen on (spark.sql.codegen.wholeStage = true): 40s
> whole stage codegen off (spark.sql.codegen.wholeStage = false): 6s
> After some analysis I think this is related to the huge Java method (a method 
> thousands of lines long) generated by codegen.
> If I set -XX:-DontCompileHugeMethods the performance gets much better 
> (about 7s).






[jira] [Updated] (SPARK-21318) The exception message thrown by `lookupFunction` is ambiguous.

2018-12-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-21318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-21318:
--
Target Version/s:   (was: 2.5.0)

> The exception message thrown by `lookupFunction` is ambiguous.
> --
>
> Key: SPARK-21318
> URL: https://issues.apache.org/jira/browse/SPARK-21318
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1
>Reporter: StanZhai
>Assignee: StanZhai
>Priority: Minor
> Fix For: 2.4.0
>
>
> The function actually exists in the currently selected database, but the 
> exception message is: 
> {code}
> This function is neither a registered temporary function nor a permanent 
> function registered in the database 'default'.
> {code}
> My UDF has already been registered in the current database, but it failed 
> to initialize during lookupFunction. 
> The exception message should be:
> {code}
> No handler for Hive UDF 'site.stanzhai.UDAFXXX': 
> org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException: Two arguments is 
> expected
> {code}
> This makes it hard to locate the actual problem.






[jira] [Commented] (SPARK-23580) Interpreted mode fallback should be implemented for all expressions & projections

2018-12-06 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712410#comment-16712410
 ] 

Dongjoon Hyun commented on SPARK-23580:
---

I removed the target version `2.5.0`. Please feel free to add `3.0.0` if needed.

> Interpreted mode fallback should be implemented for all expressions & 
> projections
> -
>
> Key: SPARK-23580
> URL: https://issues.apache.org/jira/browse/SPARK-23580
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Priority: Major
>  Labels: release-notes
>
> Spark SQL currently does not support interpreted mode for all expressions and 
> projections. This is a problem for scenarios where code generation does not 
> work or blows past the JVM class limits; we currently cannot fall back 
> gracefully.
> This ticket is an umbrella to fix this class of problem in Spark SQL. The 
> work can be divided into two main areas:
> - Add interpreted versions for all dataset-related expressions.
> - Add an interpreted version of {{GenerateUnsafeProjection}}.






[jira] [Updated] (SPARK-24360) Support Hive 3.1 metastore

2018-12-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-24360:
--
Target Version/s: 3.0.0

> Support Hive 3.1 metastore
> --
>
> Key: SPARK-24360
> URL: https://issues.apache.org/jira/browse/SPARK-24360
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Hive 3.1.0 is released. This issue aims to support Hive Metastore 3.1.






[jira] [Created] (SPARK-26301) Consider switching from putting secret in environment variable directly to using secret reference

2018-12-06 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-26301:
--

 Summary: Consider switching from putting secret in environment 
variable directly to using secret reference
 Key: SPARK-26301
 URL: https://issues.apache.org/jira/browse/SPARK-26301
 Project: Spark
  Issue Type: New Feature
  Components: Kubernetes
Affects Versions: 3.0.0
Reporter: Matt Cheah


In SPARK-26194 we proposed using an environment variable that is loaded in the 
executor pod spec to share the generated SASL secret key between the driver and 
the executors. However in practice this is very difficult to secure. Most 
traditional Kubernetes deployments will handle permissions by allowing wide 
access to viewing pod specs but restricting access to view Kubernetes secrets. 
Now, however, any user who can view the pod spec can also view the contents of 
the SASL secrets.

An example use case where this quickly breaks down is in the case where a 
systems administrator is allowed to look at pods that run user code in order to 
debug failing infrastructure, but the cluster administrator should not be able 
to view contents of secrets or other sensitive data from Spark applications run 
by their users.

We propose modifying the existing solution to instead automatically create a 
Kubernetes Secret object containing the SASL encryption key, then using the 
[secret reference feature in 
Kubernetes|https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-environment-variables]
 to store the data in the environment variable without putting the secret data 
in the pod spec itself.






[jira] [Commented] (SPARK-26239) Add configurable auth secret source in k8s backend

2018-12-06 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712373#comment-16712373
 ] 

Apache Spark commented on SPARK-26239:
--

User 'mccheah' has created a pull request for this issue:
https://github.com/apache/spark/pull/23252

> Add configurable auth secret source in k8s backend
> --
>
> Key: SPARK-26239
> URL: https://issues.apache.org/jira/browse/SPARK-26239
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> This is a follow up to SPARK-26194, which aims to add auto-generated secrets 
> similar to the YARN backend.
> There's a desire to support different ways to generate and propagate these 
> auth secrets (e.g. using things like Vault). Need to investigate:
> - exposing configuration to support that
> - changing SecurityManager so that it can delegate some of the 
> secret-handling logic to custom implementations
> - figuring out whether this can also be used in client-mode, where the driver 
> is not created by the k8s backend in Spark.






[jira] [Assigned] (SPARK-26239) Add configurable auth secret source in k8s backend

2018-12-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26239:


Assignee: Apache Spark

> Add configurable auth secret source in k8s backend
> --
>
> Key: SPARK-26239
> URL: https://issues.apache.org/jira/browse/SPARK-26239
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Apache Spark
>Priority: Major
>
> This is a follow up to SPARK-26194, which aims to add auto-generated secrets 
> similar to the YARN backend.
> There's a desire to support different ways to generate and propagate these 
> auth secrets (e.g. using things like Vault). Need to investigate:
> - exposing configuration to support that
> - changing SecurityManager so that it can delegate some of the 
> secret-handling logic to custom implementations
> - figuring out whether this can also be used in client-mode, where the driver 
> is not created by the k8s backend in Spark.






[jira] [Assigned] (SPARK-26239) Add configurable auth secret source in k8s backend

2018-12-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26239:


Assignee: (was: Apache Spark)

> Add configurable auth secret source in k8s backend
> --
>
> Key: SPARK-26239
> URL: https://issues.apache.org/jira/browse/SPARK-26239
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> This is a follow up to SPARK-26194, which aims to add auto-generated secrets 
> similar to the YARN backend.
> There's a desire to support different ways to generate and propagate these 
> auth secrets (e.g. using things like Vault). Need to investigate:
> - exposing configuration to support that
> - changing SecurityManager so that it can delegate some of the 
> secret-handling logic to custom implementations
> - figuring out whether this can also be used in client-mode, where the driver 
> is not created by the k8s backend in Spark.






[jira] [Updated] (SPARK-26188) Spark 2.4.0 Partitioning behavior breaks backwards compatibility

2018-12-06 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-26188:

Labels: release-notes  (was: )

> Spark 2.4.0 Partitioning behavior breaks backwards compatibility
> 
>
> Key: SPARK-26188
> URL: https://issues.apache.org/jira/browse/SPARK-26188
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Damien Doucet-Girard
>Assignee: Gengliang Wang
>Priority: Critical
>  Labels: correctness, release-notes
> Fix For: 2.4.1, 3.0.0
>
>
> My team uses Spark to partition and output Parquet files to Amazon S3. We 
> typically use 256 partitions, from 00 to ff.
> We've observed that in Spark 2.3.2 and prior, the partition values are read 
> as strings by default. However, in Spark 2.4.0 and later, the type of each 
> partition is inferred by default, and partitions such as 00 become 0 and 4d 
> become 4.0.
>  Here is a log sample of this behavior from one of our jobs:
>  2.4.0:
> {code:java}
> 18/11/27 14:02:27 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=00/part-00061-hashredacted.parquet, 
> range: 0-662, partition values: [0]
> 18/11/27 14:02:28 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=ef/part-00034-hashredacted.parquet, 
> range: 0-662, partition values: [ef]
> 18/11/27 14:02:29 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=4a/part-00151-hashredacted.parquet, 
> range: 0-662, partition values: [4a]
> 18/11/27 14:02:30 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=74/part-00180-hashredacted.parquet, 
> range: 0-662, partition values: [74]
> 18/11/27 14:02:32 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=f5/part-00156-hashredacted.parquet, 
> range: 0-662, partition values: [f5]
> 18/11/27 14:02:33 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=50/part-00195-hashredacted.parquet, 
> range: 0-662, partition values: [50]
> 18/11/27 14:02:34 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=70/part-00054-hashredacted.parquet, 
> range: 0-662, partition values: [70]
> 18/11/27 14:02:35 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=b9/part-00012-hashredacted.parquet, 
> range: 0-662, partition values: [b9]
> 18/11/27 14:02:37 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=d2/part-00016-hashredacted.parquet, 
> range: 0-662, partition values: [d2]
> 18/11/27 14:02:38 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=51/part-3-hashredacted.parquet, 
> range: 0-662, partition values: [51]
> 18/11/27 14:02:39 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=84/part-00135-hashredacted.parquet, 
> range: 0-662, partition values: [84]
> 18/11/27 14:02:40 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=b5/part-00190-hashredacted.parquet, 
> range: 0-662, partition values: [b5]
> 18/11/27 14:02:41 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=88/part-00143-hashredacted.parquet, 
> range: 0-662, partition values: [88]
> 18/11/27 14:02:42 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=4d/part-00120-hashredacted.parquet, 
> range: 0-662, partition values: [4.0]
> 18/11/27 14:02:43 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=ac/part-00119-hashredacted.parquet, 
> range: 0-662, partition values: [ac]
> 18/11/27 14:02:44 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=24/part-00139-hashredacted.parquet, 
> range: 0-662, partition values: [24]
> 18/11/27 14:02:45 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=fd/part-00167-hashredacted.parquet, 
> range: 0-662, partition values: [fd]
> 18/11/27 14:02:46 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=52/part-00033-hashredacted.parquet, 
> range: 0-662, partition values: [52]
> 18/11/27 14:02:47 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=ab/part-00083-hashredacted.parquet, 
> range: 0-662, partition values: [ab]
> 18/11/27 14:02:48 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=f8/part-00018-hashredacted.parquet, 
> range: 0-662, partition values: [f8]
> 18/11/27 14:02:49 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=7a/part-00093-hashredacted.parquet, 
> range: 0-662, partition values: [7a]
> 18/11/27 14:02:50 
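A sketch of the usual workarounds (assuming an active SparkSession named {{spark}}; the path and the data column are illustrative): either disable partition type inference to keep the old string behavior, or pin the partition column type with an explicit schema:

{code:scala}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Option 1: read partition values as strings, as in Spark 2.3.x and earlier.
spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")

// Option 2: pin the partition column type via an explicit schema.
val schema = StructType(Seq(
  StructField("value", StringType, nullable = true),   // illustrative data column
  StructField("suffix", StringType, nullable = false)  // the partition column
))
val df = spark.read.schema(schema).parquet("s3a://bucket/prefix/")
{code}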

[jira] [Updated] (SPARK-26188) Spark 2.4.0 Partitioning behavior breaks backwards compatibility

2018-12-06 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-26188:

Labels: correctness release-notes  (was: release-notes)

> Spark 2.4.0 Partitioning behavior breaks backwards compatibility
> 
>
> Key: SPARK-26188
> URL: https://issues.apache.org/jira/browse/SPARK-26188
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Damien Doucet-Girard
>Assignee: Gengliang Wang
>Priority: Critical
>  Labels: correctness, release-notes
> Fix For: 2.4.1, 3.0.0
>
>
> My team uses Spark to partition and output Parquet files to Amazon S3. We 
> typically use 256 partitions, from 00 to ff.
> We've observed that in Spark 2.3.2 and prior, the partition values are read 
> as strings by default. However, in Spark 2.4.0 and later, the type of each 
> partition is inferred by default, and partitions such as 00 become 0 and 4d 
> become 4.0.
>  Here is a log sample of this behavior from one of our jobs:
>  2.4.0:
> {code:java}
> 18/11/27 14:02:27 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=00/part-00061-hashredacted.parquet, 
> range: 0-662, partition values: [0]
> 18/11/27 14:02:28 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=ef/part-00034-hashredacted.parquet, 
> range: 0-662, partition values: [ef]
> 18/11/27 14:02:29 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=4a/part-00151-hashredacted.parquet, 
> range: 0-662, partition values: [4a]
> 18/11/27 14:02:30 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=74/part-00180-hashredacted.parquet, 
> range: 0-662, partition values: [74]
> 18/11/27 14:02:32 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=f5/part-00156-hashredacted.parquet, 
> range: 0-662, partition values: [f5]
> 18/11/27 14:02:33 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=50/part-00195-hashredacted.parquet, 
> range: 0-662, partition values: [50]
> 18/11/27 14:02:34 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=70/part-00054-hashredacted.parquet, 
> range: 0-662, partition values: [70]
> 18/11/27 14:02:35 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=b9/part-00012-hashredacted.parquet, 
> range: 0-662, partition values: [b9]
> 18/11/27 14:02:37 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=d2/part-00016-hashredacted.parquet, 
> range: 0-662, partition values: [d2]
> 18/11/27 14:02:38 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=51/part-3-hashredacted.parquet, 
> range: 0-662, partition values: [51]
> 18/11/27 14:02:39 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=84/part-00135-hashredacted.parquet, 
> range: 0-662, partition values: [84]
> 18/11/27 14:02:40 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=b5/part-00190-hashredacted.parquet, 
> range: 0-662, partition values: [b5]
> 18/11/27 14:02:41 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=88/part-00143-hashredacted.parquet, 
> range: 0-662, partition values: [88]
> 18/11/27 14:02:42 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=4d/part-00120-hashredacted.parquet, 
> range: 0-662, partition values: [4.0]
> 18/11/27 14:02:43 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=ac/part-00119-hashredacted.parquet, 
> range: 0-662, partition values: [ac]
> 18/11/27 14:02:44 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=24/part-00139-hashredacted.parquet, 
> range: 0-662, partition values: [24]
> 18/11/27 14:02:45 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=fd/part-00167-hashredacted.parquet, 
> range: 0-662, partition values: [fd]
> 18/11/27 14:02:46 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=52/part-00033-hashredacted.parquet, 
> range: 0-662, partition values: [52]
> 18/11/27 14:02:47 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=ab/part-00083-hashredacted.parquet, 
> range: 0-662, partition values: [ab]
> 18/11/27 14:02:48 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=f8/part-00018-hashredacted.parquet, 
> range: 0-662, partition values: [f8]
> 18/11/27 14:02:49 INFO FileScanRDD: Reading File path: 
> s3a://bucketnamereadacted/ddgirard/suffix=7a/part-00093-hashredacted.parquet, 
> range: 0-662, partition values: 

[jira] [Commented] (SPARK-25401) Reorder the required ordering to match the table's output ordering for bucket join

2018-12-06 Thread David Vrba (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712350#comment-16712350
 ] 

David Vrba commented on SPARK-25401:


I was looking at it and I believe that in the class EnsureRequirements we could 
reorder the join predicates for SortMergeJoin once more, just before we check 
whether the child outputOrdering satisfies the requiredOrdering, and align the 
predicate keys with the child outputOrdering. In that case it will not add the 
unnecessary SortExec, and it will not add an unnecessary Exchange either, 
because the Exchange is handled earlier.

What do you guys think? Is it a good approach? (Please be patient with me, this 
is my first Jira on Spark.)

> Reorder the required ordering to match the table's output ordering for bucket 
> join
> --
>
> Key: SPARK-25401
> URL: https://issues.apache.org/jira/browse/SPARK-25401
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Wang, Gang
>Priority: Major
>
> Currently, we check whether SortExec is needed between an operator and its 
> child operator in the method orderingSatisfies, and orderingSatisfies 
> requires that the orders in the SortOrders are all the same.
> Now, take the following case into consideration.
>  * Table a is bucketed by (a1, a2), sorted by (a2, a1), and buckets number is 
> 200.
>  * Table b is bucketed by (b1, b2), sorted by (b2, b1), and buckets number is 
> 200.
>  * Table a join table b on (a1=b1, a2=b2)
> In this case, if the join is a sort merge join, the query planner won't add 
> an exchange on either side, but a sort will be added on both sides. Actually, 
> the sort is also unnecessary, since within the same bucket, e.g. bucket 1 of 
> table a and bucket 1 of table b, (a1=b1, a2=b2) is equivalent to (a2=b2, a1=b1).
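For illustration, a sketch (DataFrames {{dfA}}/{{dfB}} and an active SparkSession {{spark}} are assumed) of the layout described above and the join that currently triggers the extra sort:

{code:scala}
import org.apache.spark.sql.SaveMode

// Bucketed on (x1, x2) but sorted on (x2, x1), as in the example above.
dfA.write.bucketBy(200, "a1", "a2").sortBy("a2", "a1")
  .mode(SaveMode.Overwrite).saveAsTable("a")
dfB.write.bucketBy(200, "b1", "b2").sortBy("b2", "b1")
  .mode(SaveMode.Overwrite).saveAsTable("b")

val a = spark.table("a")
val b = spark.table("b")
// The join keys match the bucketing, so no Exchange is added, but a SortExec
// currently appears on both sides because the required ordering is (a1, a2).
val joined = a.join(b, a("a1") === b("b1") && a("a2") === b("b2"))
{code}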






[jira] [Updated] (SPARK-26298) Upgrade Janino version to 3.0.11

2018-12-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-26298:
--
Issue Type: Improvement  (was: Bug)

> Upgrade Janino version to 3.0.11
> 
>
> Key: SPARK-26298
> URL: https://issues.apache.org/jira/browse/SPARK-26298
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> This issue aims to upgrade Janino compiler to [version 
> 3.0.11|http://janino-compiler.github.io/janino/changelog.html].
> - Fixed issue #63: Script with many "helper" variables.
> - At last, the "jdk-9" implementation works.
> - Made the code that is required for JDK 9+ (ModuleFinder and consorts) 
> Java-6-compilable through reflection, so that commons-compiler-jdk can still 
> be compiled with Java 6, but functions at runtime also with Java 9+.
> - Fixed Java 9+ compatibility (JRE module system).
> - Fixed issue #65: Compilation Error Messages Generated by JDK.
> - Added experimental support for the "StackMapFrame" attribute; not active 
> yet.
> - Fixed issue #61: Make Unparser more flexible.
> - Fixed NPEs in various "toString()" methods.
> - Fixed issue #62: Optimize static method invocation with rvalue target 
> expression.
> - Merged pull request #60: Added all missing "ClassFile.getConstant*Info()" 
> methods, removing the necessity for many type casts.






[jira] [Resolved] (SPARK-26298) Upgrade Janino version to 3.0.11

2018-12-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-26298.
---
   Resolution: Fixed
 Assignee: Dongjoon Hyun
Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/23250 .

> Upgrade Janino version to 3.0.11
> 
>
> Key: SPARK-26298
> URL: https://issues.apache.org/jira/browse/SPARK-26298
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> This issue aims to upgrade Janino compiler to [version 
> 3.0.11|http://janino-compiler.github.io/janino/changelog.html].
> - Fixed issue #63: Script with many "helper" variables.
> - At last, the "jdk-9" implementation works.
> - Made the code that is required for JDK 9+ (ModuleFinder and consorts) 
> Java-6-compilable through reflection, so that commons-compiler-jdk can still 
> be compiled with Java 6, but functions at runtime also with Java 9+.
> - Fixed Java 9+ compatibility (JRE module system).
> - Fixed issue #65: Compilation Error Messages Generated by JDK.
> - Added experimental support for the "StackMapFrame" attribute; not active 
> yet.
> - Fixed issue #61: Make Unparser more flexible.
> - Fixed NPEs in various "toString()" methods.
> - Fixed issue #62: Optimize static method invocation with rvalue target 
> expression.
> - Merged pull request #60: Added all missing "ClassFile.getConstant*Info()" 
> methods, removing the necessity for many type casts.






[jira] [Issue Comment Deleted] (SPARK-26298) Upgrade Janino version to 3.0.11

2018-12-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-26298:
--
Comment: was deleted

(was: User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/23250)

> Upgrade Janino version to 3.0.11
> 
>
> Key: SPARK-26298
> URL: https://issues.apache.org/jira/browse/SPARK-26298
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> This issue aims to upgrade Janino compiler to [version 
> 3.0.11|http://janino-compiler.github.io/janino/changelog.html].
> - Fixed issue #63: Script with many "helper" variables.
> - At last, the "jdk-9" implementation works.
> - Made the code that is required for JDK 9+ (ModuleFinder and consorts) 
> Java-6-compilable through reflection, so that commons-compiler-jdk can still 
> be compiled with Java 6, but functions at runtime also with Java 9+.
> - Fixed Java 9+ compatibility (JRE module system).
> - Fixed issue #65: Compilation Error Messages Generated by JDK.
> - Added experimental support for the "StackMapFrame" attribute; not active 
> yet.
> - Fixed issue #61: Make Unparser more flexible.
> - Fixed NPEs in various "toString()" methods.
> - Fixed issue #62: Optimize static method invocation with rvalue target 
> expression.
> - Merged pull request #60: Added all missing "ClassFile.getConstant*Info()" 
> methods, removing the necessity for many type casts.






[jira] [Assigned] (SPARK-26300) The `checkForStreaming` method may be called twice in `createQuery`

2018-12-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26300:


Assignee: (was: Apache Spark)

> The `checkForStreaming` method may be called twice in `createQuery`
> -
>
> Key: SPARK-26300
> URL: https://issues.apache.org/jira/browse/SPARK-26300
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: liuxian
>Priority: Minor
>
> If {{checkForContinuous}} is called ({{checkForStreaming}} is called inside 
> {{checkForContinuous}}), the {{checkForStreaming}} method will be called 
> twice in {{createQuery}}. This is not necessary, and the 
> {{checkForStreaming}} method has a lot of statements, so it's better to 
> remove one of the calls.






[jira] [Commented] (SPARK-26300) The `checkForStreaming` method may be called twice in `createQuery`

2018-12-06 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712296#comment-16712296
 ] 

Apache Spark commented on SPARK-26300:
--

User '10110346' has created a pull request for this issue:
https://github.com/apache/spark/pull/23251

> The `checkForStreaming` method may be called twice in `createQuery`
> -
>
> Key: SPARK-26300
> URL: https://issues.apache.org/jira/browse/SPARK-26300
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: liuxian
>Priority: Minor
>
> If {{checkForContinuous}} is called ({{checkForStreaming}} is called inside 
> {{checkForContinuous}}), the {{checkForStreaming}} method will be called 
> twice in {{createQuery}}. This is not necessary, and the 
> {{checkForStreaming}} method has a lot of statements, so it's better to 
> remove one of the calls.






[jira] [Assigned] (SPARK-26300) The `checkForStreaming` method may be called twice in `createQuery`

2018-12-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26300:


Assignee: Apache Spark

> The `checkForStreaming` method may be called twice in `createQuery`
> -
>
> Key: SPARK-26300
> URL: https://issues.apache.org/jira/browse/SPARK-26300
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: liuxian
>Assignee: Apache Spark
>Priority: Minor
>
> If {{checkForContinuous}} is called ({{checkForStreaming}} is called inside 
> {{checkForContinuous}}), the {{checkForStreaming}} method will be called 
> twice in {{createQuery}}. This is not necessary, and the 
> {{checkForStreaming}} method has a lot of statements, so it's better to 
> remove one of the calls.






[jira] [Created] (SPARK-26300) The `checkForStreaming` method may be called twice in `createQuery`

2018-12-06 Thread liuxian (JIRA)
liuxian created SPARK-26300:
---

 Summary: The `checkForStreaming` method may be called twice in 
`createQuery`
 Key: SPARK-26300
 URL: https://issues.apache.org/jira/browse/SPARK-26300
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 2.4.0
Reporter: liuxian


If {{checkForContinuous}} is called ({{checkForStreaming}} is called inside 
{{checkForContinuous}}), the {{checkForStreaming}} method will be called twice 
in {{createQuery}}. This is not necessary, and the {{checkForStreaming}} 
method has a lot of statements, so it's better to remove one of the calls.






[jira] [Assigned] (SPARK-26263) Throw exception when Partition column value can't be converted to user specified type

2018-12-06 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-26263:
---

Assignee: Gengliang Wang

> Throw exception when Partition column value can't be converted to user 
> specified type
> -
>
> Key: SPARK-26263
> URL: https://issues.apache.org/jira/browse/SPARK-26263
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, if the user provides a data schema, partition column values are 
> converted according to it. But if the conversion fails, e.g. converting a 
> string to an int, the column value becomes null.
> For the following directory
> /tmp/testDir
> ├── p=bar
> └── p=foo
> If we run:
> ```
> val schema = StructType(Seq(StructField("p", IntegerType, false)))
> spark.read.schema(schema).csv("/tmp/testDir/").show()
> ```
> We will get:
> ++
> |   p|
> ++
> |null|
> |null|
> ++
> This PR proposes to throw an exception in such cases, instead of silently 
> converting to null:
> 1. These null partition column values don't make sense to users in most 
> cases. It is better to know about the conversion failure and then adjust the 
> schema or ETL jobs, etc., to fix it.
> 2. Such conversion failures always throw exceptions for non-partition data 
> columns; partition columns should have the same behavior.






[jira] [Resolved] (SPARK-26263) Throw exception when Partition column value can't be converted to user specified type

2018-12-06 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-26263.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23215
[https://github.com/apache/spark/pull/23215]

> Throw exception when Partition column value can't be converted to user 
> specified type
> -
>
> Key: SPARK-26263
> URL: https://issues.apache.org/jira/browse/SPARK-26263
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, if the user provides a data schema, partition column values are 
> converted according to it. But if the conversion fails, e.g. converting a 
> string to an int, the column value becomes null.
> For the following directory
> /tmp/testDir
> ├── p=bar
> └── p=foo
> If we run:
> ```
> val schema = StructType(Seq(StructField("p", IntegerType, false)))
> spark.read.schema(schema).csv("/tmp/testDir/").show()
> ```
> We will get:
> ++
> |   p|
> ++
> |null|
> |null|
> ++
> This PR proposes to throw an exception in such cases, instead of silently 
> converting to null:
> 1. These null partition column values don't make sense to users in most 
> cases. It is better to know about the conversion failure and then adjust the 
> schema or ETL jobs, etc., to fix it.
> 2. Such conversion failures always throw exceptions for non-partition data 
> columns; partition columns should have the same behavior.






[jira] [Commented] (SPARK-26051) Can't create table with column name '22222d'

2018-12-06 Thread Xie Juntao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712279#comment-16712279
 ] 

Xie Juntao commented on SPARK-26051:


Hi [~dkbiswal], I tested in MySQL; it's OK to create a table with column 
name "2d":

mysql> create table t1(2d int);
Query OK, 0 rows affected (0.02 sec)

> Can't create table with column name '2d'
> 
>
> Key: SPARK-26051
> URL: https://issues.apache.org/jira/browse/SPARK-26051
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Xie Juntao
>Priority: Minor
>
> I can't create a table in which the column name is '2d' when I use 
> spark-sql. It seems to be a SQL parser bug because it's OK to create a table 
> with the column name "2m".
> {code:java}
> spark-sql> create table t1(2d int);
> Error in query:
> no viable alternative at input 'create table t1(2d'(line 1, pos 16)
> == SQL ==
> create table t1(2d int)
> ^^^
> spark-sql> create table t1(2m int);
> 18/11/14 09:13:53 INFO HiveMetaStore: 0: get_database: global_temp
> 18/11/14 09:13:53 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: 
> global_temp
> 18/11/14 09:13:53 WARN ObjectStore: Failed to get database global_temp, 
> returning NoSuchObjectException
> 18/11/14 09:13:55 INFO HiveMetaStore: 0: get_database: default
> 18/11/14 09:13:55 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: 
> default
> 18/11/14 09:13:55 INFO HiveMetaStore: 0: get_database: default
> 18/11/14 09:13:55 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: 
> default
> 18/11/14 09:13:55 INFO HiveMetaStore: 0: get_table : db=default tbl=t1
> 18/11/14 09:13:55 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : 
> db=default tbl=t1
> 18/11/14 09:13:55 INFO HiveMetaStore: 0: get_database: default
> 18/11/14 09:13:55 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: 
> default
> 18/11/14 09:13:55 INFO HiveMetaStore: 0: create_table: Table(tableName:t1, 
> dbName:default, owner:root, createTime:1542158033, lastAccessTime:0, 
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:2m, type:int, 
> comment:null)], 
> location:file:/opt/UQuery/spark_/spark-2.3.1-bin-hadoop2.7/spark-warehouse/t1,
>  inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=1}), bucketCols:[], sortCols:[], 
> parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
> skewedColValueLocationMaps:{})), partitionKeys:[], 
> parameters:{spark.sql.sources.schema.part.0={"type":"struct","fields":[{"name":"2m","type":"integer","nullable":true,"metadata":{}}]},
>  spark.sql.sources.schema.numParts=1, spark.sql.create.version=2.3.1}, 
> viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, 
> privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, 
> rolePrivileges:null))
> 18/11/14 09:13:55 INFO audit: ugi=root ip=unknown-ip-addr cmd=create_table: 
> Table(tableName:t1, dbName:default, owner:root, createTime:1542158033, 
> lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:2m, type:int, comment:null)], 
> location:file:/opt/UQuery/spark_/spark-2.3.1-bin-hadoop2.7/spark-warehouse/t1,
>  inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=1}), bucketCols:[], sortCols:[], 
> parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
> skewedColValueLocationMaps:{})), partitionKeys:[], 
> parameters:{spark.sql.sources.schema.part.0={"type":"struct","fields":[{"name":"2m","type":"integer","nullable":true,"metadata":{}}]},
>  spark.sql.sources.schema.numParts=1, spark.sql.create.version=2.3.1}, 
> viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, 
> privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, 
> rolePrivileges:null))
> 18/11/14 09:13:55 WARN HiveMetaStore: Location: 
> file:/opt/UQuery/spark_/spark-2.3.1-bin-hadoop2.7/spark-warehouse/t1 
> specified for non-external table:t1
> 18/11/14 09:13:55 INFO FileUtils: Creating directory if it doesn't exist: 
> file:/opt/UQuery/spark_/spark-2.3.1-bin-hadoop2.7/spark-warehouse/t1
> Time taken: 2.15 seconds
> 18/11/14 09:13:56 INFO SparkSQLCLIDriver: Time taken: 2.15 seconds{code}




[jira] [Comment Edited] (SPARK-26051) Can't create table with column name '22222d'

2018-12-06 Thread Xie Juntao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712279#comment-16712279
 ] 

Xie Juntao edited comment on SPARK-26051 at 12/7/18 2:53 AM:
-

Hi [~dkbiswal], I tested in MySQL; it's OK to create a table with column 
name "2d":
{quote}mysql> create table t1(2d int);
 Query OK, 0 rows affected (0.02 sec)
{quote}


was (Author: xiejuntao1...@163.com):
[~dkbiswal] hi, I tested in mysql. It's ok for creating a table with column 
name "2d":

mysql> create table t1(2d int);
Query OK, 0 rows affected (0.02 sec)

> Can't create table with column name '2d'
> 
>
> Key: SPARK-26051
> URL: https://issues.apache.org/jira/browse/SPARK-26051
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Xie Juntao
>Priority: Minor
>
> I can't create a table in which the column name is '2d' when I use 
> spark-sql. It seems to be a SQL parser bug because it's OK to create a table 
> with the column name "2m".
> {code:java}
> spark-sql> create table t1(2d int);
> Error in query:
> no viable alternative at input 'create table t1(2d'(line 1, pos 16)
> == SQL ==
> create table t1(2d int)
> ^^^
> spark-sql> create table t1(2m int);
> 18/11/14 09:13:53 INFO HiveMetaStore: 0: get_database: global_temp
> 18/11/14 09:13:53 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: 
> global_temp
> 18/11/14 09:13:53 WARN ObjectStore: Failed to get database global_temp, 
> returning NoSuchObjectException
> 18/11/14 09:13:55 INFO HiveMetaStore: 0: get_database: default
> 18/11/14 09:13:55 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: 
> default
> 18/11/14 09:13:55 INFO HiveMetaStore: 0: get_database: default
> 18/11/14 09:13:55 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: 
> default
> 18/11/14 09:13:55 INFO HiveMetaStore: 0: get_table : db=default tbl=t1
> 18/11/14 09:13:55 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : 
> db=default tbl=t1
> 18/11/14 09:13:55 INFO HiveMetaStore: 0: get_database: default
> 18/11/14 09:13:55 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: 
> default
> 18/11/14 09:13:55 INFO HiveMetaStore: 0: create_table: Table(tableName:t1, 
> dbName:default, owner:root, createTime:1542158033, lastAccessTime:0, 
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:2m, type:int, 
> comment:null)], 
> location:file:/opt/UQuery/spark_/spark-2.3.1-bin-hadoop2.7/spark-warehouse/t1,
>  inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=1}), bucketCols:[], sortCols:[], 
> parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
> skewedColValueLocationMaps:{})), partitionKeys:[], 
> parameters:{spark.sql.sources.schema.part.0={"type":"struct","fields":[{"name":"2m","type":"integer","nullable":true,"metadata":{}}]},
>  spark.sql.sources.schema.numParts=1, spark.sql.create.version=2.3.1}, 
> viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, 
> privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, 
> rolePrivileges:null))
> 18/11/14 09:13:55 INFO audit: ugi=root ip=unknown-ip-addr cmd=create_table: 
> Table(tableName:t1, dbName:default, owner:root, createTime:1542158033, 
> lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:2m, type:int, comment:null)], 
> location:file:/opt/UQuery/spark_/spark-2.3.1-bin-hadoop2.7/spark-warehouse/t1,
>  inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=1}), bucketCols:[], sortCols:[], 
> parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
> skewedColValueLocationMaps:{})), partitionKeys:[], 
> parameters:{spark.sql.sources.schema.part.0={"type":"struct","fields":[{"name":"2m","type":"integer","nullable":true,"metadata":{}}]},
>  spark.sql.sources.schema.numParts=1, spark.sql.create.version=2.3.1}, 
> viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, 
> privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, 
> rolePrivileges:null))
> 18/11/14 09:13:55 WARN HiveMetaStore: Location: 
> file:/opt/UQuery/spark_/spark-2.3.1-bin-hadoop2.7/spark-warehouse/t1 
> specified for non-external table:t1
> 18/11/14 09:13:55 INFO FileUtils: 

[jira] [Updated] (SPARK-26299) [MAVEN] Shaded test jar in spark-streaming cause deploy test jar twice

2018-12-06 Thread Liu, Linhong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu, Linhong updated SPARK-26299:
-
Description: 
In the Spark root pom, there is a test-jar execution for all sub-projects including 
spark-streaming. This attaches the artifact: 
org.apache.spark:spark-streaming_2.11:*{color:#ff}test-jar{color}*:tests:2.3.2

Also, in the streaming pom, there is a shade-test configuration; it attaches the 
artifact: 
org.apache.spark:spark-streaming_2.11:{color:#ff}*jar*{color}:tests:2.3.2

But both artifacts actually point to the same file: 
spark-streaming_2.11-2.3.2-tests.jar

So, when deploying Spark to Nexus using mvn deploy, Maven uploads the test jar 
twice since it belongs to 2 artifacts. The deploy then fails because Nexus does 
not allow overwriting an existing file for a non-SNAPSHOT release.

What's more, after checking spark-streaming, the shaded test jar is exactly the 
same as the original test jar. It seems we can just delete the related shade config.
{code:xml}
<build>
  <outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
  <testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <configuration>
        <shadeTestJar>true</shadeTestJar>
      </configuration>
    </plugin>
  </plugins>
</build>
{code}
 

  was:
In spark root pom, there is a test-jar execution for all sub-projects including 
spark-streaming. This will attach an artifact: 
org.apache.spark:spark-streaming_2.11:*{color:#ff}test-jar{color}*:tests:2.3.2

Also, in streaming pom, there is a shade test configuration, it will attach an 
artifact: 
org.apache.spark:spark-streaming_2.11:{color:#ff}*jar*{color}:tests:2.3.2

But two artifacts actually point to the same file: 
spark-streaming_2.11-2.3.2-tests.jar

So, when deploy spark to nexus using mvn deploy, maven will upload the test.jar 
twice since it belongs to 2 artifacts. Then deploy fails due to nexus don't 
allow overrides existing file for non-SNAPSHOT release.

What's more, after checking the spark-streaming, shaded test jar is exactly 
same to original test jar. It seems we can just delete the related shade config.

```


 target/scala-${scala.binary.version}/classes
 
target/scala-${scala.binary.version}/test-classes
 
 
 org.apache.maven.plugins
 maven-shade-plugin
 
 true
 
 
 
 

```


> [MAVEN] Shaded test jar in spark-streaming cause deploy test jar twice
> --
>
> Key: SPARK-26299
> URL: https://issues.apache.org/jira/browse/SPARK-26299
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Liu, Linhong
>Priority: Major
>
> In the Spark root pom, there is a test-jar execution for all sub-projects 
> including spark-streaming. This attaches the artifact: 
> org.apache.spark:spark-streaming_2.11:*{color:#ff}test-jar{color}*:tests:2.3.2
> Also, in the streaming pom, there is a shade-test configuration; it attaches 
> the artifact: 
> org.apache.spark:spark-streaming_2.11:{color:#ff}*jar*{color}:tests:2.3.2
> But both artifacts actually point to the same file: 
> spark-streaming_2.11-2.3.2-tests.jar
> So, when deploying Spark to Nexus using mvn deploy, Maven uploads the test 
> jar twice since it belongs to 2 artifacts. The deploy then fails because 
> Nexus does not allow overwriting an existing file for a non-SNAPSHOT release.
> What's more, after checking spark-streaming, the shaded test jar is exactly 
> the same as the original test jar. It seems we can just delete the related 
> shade config.
> {code:xml}
> <build>
>   <outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
>   <testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
>   <plugins>
>     <plugin>
>       <groupId>org.apache.maven.plugins</groupId>
>       <artifactId>maven-shade-plugin</artifactId>
>       <configuration>
>         <shadeTestJar>true</shadeTestJar>
>       </configuration>
>     </plugin>
>   </plugins>
> </build>
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26299) [MAVEN] Shaded test jar in spark-streaming cause deploy test jar twice

2018-12-06 Thread Liu, Linhong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu, Linhong updated SPARK-26299:
-
Description: 
In spark root pom, there is a test-jar execution for all sub-projects including 
spark-streaming. This will attach an artifact: 
org.apache.spark:spark-streaming_2.11:*{color:#ff}test-jar{color}*:tests:2.3.2

Also, in streaming pom, there is a shade test configuration, it will attach an 
artifact: 
org.apache.spark:spark-streaming_2.11:{color:#ff}*jar*{color}:tests:2.3.2

But two artifacts actually point to the same file: 
spark-streaming_2.11-2.3.2-tests.jar

So, when deploy spark to nexus using mvn deploy, maven will upload the test.jar 
twice since it belongs to 2 artifacts. Then deploy fails due to nexus don't 
allow overrides existing file for non-SNAPSHOT release.

What's more, after checking the spark-streaming, shaded test jar is exactly 
same to original test jar. It seems we can just delete the related shade config.

```


 target/scala-${scala.binary.version}/classes
 
target/scala-${scala.binary.version}/test-classes
 
 
 org.apache.maven.plugins
 maven-shade-plugin
 
 true
 
 
 
 

```

  was:
In spark root pom, there is a test-jar execution for all sub-projects including 
spark-streaming. This will attach an artifact: 
org.apache.spark:spark-streaming_2.11:*{color:#FF}test-jar{color}*:tests:2.3.2

Also, in streaming pom, there is a shade test configuration, it will attach an 
artifact: 
org.apache.spark:spark-streaming_2.11:{color:#FF}*jar*{color}:tests:2.3.2

But two artifacts actually point to the same file: 
spark-streaming_2.11-2.3.2-tests.jar

So, when deploy spark to nexus using mvn deploy, 


> [MAVEN] Shaded test jar in spark-streaming cause deploy test jar twice
> --
>
> Key: SPARK-26299
> URL: https://issues.apache.org/jira/browse/SPARK-26299
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Liu, Linhong
>Priority: Major
>
> In spark root pom, there is a test-jar execution for all sub-projects 
> including spark-streaming. This will attach an artifact: 
> org.apache.spark:spark-streaming_2.11:*{color:#ff}test-jar{color}*:tests:2.3.2
> Also, in streaming pom, there is a shade test configuration, it will attach 
> an artifact: 
> org.apache.spark:spark-streaming_2.11:{color:#ff}*jar*{color}:tests:2.3.2
> But two artifacts actually point to the same file: 
> spark-streaming_2.11-2.3.2-tests.jar
> So, when deploy spark to nexus using mvn deploy, maven will upload the 
> test.jar twice since it belongs to 2 artifacts. Then deploy fails due to 
> nexus don't allow overrides existing file for non-SNAPSHOT release.
> What's more, after checking the spark-streaming, shaded test jar is exactly 
> same to original test jar. It seems we can just delete the related shade 
> config.
> ```
> 
>  
> target/scala-${scala.binary.version}/classes
>  
> target/scala-${scala.binary.version}/test-classes
>  
>  
>  org.apache.maven.plugins
>  maven-shade-plugin
>  
>  true
>  
>  
>  
>  
> ```



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26299) [MAVEN] Shaded test jar in spark-streaming cause deploy test jar twice

2018-12-06 Thread Liu, Linhong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu, Linhong updated SPARK-26299:
-
Affects Version/s: 2.3.2

> [MAVEN] Shaded test jar in spark-streaming cause deploy test jar twice
> --
>
> Key: SPARK-26299
> URL: https://issues.apache.org/jira/browse/SPARK-26299
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Liu, Linhong
>Priority: Major
>
> In spark root pom, there is a test-jar execution for all sub-projects 
> including spark-streaming. This will attach an artifact: 
> org.apache.spark:spark-streaming_2.11:*{color:#FF}test-jar{color}*:tests:2.3.2
> Also, in streaming pom, there is a shade test configuration, it will attach 
> an artifact: 
> org.apache.spark:spark-streaming_2.11:{color:#FF}*jar*{color}:tests:2.3.2
> But two artifacts actually point to the same file: 
> spark-streaming_2.11-2.3.2-tests.jar
> So, when deploy spark to nexus using mvn deploy, 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26299) [MAVEN] Shaded test jar in spark-streaming cause deploy test jar twice

2018-12-06 Thread Liu, Linhong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu, Linhong updated SPARK-26299:
-
Description: 
In spark root pom, there is a test-jar execution for all sub-projects including 
spark-streaming. This will attach an artifact: 
org.apache.spark:spark-streaming_2.11:*{color:#FF}test-jar{color}*:tests:2.3.2

Also, in streaming pom, there is a shade test configuration, it will attach an 
artifact: 
org.apache.spark:spark-streaming_2.11:{color:#FF}*jar*{color}:tests:2.3.2

But two artifacts actually point to the same file: 
spark-streaming_2.11-2.3.2-tests.jar

So, when deploy spark to nexus using mvn deploy, 

> [MAVEN] Shaded test jar in spark-streaming cause deploy test jar twice
> --
>
> Key: SPARK-26299
> URL: https://issues.apache.org/jira/browse/SPARK-26299
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Liu, Linhong
>Priority: Major
>
> In spark root pom, there is a test-jar execution for all sub-projects 
> including spark-streaming. This will attach an artifact: 
> org.apache.spark:spark-streaming_2.11:*{color:#FF}test-jar{color}*:tests:2.3.2
> Also, in streaming pom, there is a shade test configuration, it will attach 
> an artifact: 
> org.apache.spark:spark-streaming_2.11:{color:#FF}*jar*{color}:tests:2.3.2
> But two artifacts actually point to the same file: 
> spark-streaming_2.11-2.3.2-tests.jar
> So, when deploy spark to nexus using mvn deploy, 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26299) [MAVEN] Shaded test jar in spark-streaming cause deploy test jar twice

2018-12-06 Thread Liu, Linhong (JIRA)
Liu, Linhong created SPARK-26299:


 Summary: [MAVEN] Shaded test jar in spark-streaming cause deploy 
test jar twice
 Key: SPARK-26299
 URL: https://issues.apache.org/jira/browse/SPARK-26299
 Project: Spark
  Issue Type: Bug
  Components: Deploy
Affects Versions: 2.4.0
Reporter: Liu, Linhong






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26289) cleanup enablePerfMetrics parameter from BytesToBytesMap

2018-12-06 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-26289.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23244
[https://github.com/apache/spark/pull/23244]

> cleanup enablePerfMetrics parameter from BytesToBytesMap
> 
>
> Key: SPARK-26289
> URL: https://issues.apache.org/jira/browse/SPARK-26289
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.5.0
>Reporter: caoxuewen
>Assignee: caoxuewen
>Priority: Major
> Fix For: 3.0.0
>
>
> enablePerfMetrics was originally added to BytesToBytesMap to control 
> getNumHashCollisions, getTimeSpentResizingNs and getAverageProbesPerLookup.
> However, as Spark has evolved, this parameter is now only used for 
> getAverageProbesPerLookup and is always set to true when BytesToBytesMap is 
> used. It is also error-prone to have getAverageProbesPerLookup check whether 
> the flag is enabled and throw an IllegalStateException. So this PR removes 
> the enablePerfMetrics parameter from BytesToBytesMap. Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26289) cleanup enablePerfMetrics parameter from BytesToBytesMap

2018-12-06 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-26289:
---

Assignee: caoxuewen

> cleanup enablePerfMetrics parameter from BytesToBytesMap
> 
>
> Key: SPARK-26289
> URL: https://issues.apache.org/jira/browse/SPARK-26289
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.5.0
>Reporter: caoxuewen
>Assignee: caoxuewen
>Priority: Major
> Fix For: 3.0.0
>
>
> enablePerfMetrics was originally added to BytesToBytesMap to control 
> getNumHashCollisions, getTimeSpentResizingNs and getAverageProbesPerLookup.
> However, as Spark has evolved, this parameter is now only used for 
> getAverageProbesPerLookup and is always set to true when BytesToBytesMap is 
> used. It is also error-prone to have getAverageProbesPerLookup check whether 
> the flag is enabled and throw an IllegalStateException. So this PR removes 
> the enablePerfMetrics parameter from BytesToBytesMap. Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26295) [K8S] serviceAccountName is not set in client mode

2018-12-06 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712192#comment-16712192
 ] 

Marcelo Vanzin commented on SPARK-26295:


Can't SPARK-25887 be used for this?

> [K8S] serviceAccountName is not set in client mode
> --
>
> Key: SPARK-26295
> URL: https://issues.apache.org/jira/browse/SPARK-26295
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Adrian Tanase
>Priority: Major
>
> When deploying Spark apps in client mode (in my case from inside the driver 
> pod), one can't specify the service account in accordance with the docs 
> ([https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac]).
> The property {{spark.kubernetes.authenticate.driver.serviceAccountName}} is 
> most likely applied in cluster mode only, which would be consistent with 
> spark.kubernetes.authenticate.driver being the cluster-mode prefix.
> We should either inject the service account specified by this property into 
> the client-mode pods, or introduce an equivalent config: 
> spark.kubernetes.authenticate.serviceAccountName
>  This is the exception:
> {noformat}
> Message: Forbidden!Configured service account doesn't have access. Service 
> account may have been revoked. pods "..." is forbidden: User 
> "system:serviceaccount:mynamespace:default" cannot get pods in the namespace 
> "mynamespace"{noformat}
> The expectation was to see the user `mynamespace:spark` based on my submit 
> command.
> My current workaround is to create a clusterrolebinding with edit rights for 
> the mynamespace:default account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25987) StackOverflowError when executing many operations on a table with many columns

2018-12-06 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712116#comment-16712116
 ] 

Dongjoon Hyun commented on SPARK-25987:
---

It's not directly related to this, but it seems that we had better upgrade 
Janino, too. I filed SPARK-26298.

> StackOverflowError when executing many operations on a table with many columns
> --
>
> Key: SPARK-25987
> URL: https://issues.apache.org/jira/browse/SPARK-25987
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1, 2.2.2, 2.3.0, 2.3.2, 2.4.0, 3.0.0
> Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"
>Reporter: Ivan Tsukanov
>Priority: Major
>
> When I execute
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val columnsCount = 100
> val columns = (1 to columnsCount).map(i => s"col$i")
> val initialData = (1 to columnsCount).map(i => s"val$i")
> val df = spark.createDataFrame(
>   rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
>   schema = StructType(columns.map(StructField(_, StringType, true)))
> )
> val addSuffixUDF = udf(
>   (str: String) => str + "_added"
> )
> implicit class DFOps(df: DataFrame) {
>   def addSuffix() = {
> df.select(columns.map(col =>
>   addSuffixUDF(df(col)).as(col)
> ): _*)
>   }
> }
> df.addSuffix().addSuffix().addSuffix().show()
> {code}
> I get
> {code:java}
> An exception or error caused a run to abort.
> java.lang.StackOverflowError
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
> ...
> {code}
> If I reduce columns number (to 10 for example) or do `addSuffix` only once - 
> it works fine.
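
Not part of the original report, but a possible mitigation sketch (assuming the 
overflow comes from Janino compiling one deeply nested plan over the chained 
projections, and reusing the df/addSuffix definitions from the snippet above): 
cut the lineage between the repeated projections so each one is compiled 
separately.
{code:java}
// Materialize intermediate results so whole-stage codegen compiles each
// projection on its own instead of one deeply nested plan.
val step1 = df.addSuffix().localCheckpoint()
val step2 = step1.addSuffix().localCheckpoint()
step2.addSuffix().show()

// Alternatively (also an assumption, not verified here), a larger JVM thread
// stack can be configured, e.g.:
//   --conf "spark.driver.extraJavaOptions=-Xss16m"
//   --conf "spark.executor.extraJavaOptions=-Xss16m"
{code}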



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26298) Upgrade Janino version to 3.0.11

2018-12-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26298:


Assignee: Apache Spark

> Upgrade Janino version to 3.0.11
> 
>
> Key: SPARK-26298
> URL: https://issues.apache.org/jira/browse/SPARK-26298
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>
> This issue aims to upgrade Janino compiler to [version 
> 3.0.11|http://janino-compiler.github.io/janino/changelog.html].
> - Fixed issue #63: Script with many "helper" variables.
> - At last, the "jdk-9" implementation works.
> - Made the code that is required for JDK 9+ (ModuleFinder and consorts) 
> Java-6-compilable through reflection, so that commons-compiler-jdk can still 
> be compiled with Java 6, but functions at runtime also with Java 9+.
> - Fixed Java 9+ compatibility (JRE module system).
> - Fixed issue #65: Compilation Error Messages Generated by JDK.
> - Added experimental support for the "StackMapFrame" attribute; not active 
> yet.
> - Fixed issue #61: Make Unparser more flexible.
> - Fixed NPEs in various "toString()" methods.
> - Fixed issue #62: Optimize static method invocation with rvalue target 
> expression.
> - Merged pull request #60: Added all missing "ClassFile.getConstant*Info()" 
> methods, removing the necessity for many type casts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26298) Upgrade Janino version to 3.0.11

2018-12-06 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712089#comment-16712089
 ] 

Apache Spark commented on SPARK-26298:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/23250

> Upgrade Janino version to 3.0.11
> 
>
> Key: SPARK-26298
> URL: https://issues.apache.org/jira/browse/SPARK-26298
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> This issue aims to upgrade Janino compiler to [version 
> 3.0.11|http://janino-compiler.github.io/janino/changelog.html].
> - Fixed issue #63: Script with many "helper" variables.
> - At last, the "jdk-9" implementation works.
> - Made the code that is required for JDK 9+ (ModuleFinder and consorts) 
> Java-6-compilable through reflection, so that commons-compiler-jdk can still 
> be compiled with Java 6, but functions at runtime also with Java 9+.
> - Fixed Java 9+ compatibility (JRE module system).
> - Fixed issue #65: Compilation Error Messages Generated by JDK.
> - Added experimental support for the "StackMapFrame" attribute; not active 
> yet.
> - Fixed issue #61: Make Unparser more flexible.
> - Fixed NPEs in various "toString()" methods.
> - Fixed issue #62: Optimize static method invocation with rvalue target 
> expression.
> - Merged pull request #60: Added all missing "ClassFile.getConstant*Info()" 
> methods, removing the necessity for many type casts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26298) Upgrade Janino version to 3.0.11

2018-12-06 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712090#comment-16712090
 ] 

Apache Spark commented on SPARK-26298:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/23250

> Upgrade Janino version to 3.0.11
> 
>
> Key: SPARK-26298
> URL: https://issues.apache.org/jira/browse/SPARK-26298
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> This issue aims to upgrade Janino compiler to [version 
> 3.0.11|http://janino-compiler.github.io/janino/changelog.html].
> - Fixed issue #63: Script with many "helper" variables.
> - At last, the "jdk-9" implementation works.
> - Made the code that is required for JDK 9+ (ModuleFinder and consorts) 
> Java-6-compilable through reflection, so that commons-compiler-jdk can still 
> be compiled with Java 6, but functions at runtime also with Java 9+.
> - Fixed Java 9+ compatibility (JRE module system).
> - Fixed issue #65: Compilation Error Messages Generated by JDK.
> - Added experimental support for the "StackMapFrame" attribute; not active 
> yet.
> - Fixed issue #61: Make Unparser more flexible.
> - Fixed NPEs in various "toString()" methods.
> - Fixed issue #62: Optimize static method invocation with rvalue target 
> expression.
> - Merged pull request #60: Added all missing "ClassFile.getConstant*Info()" 
> methods, removing the necessity for many type casts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26298) Upgrade Janino version to 3.0.11

2018-12-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26298:


Assignee: (was: Apache Spark)

> Upgrade Janino version to 3.0.11
> 
>
> Key: SPARK-26298
> URL: https://issues.apache.org/jira/browse/SPARK-26298
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> This issue aims to upgrade Janino compiler to [version 
> 3.0.11|http://janino-compiler.github.io/janino/changelog.html].
> - Fixed issue #63: Script with many "helper" variables.
> - At last, the "jdk-9" implementation works.
> - Made the code that is required for JDK 9+ (ModuleFinder and consorts) 
> Java-6-compilable through reflection, so that commons-compiler-jdk can still 
> be compiled with Java 6, but functions at runtime also with Java 9+.
> - Fixed Java 9+ compatibility (JRE module system).
> - Fixed issue #65: Compilation Error Messages Generated by JDK.
> - Added experimental support for the "StackMapFrame" attribute; not active 
> yet.
> - Fixed issue #61: Make Unparser more flexible.
> - Fixed NPEs in various "toString()" methods.
> - Fixed issue #62: Optimize static method invocation with rvalue target 
> expression.
> - Merged pull request #60: Added all missing "ClassFile.getConstant*Info()" 
> methods, removing the necessity for many type casts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26194) Support automatic spark.authenticate secret in Kubernetes backend

2018-12-06 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-26194.

   Resolution: Fixed
Fix Version/s: 3.0.0

> Support automatic spark.authenticate secret in Kubernetes backend
> -
>
> Key: SPARK-26194
> URL: https://issues.apache.org/jira/browse/SPARK-26194
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently k8s inherits the default behavior for {{spark.authenticate}}, which 
> is that the user must provide an auth secret.
> k8s doesn't have that requirement and could instead generate its own unique 
> per-app secret, and propagate it to executors.
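
For illustration, a minimal sketch (an assumption about the general approach, 
not the actual patch) of generating a unique per-app secret and exposing it 
through the standard spark.authenticate settings:
{code:java}
import java.security.SecureRandom
import java.util.Base64
import org.apache.spark.SparkConf

// Generate a random 256-bit secret; how it would be propagated to executor
// pods (e.g. through a Kubernetes Secret) is outside this sketch.
val bytes = new Array[Byte](32)
new SecureRandom().nextBytes(bytes)
val appSecret = Base64.getEncoder.encodeToString(bytes)

val conf = new SparkConf()
  .set("spark.authenticate", "true")
  .set("spark.authenticate.secret", appSecret)
{code}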



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26298) Upgrade Janino version to 3.0.11

2018-12-06 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-26298:
-

 Summary: Upgrade Janino version to 3.0.11
 Key: SPARK-26298
 URL: https://issues.apache.org/jira/browse/SPARK-26298
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun


This issue aims to upgrade Janino compiler to [version 
3.0.11|http://janino-compiler.github.io/janino/changelog.html].

- Fixed issue #63: Script with many "helper" variables.
- At last, the "jdk-9" implementation works.
- Made the code that is required for JDK 9+ (ModuleFinder and consorts) 
Java-6-compilable through reflection, so that commons-compiler-jdk can still be 
compiled with Java 6, but functions at runtime also with Java 9+.
- Fixed Java 9+ compatibility (JRE module system).
- Fixed issue #65: Compilation Error Messages Generated by JDK.
- Added experimental support for the "StackMapFrame" attribute; not active yet.
- Fixed issue #61: Make Unparser more flexible.
- Fixed NPEs in various "toString()" methods.
- Fixed issue #62: Optimize static method invocation with rvalue target 
expression.
- Merged pull request #60: Added all missing "ClassFile.getConstant*Info()" 
methods, removing the necessity for many type casts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-25987) StackOverflowError when executing many operations on a table with many columns

2018-12-06 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712067#comment-16712067
 ] 

Dongjoon Hyun edited comment on SPARK-25987 at 12/6/18 10:06 PM:
-

Hi, [~kiszk]. The situation is very similar to SPARK-22523, and this still 
happens in Spark 2.4.0 and on the master branch. Do we have an existing issue to 
track this?


was (Author: dongjoon):
Hi, [~kiszk]. The situation is very similar with SPARK-22523 and this still 
happens in Spark 2.4.0. Do we have an existing issue to track this?

> StackOverflowError when executing many operations on a table with many columns
> --
>
> Key: SPARK-25987
> URL: https://issues.apache.org/jira/browse/SPARK-25987
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1, 2.2.2, 2.3.0, 2.3.2, 2.4.0, 3.0.0
> Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"
>Reporter: Ivan Tsukanov
>Priority: Major
>
> When I execute
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val columnsCount = 100
> val columns = (1 to columnsCount).map(i => s"col$i")
> val initialData = (1 to columnsCount).map(i => s"val$i")
> val df = spark.createDataFrame(
>   rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
>   schema = StructType(columns.map(StructField(_, StringType, true)))
> )
> val addSuffixUDF = udf(
>   (str: String) => str + "_added"
> )
> implicit class DFOps(df: DataFrame) {
>   def addSuffix() = {
> df.select(columns.map(col =>
>   addSuffixUDF(df(col)).as(col)
> ): _*)
>   }
> }
> df.addSuffix().addSuffix().addSuffix().show()
> {code}
> I get
> {code:java}
> An exception or error caused a run to abort.
> java.lang.StackOverflowError
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
> ...
> {code}
> If I reduce columns number (to 10 for example) or do `addSuffix` only once - 
> it works fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25987) StackOverflowError when executing many operations on a table with many columns

2018-12-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25987:
--
Affects Version/s: 3.0.0

> StackOverflowError when executing many operations on a table with many columns
> --
>
> Key: SPARK-25987
> URL: https://issues.apache.org/jira/browse/SPARK-25987
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1, 2.2.2, 2.3.0, 2.3.2, 2.4.0, 3.0.0
> Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"
>Reporter: Ivan Tsukanov
>Priority: Major
>
> When I execute
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val columnsCount = 100
> val columns = (1 to columnsCount).map(i => s"col$i")
> val initialData = (1 to columnsCount).map(i => s"val$i")
> val df = spark.createDataFrame(
>   rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
>   schema = StructType(columns.map(StructField(_, StringType, true)))
> )
> val addSuffixUDF = udf(
>   (str: String) => str + "_added"
> )
> implicit class DFOps(df: DataFrame) {
>   def addSuffix() = {
> df.select(columns.map(col =>
>   addSuffixUDF(df(col)).as(col)
> ): _*)
>   }
> }
> df.addSuffix().addSuffix().addSuffix().show()
> {code}
> I get
> {code:java}
> An exception or error caused a run to abort.
> java.lang.StackOverflowError
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
> ...
> {code}
> If I reduce columns number (to 10 for example) or do `addSuffix` only once - 
> it works fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25987) StackOverflowError when executing many operations on a table with many columns

2018-12-06 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712067#comment-16712067
 ] 

Dongjoon Hyun commented on SPARK-25987:
---

Hi, [~kiszk]. The situation is very similar to SPARK-22523, and this still 
happens in Spark 2.4.0. Do we have an existing issue to track this?

> StackOverflowError when executing many operations on a table with many columns
> --
>
> Key: SPARK-25987
> URL: https://issues.apache.org/jira/browse/SPARK-25987
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1, 2.2.2, 2.3.0, 2.3.2, 2.4.0
> Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"
>Reporter: Ivan Tsukanov
>Priority: Major
>
> When I execute
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val columnsCount = 100
> val columns = (1 to columnsCount).map(i => s"col$i")
> val initialData = (1 to columnsCount).map(i => s"val$i")
> val df = spark.createDataFrame(
>   rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
>   schema = StructType(columns.map(StructField(_, StringType, true)))
> )
> val addSuffixUDF = udf(
>   (str: String) => str + "_added"
> )
> implicit class DFOps(df: DataFrame) {
>   def addSuffix() = {
> df.select(columns.map(col =>
>   addSuffixUDF(df(col)).as(col)
> ): _*)
>   }
> }
> df.addSuffix().addSuffix().addSuffix().show()
> {code}
> I get
> {code:java}
> An exception or error caused a run to abort.
> java.lang.StackOverflowError
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
> ...
> {code}
> If I reduce columns number (to 10 for example) or do `addSuffix` only once - 
> it works fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25987) StackOverflowError when executing many operations on a table with many columns

2018-12-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25987:
--
Description: 
When I execute
{code:java}
import org.apache.spark.sql._
import org.apache.spark.sql.types._

val columnsCount = 100
val columns = (1 to columnsCount).map(i => s"col$i")
val initialData = (1 to columnsCount).map(i => s"val$i")

val df = spark.createDataFrame(
  rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
  schema = StructType(columns.map(StructField(_, StringType, true)))
)

val addSuffixUDF = udf(
  (str: String) => str + "_added"
)

implicit class DFOps(df: DataFrame) {
  def addSuffix() = {
df.select(columns.map(col =>
  addSuffixUDF(df(col)).as(col)
): _*)
  }
}

df
  .addSuffix()
  .addSuffix()
  .addSuffix()
  .show()
{code}
I get
{code:java}
An exception or error caused a run to abort.
java.lang.StackOverflowError
 at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
 at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
...
{code}
If I reduce columns number (to 10 for example) or do `addSuffix` only once - it 
works fine.

  was:
When I execute
{code:java}
val columnsCount = 100
val columns = (1 to columnsCount).map(i => s"col$i")
val initialData = (1 to columnsCount).map(i => s"val$i")

val df = sparkSession.createDataFrame(
  rowRDD = sparkSession.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
  schema = StructType(columns.map(StructField(_, StringType, true)))
)

val addSuffixUDF = udf(
  (str: String) => str + "_added"
)

implicit class DFOps(df: DataFrame) {
  def addSuffix() = {
df.select(columns.map(col =>
  addSuffixUDF(df(col)).as(col)
): _*)
  }
}

df
  .addSuffix()
  .addSuffix()
  .addSuffix()
  .show()
{code}
I get
{code:java}
An exception or error caused a run to abort.
java.lang.StackOverflowError
 at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
 at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
...
{code}
If I reduce columns number (to 10 for example) or do `addSuffix` only once - it 
works fine.


> StackOverflowError when executing many operations on a table with many columns
> --
>
> Key: SPARK-25987
> URL: https://issues.apache.org/jira/browse/SPARK-25987
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1, 2.2.2, 2.3.0, 2.3.2
> Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"
>Reporter: Ivan Tsukanov
>Priority: Major
>
> When I execute
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val columnsCount = 100
> val columns = (1 to columnsCount).map(i => s"col$i")
> val initialData = (1 to columnsCount).map(i => s"val$i")
> val df = spark.createDataFrame(
>   rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
>   schema = StructType(columns.map(StructField(_, StringType, true)))
> )
> val addSuffixUDF = udf(
>   (str: String) => str + "_added"
> )
> implicit class DFOps(df: DataFrame) {
>   def addSuffix() = {
> df.select(columns.map(col =>
>   addSuffixUDF(df(col)).as(col)
> ): _*)
>   }
> }
> df
>   .addSuffix()
>   .addSuffix()
>   .addSuffix()
>   .show()
> {code}
> I get
> {code:java}
> An exception or error caused a run to abort.
> java.lang.StackOverflowError
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
> ...
> {code}
> If I reduce columns number (to 10 for example) or do `addSuffix` only once - 
> it works fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25987) StackOverflowError when executing many operations on a table with many columns

2018-12-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25987:
--
Affects Version/s: 2.4.0

> StackOverflowError when executing many operations on a table with many columns
> --
>
> Key: SPARK-25987
> URL: https://issues.apache.org/jira/browse/SPARK-25987
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1, 2.2.2, 2.3.0, 2.3.2, 2.4.0
> Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"
>Reporter: Ivan Tsukanov
>Priority: Major
>
> When I execute
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val columnsCount = 100
> val columns = (1 to columnsCount).map(i => s"col$i")
> val initialData = (1 to columnsCount).map(i => s"val$i")
> val df = spark.createDataFrame(
>   rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
>   schema = StructType(columns.map(StructField(_, StringType, true)))
> )
> val addSuffixUDF = udf(
>   (str: String) => str + "_added"
> )
> implicit class DFOps(df: DataFrame) {
>   def addSuffix() = {
> df.select(columns.map(col =>
>   addSuffixUDF(df(col)).as(col)
> ): _*)
>   }
> }
> df.addSuffix().addSuffix().addSuffix().show()
> {code}
> I get
> {code:java}
> An exception or error caused a run to abort.
> java.lang.StackOverflowError
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
> ...
> {code}
> If I reduce columns number (to 10 for example) or do `addSuffix` only once - 
> it works fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25987) StackOverflowError when executing many operations on a table with many columns

2018-12-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25987:
--
Description: 
When I execute
{code:java}
import org.apache.spark.sql._
import org.apache.spark.sql.types._

val columnsCount = 100
val columns = (1 to columnsCount).map(i => s"col$i")
val initialData = (1 to columnsCount).map(i => s"val$i")

val df = spark.createDataFrame(
  rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
  schema = StructType(columns.map(StructField(_, StringType, true)))
)

val addSuffixUDF = udf(
  (str: String) => str + "_added"
)

implicit class DFOps(df: DataFrame) {
  def addSuffix() = {
df.select(columns.map(col =>
  addSuffixUDF(df(col)).as(col)
): _*)
  }
}

df.addSuffix().addSuffix().addSuffix().show()
{code}
I get
{code:java}
An exception or error caused a run to abort.
java.lang.StackOverflowError
 at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
 at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
...
{code}
If I reduce columns number (to 10 for example) or do `addSuffix` only once - it 
works fine.

  was:
When I execute
{code:java}
import org.apache.spark.sql._
import org.apache.spark.sql.types._

val columnsCount = 100
val columns = (1 to columnsCount).map(i => s"col$i")
val initialData = (1 to columnsCount).map(i => s"val$i")

val df = spark.createDataFrame(
  rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
  schema = StructType(columns.map(StructField(_, StringType, true)))
)

val addSuffixUDF = udf(
  (str: String) => str + "_added"
)

implicit class DFOps(df: DataFrame) {
  def addSuffix() = {
df.select(columns.map(col =>
  addSuffixUDF(df(col)).as(col)
): _*)
  }
}

df
  .addSuffix()
  .addSuffix()
  .addSuffix()
  .show()
{code}
I get
{code:java}
An exception or error caused a run to abort.
java.lang.StackOverflowError
 at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
 at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
...
{code}
If I reduce columns number (to 10 for example) or do `addSuffix` only once - it 
works fine.


> StackOverflowError when executing many operations on a table with many columns
> --
>
> Key: SPARK-25987
> URL: https://issues.apache.org/jira/browse/SPARK-25987
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1, 2.2.2, 2.3.0, 2.3.2, 2.4.0
> Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"
>Reporter: Ivan Tsukanov
>Priority: Major
>
> When I execute
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val columnsCount = 100
> val columns = (1 to columnsCount).map(i => s"col$i")
> val initialData = (1 to columnsCount).map(i => s"val$i")
> val df = spark.createDataFrame(
>   rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
>   schema = StructType(columns.map(StructField(_, StringType, true)))
> )
> val addSuffixUDF = udf(
>   (str: String) => str + "_added"
> )
> implicit class DFOps(df: DataFrame) {
>   def addSuffix() = {
> df.select(columns.map(col =>
>   addSuffixUDF(df(col)).as(col)
> ): _*)
>   }
> }
> df.addSuffix().addSuffix().addSuffix().show()
> {code}
> I get
> {code:java}
> An exception or error caused a run to abort.
> java.lang.StackOverflowError
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
>  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
> ...
> {code}
> If I reduce columns number (to 10 for example) or do `addSuffix` only once - 
> it works fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22776) Increase default value of spark.sql.codegen.maxFields

2018-12-06 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-22776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711988#comment-16711988
 ] 

Dongjoon Hyun commented on SPARK-22776:
---

Hi, [~kiszk]. Is there anything to do here?

> Increase default value of spark.sql.codegen.maxFields
> -
>
> Key: SPARK-22776
> URL: https://issues.apache.org/jira/browse/SPARK-22776
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Priority: Major
>
> Since a lot of effort has gone into working around the limitations of Java 
> class files, code generated by whole-stage codegen can now handle rows with 
> more columns.
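
A brief sketch for context (assuming the config remains session-settable; the 
exact default value may differ between versions):
{code:java}
// spark.sql.codegen.maxFields caps how many fields whole-stage codegen will
// handle before Spark falls back to non-codegen execution.
// Raising it for the current session only:
spark.conf.set("spark.sql.codegen.maxFields", 200)
{code}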



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-26067) Pandas GROUPED_MAP udf breaks if DF has >255 columns

2018-12-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-26067.
-

> Pandas GROUPED_MAP udf breaks if DF has >255 columns
> 
>
> Key: SPARK-26067
> URL: https://issues.apache.org/jira/browse/SPARK-26067
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Abdeali Kothari
>Priority: Major
>
> When I run Spark's Pandas GROUPED_MAP udfs to apply a UDAF I wrote in 
> Python/pandas on a grouped dataframe in Spark, it fails if the number of 
> columns is greater than 255 on Python 3.6 and lower.
> {code:java}
> import pyspark
> from pyspark.sql import types as T, functions as F
> spark = pyspark.sql.SparkSession.builder.getOrCreate()
> df = spark.createDataFrame(
> [[i for i in range(256)], [i+1 for i in range(256)]], schema=["a" + 
> str(i) for i in range(256)])
> new_schema = T.StructType([
> field for field in df.schema] + [T.StructField("new_row", 
> T.DoubleType())])
> def myfunc(df):
> df['new_row'] = 1
> return df
> myfunc_udf = F.pandas_udf(new_schema, F.PandasUDFType.GROUPED_MAP)(myfunc)
> df2 = df.groupBy(["a1"]).apply(myfunc_udf)
> print(df2.count())  # This FAILS
> # ERROR:
> # Caused by: org.apache.spark.api.python.PythonException: Traceback (most 
> recent call last):
> #   File 
> "/usr/local/hadoop/spark2.3.1/python/lib/pyspark.zip/pyspark/worker.py", line 
> 219, in main
> # func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, 
> eval_type)
> #   File 
> "/usr/local/hadoop/spark2.3.1/python/lib/pyspark.zip/pyspark/worker.py", line 
> 148, in read_udfs
> # mapper = eval(mapper_str, udfs)
> #   File "", line 1
> # SyntaxError: more than 255 arguments
> {code}
> Note: In Python 3.7 the 255-argument limit was raised, but I have not tried 
> with Python 3.7: 
> https://docs.python.org/3.7/whatsnew/3.7.html#other-language-changes
> I was using Python 3.5 (from Anaconda) and Spark 2.3.1 to reproduce this on 
> my Hadoop Linux cluster and also on my Mac standalone Spark installation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26286) Add MAXIMUM_PAGE_SIZE_BYTES Exception unit test

2018-12-06 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-26286:
--
Priority: Trivial  (was: Minor)

> Add MAXIMUM_PAGE_SIZE_BYTES Exception unit test
> ---
>
> Key: SPARK-26286
> URL: https://issues.apache.org/jira/browse/SPARK-26286
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.4.0
>Reporter: wangjiaochun
>Priority: Trivial
>
> Add max page size exception bounds checking test



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26292) Assert statement of currentPage may be not in right place

2018-12-06 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-26292.
---
   Resolution: Not A Problem
Fix Version/s: (was: 2.4.0)

Please also don't set Fix Version.

> Assert  statement of currentPage may be not in right place
> --
>
> Key: SPARK-26292
> URL: https://issues.apache.org/jira/browse/SPARK-26292
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: wangjiaochun
>Priority: Minor
>
> The assert statement on currentPage is not in the right place; it should be 
> moved to just after the allocatePage call in the acquireNewPageIfNecessary 
> function.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26294) Delete Unnecessary If statement

2018-12-06 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-26294:
--
Priority: Trivial  (was: Minor)

This is too trivial for a JIRA.

> Delete Unnecessary If statement
> ---
>
> Key: SPARK-26294
> URL: https://issues.apache.org/jira/browse/SPARK-26294
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: wangjiaochun
>Priority: Trivial
>
> Delete the unnecessary if statement, because that branch cannot execute when 
> records is less than or equal to zero; it only executes once records starts 
> from zero.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26236) Kafka delegation token support documentation

2018-12-06 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-26236.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23195
[https://github.com/apache/spark/pull/23195]

> Kafka delegation token support documentation
> 
>
> Key: SPARK-26236
> URL: https://issues.apache.org/jira/browse/SPARK-26236
> Project: Spark
>  Issue Type: New Feature
>  Components: Documentation, Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.0
>
>
> Because SPARK-25501 has been merged to master, it's now time to update the docs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26236) Kafka delegation token support documentation

2018-12-06 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-26236:
--

Assignee: Gabor Somogyi

> Kafka delegation token support documentation
> 
>
> Key: SPARK-26236
> URL: https://issues.apache.org/jira/browse/SPARK-26236
> Project: Spark
>  Issue Type: New Feature
>  Components: Documentation, Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.0
>
>
> Because SPARK-25501 has been merged to master, it's now time to update the docs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26254) Move delegation token providers into a separate project

2018-12-06 Thread Gabor Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711516#comment-16711516
 ] 

Gabor Somogyi edited comment on SPARK-26254 at 12/6/18 6:32 PM:


I've reached a state where a tradeoff has to be made, so I'm interested in 
opinions [~vanzin] [~ste...@apache.org]

I've created a project named token-providers which depends on core. With this I 
successfully extracted all the nasty hive + kafka dependencies, and all the 
token providers live there. Then I loaded the providers with ServiceLoader, 
which also works fine. Finally I reached a point where the kafka-sql project 
expects a couple of things from KafkaTokenUtil, which is in token-providers now. 
Here is the list of problems:
{noformat}
[error] 
/Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala:31:
 object KafkaTokenUtil is not a member of package 
org.apache.spark.deploy.security
[error] import org.apache.spark.deploy.security.KafkaTokenUtil
[error]^
[error] 
/Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSecurityHelper.scala:25:
 object KafkaTokenUtil is not a member of package 
org.apache.spark.deploy.security
[error] import org.apache.spark.deploy.security.KafkaTokenUtil
[error]^
[error] 
/Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSecurityHelper.scala:32:
 not found: value KafkaTokenUtil
[error]   KafkaTokenUtil.TOKEN_SERVICE) != null
[error]   ^
[error] 
/Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSecurityHelper.scala:37:
 not found: value KafkaTokenUtil
[error]   KafkaTokenUtil.TOKEN_SERVICE)
[error]   ^
[error] 
/Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala:566:
 not found: value KafkaTokenUtil
[error]   if (KafkaTokenUtil.isGlobalJaasConfigurationProvided) {
[error]   ^

+ all isTokenAvailable tests expects TOKEN_KIND, TOKEN_SERVICE + 
KafkaDelegationTokenIdentifier in KafkaSecurityHelperSuite
{noformat}

Here I see these possibilities:
* Hardcode TOKEN_KIND + TOKEN_SERVICE and duplicate 
isGlobalJaasConfigurationProvided => The drawback here is that we can't really test 
whether the provider-created token can be read in kafka-sql (we can, actually, but 
only with hardcoded strings on both sides, which makes it brittle)
* Since we're loading providers with ServiceLoader, the Kafka-related one can be 
moved to kafka-sql => The drawback is that the providers get spread around and this 
code can't really be reused in DStreams.
* Extract KafkaTokenUtil to a Kafka-specific project => It's too heavyweight
* Add token-providers as a provided dependency to the kafka-sql project => It's a 
bit weird, but at the moment it looks the least problematic

Waiting on opinions...
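
As a minimal, self-contained sketch of the ServiceLoader mechanism referred to above 
(the trait and implementation names below are made up, not Spark's actual 
delegation-token classes):

{noformat}
// Providers are discovered at runtime from META-INF/services entries on the classpath.
import java.util.ServiceLoader
import scala.collection.JavaConverters._

trait TokenProvider {
  def serviceName: String
  def obtainToken(): Unit
}

// An implementation would live in its own module and be registered by listing its
// fully qualified class name in META-INF/services/TokenProvider.
class ExampleTokenProvider extends TokenProvider {
  override def serviceName: String = "example"
  override def obtainToken(): Unit = println(s"obtaining token for $serviceName")
}

object ProviderLoaderDemo {
  def main(args: Array[String]): Unit = {
    val providers = ServiceLoader.load(classOf[TokenProvider]).asScala.toSeq
    providers.foreach(_.obtainToken())   // empty unless a provider is registered
  }
}
{noformat}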



was (Author: gsomogyi):
I've reached a state where tradeoff has to be made, so interested in opinions 
[~vanzin] [~ste...@apache.org]

I've created a project with token-providers name which is depending on core. 
With this successfully extracted all the nasty hive + kafka dependencies + all 
token providers are there. Then loaded the providers with ServiceLoader which 
also works fine. Finally reached a point where kafka-sql project expects couple 
of things from KafkaUtil which is in token-providers now. Here is the list of 
problems:
{noformat}
[error] 
/Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala:31:
 object KafkaTokenUtil is not a member of package 
org.apache.spark.deploy.security
[error] import org.apache.spark.deploy.security.KafkaTokenUtil
[error]^
[error] 
/Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSecurityHelper.scala:25:
 object KafkaTokenUtil is not a member of package 
org.apache.spark.deploy.security
[error] import org.apache.spark.deploy.security.KafkaTokenUtil
[error]^
[error] 
/Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSecurityHelper.scala:32:
 not found: value KafkaTokenUtil
[error]   KafkaTokenUtil.TOKEN_SERVICE) != null
[error]   ^
[error] 
/Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSecurityHelper.scala:37:
 not found: value KafkaTokenUtil
[error]   KafkaTokenUtil.TOKEN_SERVICE)
[error]   ^
[error] 
/Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala:566:
 not found: value KafkaTokenUtil
[error]   if (KafkaTokenUtil.isGlobalJaasConfigurationProvided) {
[error]   ^

+ all isTokenAvailable tests expects TOKEN_KIND, TOKEN_SERVICE + 
KafkaDelegationTokenIdentifier in 

[jira] [Resolved] (SPARK-26213) Custom Receiver for Structured streaming

2018-12-06 Thread Gabor Somogyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Somogyi resolved SPARK-26213.
---
Resolution: Information Provided

> Custom Receiver for Structured streaming
> 
>
> Key: SPARK-26213
> URL: https://issues.apache.org/jira/browse/SPARK-26213
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Aarthi
>Priority: Major
>
> Hi,
> I have implemented a custom receiver for an https/json data source by 
> implementing the Receiver abstract class as described in the documentation 
> here [https://spark.apache.org/docs/latest//streaming-custom-receivers.html]
> This approach works with the Spark streaming context, where the custom receiver 
> class is passed to receiverStream. However, I would like to implement the 
> same for Structured Streaming, as each of the DStreams has a complex 
> structure and they need to be joined with each other based on complex rules. 
> ([https://stackoverflow.com/questions/53449599/join-two-spark-dstreams-with-complex-nested-structure])
>  Structured Streaming uses the SparkSession object, which takes in a 
> DataStreamReader, a final class. Please advise on how to implement 
> a custom receiver for Structured Streaming. 
> Thanks,
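
For reference, a minimal sketch of the readStream-based approach that replaces a 
Receiver in Structured Streaming, joining two streaming DataFrames. The paths and 
schemas below are hypothetical placeholders, not taken from this ticket.

{noformat}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object StreamingJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("streaming-join-sketch").getOrCreate()

    val schemaA = new StructType().add("id", StringType).add("a", StringType)
    val schemaB = new StructType().add("id", StringType).add("b", StringType)

    // Each directory is watched for new JSON files; any streaming source works here.
    val left  = spark.readStream.schema(schemaA).json("/tmp/streamA")
    val right = spark.readStream.schema(schemaB).json("/tmp/streamB")

    // Stream-stream inner joins on equality keys are supported since Spark 2.3.
    val joined = left.join(right, "id")

    joined.writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}
{noformat}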



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26296) Base Spark over Java and not over Scala, offering Scala as an option over Spark

2018-12-06 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-26296.

Resolution: Won't Fix

Lack of newer JDK support is not only the fault of Scala. There's a lot of Java 
code that doesn't work on the new JDK either.

> Base Spark over Java and not over Scala, offering Scala as an option over 
> Spark
> ---
>
> Key: SPARK-26296
> URL: https://issues.apache.org/jira/browse/SPARK-26296
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: M. Le Bihan
>Priority: Minor
>
> I am not using _Scala_ when I am programming _Spark_, but plain _Java_. I 
> believe those using _PySpark_ aren't either. 
>  
> But _Spark_ has been built over _Scala_ instead of plain _Java_, and this is a 
> cause of trouble, especially when upgrading the JDK. We are 
> waiting to use JDK 11, and [Scala is still holding Spark back on the previous 
> version of the 
> JDK|https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html].
> _Big Data_ programming should not force developers to get by with _Scala_ 
> when it's not the language they have chosen.
>  
> Having a _Spark_ without _Scala_, as it is possible to have a _Spark_ 
> without _Hadoop_, would comfort me: a cause of issues would disappear.
> Providing an optional _spark-scala_ artifact would be fine, as those who do not 
> need it wouldn't download it and wouldn't be affected by it. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25274) Improve toPandas with Arrow by sending out-of-order record batches

2018-12-06 Thread Bryan Cutler (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved SPARK-25274.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 22275
https://github.com/apache/spark/pull/22275

> Improve toPandas with Arrow by sending out-of-order record batches
> --
>
> Key: SPARK-25274
> URL: https://issues.apache.org/jira/browse/SPARK-25274
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 2.4.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
> Fix For: 3.0.0
>
>
> When executing {{toPandas}} with Arrow enabled, partitions that arrive in the 
> JVM out of order must be buffered before they can be sent to Python. This 
> causes excess memory to be used in the driver JVM and increases the 
> time it takes to complete, because data must sit in the JVM waiting for 
> preceding partitions to come in.
> This can be improved by sending out-of-order partitions to Python as soon as 
> they arrive in the JVM, followed by a list of indices so that Python can 
> assemble the data in the correct order. This way, data is not buffered in the 
> JVM and there is no waiting on particular partitions, so performance is 
> increased.
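
A toy illustration of the reassembly idea (not Spark's actual implementation): 
batches are kept in arrival order together with their original partition index, and 
the final ordering is recovered from the index list in one pass at the end, so 
nothing has to be buffered while waiting for earlier partitions.

{noformat}
object OutOfOrderAssembly {
  def main(args: Array[String]): Unit = {
    // Pretend these arrived in completion order; the Int is the original partition index.
    val arrived: Seq[(Int, String)] =
      Seq(2 -> "batch-2", 0 -> "batch-0", 3 -> "batch-3", 1 -> "batch-1")

    // Recover the original order from the index attached to each batch.
    val inOrder = arrived.sortBy(_._1).map(_._2)

    println(inOrder.mkString(", "))  // batch-0, batch-1, batch-2, batch-3
  }
}
{noformat}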



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25274) Improve toPandas with Arrow by sending out-of-order record batches

2018-12-06 Thread Bryan Cutler (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned SPARK-25274:


Assignee: Bryan Cutler

> Improve toPandas with Arrow by sending out-of-order record batches
> --
>
> Key: SPARK-25274
> URL: https://issues.apache.org/jira/browse/SPARK-25274
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 2.4.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>
> When executing {{toPandas}} with Arrow enabled, partitions that arrive in the 
> JVM out of order must be buffered before they can be sent to Python. This 
> causes excess memory to be used in the driver JVM and increases the 
> time it takes to complete, because data must sit in the JVM waiting for 
> preceding partitions to come in.
> This can be improved by sending out-of-order partitions to Python as soon as 
> they arrive in the JVM, followed by a list of indices so that Python can 
> assemble the data in the correct order. This way, data is not buffered in the 
> JVM and there is no waiting on particular partitions, so performance is 
> increased.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26297) improve the doc of Distribution/Partitioning

2018-12-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26297:


Assignee: Apache Spark  (was: Wenchen Fan)

> improve the doc of Distribution/Partitioning
> 
>
> Key: SPARK-26297
> URL: https://issues.apache.org/jira/browse/SPARK-26297
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26297) improve the doc of Distribution/Partitioning

2018-12-06 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711628#comment-16711628
 ] 

Apache Spark commented on SPARK-26297:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/23249

> improve the doc of Distribution/Partitioning
> 
>
> Key: SPARK-26297
> URL: https://issues.apache.org/jira/browse/SPARK-26297
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26297) improve the doc of Distribution/Partitioning

2018-12-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26297:


Assignee: Wenchen Fan  (was: Apache Spark)

> improve the doc of Distribution/Partitioning
> 
>
> Key: SPARK-26297
> URL: https://issues.apache.org/jira/browse/SPARK-26297
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26297) improve the doc of Distribution/Partitioning

2018-12-06 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-26297:
---

 Summary: improve the doc of Distribution/Partitioning
 Key: SPARK-26297
 URL: https://issues.apache.org/jira/browse/SPARK-26297
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26244) Do not use case class as public API

2018-12-06 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-26244.
---
Resolution: Done

> Do not use case class as public API
> ---
>
> Key: SPARK-26244
> URL: https://issues.apache.org/jira/browse/SPARK-26244
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Major
>  Labels: release-notes
>
> It's a bad idea to use a case class as a public API, as it has a very wide 
> surface: for example, the copy method, its fields, the companion object, etc.
> I don't think we're expected to expose so much to end users; usually 
> we only want to expose a few methods.
> We should use a pure trait as the public API, and use a case class as the 
> implementation, which should be private and hidden from end users.
> Changing a class to an interface is not binary compatible (but is source 
> compatible), so 3.0 is a good chance to do it.
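
A short sketch of the proposed pattern (the names are made up, not from Spark): a 
plain trait is the public type, and the case class stays a private implementation, 
so the generated copy(), unapply() and companion members are not part of the API.

{noformat}
object CaseClassApiDemo {
  // Public surface: only these two members.
  trait ColumnStat {
    def distinctCount: Long
    def nullCount: Long
  }

  // Implementation detail; callers never see this type.
  private case class ColumnStatImpl(distinctCount: Long, nullCount: Long) extends ColumnStat

  object ColumnStat {
    // A factory is the only way to obtain an instance.
    def apply(distinctCount: Long, nullCount: Long): ColumnStat =
      ColumnStatImpl(distinctCount, nullCount)
  }

  def main(args: Array[String]): Unit = {
    val stat: ColumnStat = ColumnStat(10L, 0L)
    println(stat.distinctCount)   // only the trait's members are visible to callers
  }
}
{noformat}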



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26254) Move delegation token providers into a separate project

2018-12-06 Thread Gabor Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711516#comment-16711516
 ] 

Gabor Somogyi commented on SPARK-26254:
---

I've reached a state where a tradeoff has to be made, so I'm interested in opinions 
[~vanzin] [~ste...@apache.org]

I've created a project named token-providers which depends on core. With this I 
successfully extracted all the nasty hive + kafka dependencies, and all token 
providers now live there. Then I loaded the providers with ServiceLoader, which 
also works fine. Finally I reached a point where the kafka-sql project expects a 
couple of things from KafkaTokenUtil, which is in token-providers now. Here is the 
list of problems:
{noformat}
[error] 
/Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala:31:
 object KafkaTokenUtil is not a member of package 
org.apache.spark.deploy.security
[error] import org.apache.spark.deploy.security.KafkaTokenUtil
[error]^
[error] 
/Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSecurityHelper.scala:25:
 object KafkaTokenUtil is not a member of package 
org.apache.spark.deploy.security
[error] import org.apache.spark.deploy.security.KafkaTokenUtil
[error]^
[error] 
/Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSecurityHelper.scala:32:
 not found: value KafkaTokenUtil
[error]   KafkaTokenUtil.TOKEN_SERVICE) != null
[error]   ^
[error] 
/Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSecurityHelper.scala:37:
 not found: value KafkaTokenUtil
[error]   KafkaTokenUtil.TOKEN_SERVICE)
[error]   ^
[error] 
/Users/gaborsomogyi/spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala:566:
 not found: value KafkaTokenUtil
[error]   if (KafkaTokenUtil.isGlobalJaasConfigurationProvided) {
[error]   ^

+ all isTokenAvailable tests expects TOKEN_KIND, TOKEN_SERVICE + 
KafkaDelegationTokenIdentifier in KafkaSecurityHelperSuite
{noformat}

Here I see these possibilities:
* Hardcode TOKEN_KIND + TOKEN_SERVICE and duplicate 
isGlobalJaasConfigurationProvided => The drawback here is that we can't really test 
whether the provider-created token can be read in kafka-sql (we can, actually, but 
only with hardcoded strings on both sides, which makes it brittle)
* Since we're loading providers with ServiceLoader, the Kafka-related one can be 
moved to kafka-sql => The drawback is that the providers get spread around and this 
code can't really be reused in DStreams.
* Add token-providers as a provided dependency to the kafka-sql project => It's a 
bit weird, but at the moment it looks the least problematic

Waiting on opinions...


> Move delegation token providers into a separate project
> ---
>
> Key: SPARK-26254
> URL: https://issues.apache.org/jira/browse/SPARK-26254
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Major
>
> There was a discussion in 
> [PR#22598|https://github.com/apache/spark/pull/22598] that there are several 
> provided dependencies inside the core project which shouldn't be there (for 
> example, hive and kafka). This jira is meant to solve that problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26265) deadlock between TaskMemoryManager and BytesToBytesMap$MapIterator

2018-12-06 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711500#comment-16711500
 ] 

Hyukjin Kwon commented on SPARK-26265:
--

Thanks, [~qianhan], can you investigate and make a fix to Spark?

> deadlock between TaskMemoryManager and BytesToBytesMap$MapIterator
> --
>
> Key: SPARK-26265
> URL: https://issues.apache.org/jira/browse/SPARK-26265
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: qian han
>Priority: Major
>
> The application is running on a cluster with 72000 cores and 182000G mem.
> Environment:
> |spark.dynamicAllocation.minExecutors|5|
> |spark.dynamicAllocation.initialExecutors|30|
> |spark.dynamicAllocation.maxExecutors|400|
> |spark.executor.cores|4|
> |spark.executor.memory|20g|
>  
>   
> Stage description:
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:364)
>  org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:357) 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:193)
>  
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  java.lang.reflect.Method.invoke(Method.java:498) 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
>  org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198) 
> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228) 
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137) 
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>  
> jstack information as follows:
> Found one Java-level deadlock: = 
> "Thread-ScriptTransformation-Feed": waiting to lock monitor 
> 0x00e0cb18 (object 0x0002f1641538, a 
> org.apache.spark.memory.TaskMemoryManager), which is held by "Executor task 
> launch worker for task 18899" "Executor task launch worker for task 18899": 
> waiting to lock monitor 0x00e09788 (object 0x000302faa3b0, a 
> org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator), which is held by 
> "Thread-ScriptTransformation-Feed" Java stack information for the threads 
> listed above: === 
> "Thread-ScriptTransformation-Feed": at 
> org.apache.spark.memory.TaskMemoryManager.freePage(TaskMemoryManager.java:332)
>  - waiting to lock <0x0002f1641538> (a 
> org.apache.spark.memory.TaskMemoryManager) at 
> org.apache.spark.memory.MemoryConsumer.freePage(MemoryConsumer.java:130) at 
> org.apache.spark.unsafe.map.BytesToBytesMap.access$300(BytesToBytesMap.java:66)
>  at 
> org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator.advanceToNextPage(BytesToBytesMap.java:274)
>  - locked <0x000302faa3b0> (a 
> org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator) at 
> org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator.next(BytesToBytesMap.java:313)
>  at 
> org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap$1.next(UnsafeFixedWidthAggregationMap.java:173)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
>  Source) at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at 
> scala.collection.Iterator$class.foreach(Iterator.scala:893) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at 
> org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply$mcV$sp(ScriptTransformationExec.scala:281)
>  at 
> org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply(ScriptTransformationExec.scala:270)
>  at 
> org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply(ScriptTransformationExec.scala:270)
>  at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1995) at 
> org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread.run(ScriptTransformationExec.scala:270)
>  "Executor task launch worker for task 18899": at 
> org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator.spill(BytesToBytesMap.java:345)
>  - 

[jira] [Reopened] (SPARK-26265) deadlock between TaskMemoryManager and BytesToBytesMap$MapIterator

2018-12-06 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-26265:
--

> deadlock between TaskMemoryManager and BytesToBytesMap$MapIterator
> --
>
> Key: SPARK-26265
> URL: https://issues.apache.org/jira/browse/SPARK-26265
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: qian han
>Priority: Major
>
> The application is running on a cluster with 72000 cores and 182000G mem.
> Environment:
> |spark.dynamicAllocation.minExecutors|5|
> |spark.dynamicAllocation.initialExecutors|30|
> |spark.dynamicAllocation.maxExecutors|400|
> |spark.executor.cores|4|
> |spark.executor.memory|20g|
>  
>   
> Stage description:
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:364)
>  org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422) 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:357) 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:193)
>  
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  java.lang.reflect.Method.invoke(Method.java:498) 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
>  org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198) 
> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228) 
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137) 
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>  
> jstack information as follows:
> Found one Java-level deadlock: = 
> "Thread-ScriptTransformation-Feed": waiting to lock monitor 
> 0x00e0cb18 (object 0x0002f1641538, a 
> org.apache.spark.memory.TaskMemoryManager), which is held by "Executor task 
> launch worker for task 18899" "Executor task launch worker for task 18899": 
> waiting to lock monitor 0x00e09788 (object 0x000302faa3b0, a 
> org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator), which is held by 
> "Thread-ScriptTransformation-Feed" Java stack information for the threads 
> listed above: === 
> "Thread-ScriptTransformation-Feed": at 
> org.apache.spark.memory.TaskMemoryManager.freePage(TaskMemoryManager.java:332)
>  - waiting to lock <0x0002f1641538> (a 
> org.apache.spark.memory.TaskMemoryManager) at 
> org.apache.spark.memory.MemoryConsumer.freePage(MemoryConsumer.java:130) at 
> org.apache.spark.unsafe.map.BytesToBytesMap.access$300(BytesToBytesMap.java:66)
>  at 
> org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator.advanceToNextPage(BytesToBytesMap.java:274)
>  - locked <0x000302faa3b0> (a 
> org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator) at 
> org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator.next(BytesToBytesMap.java:313)
>  at 
> org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap$1.next(UnsafeFixedWidthAggregationMap.java:173)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
>  Source) at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at 
> scala.collection.Iterator$class.foreach(Iterator.scala:893) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at 
> org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply$mcV$sp(ScriptTransformationExec.scala:281)
>  at 
> org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply(ScriptTransformationExec.scala:270)
>  at 
> org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply(ScriptTransformationExec.scala:270)
>  at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1995) at 
> org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread.run(ScriptTransformationExec.scala:270)
>  "Executor task launch worker for task 18899": at 
> org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator.spill(BytesToBytesMap.java:345)
>  - waiting to lock <0x000302faa3b0> (a 
> org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator) at 
> 
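
The jstack excerpt above shows a classic lock-ordering deadlock: the feed thread 
holds the MapIterator monitor and waits for the TaskMemoryManager monitor, while the 
worker thread holds them in the opposite order. A toy reproduction of that ordering 
in plain Scala (not Spark code) looks like this:

{noformat}
// Two threads take the same two monitors in opposite order -> deadlock.
object LockOrderingDeadlock {
  private val taskMemoryManager = new Object  // stands in for TaskMemoryManager
  private val mapIterator       = new Object  // stands in for BytesToBytesMap$MapIterator

  private def inThread(body: => Unit): Thread = {
    val t = new Thread(new Runnable { override def run(): Unit = body })
    t.start()
    t
  }

  def main(args: Array[String]): Unit = {
    val feed = inThread {
      mapIterator.synchronized {            // holds the iterator monitor...
        Thread.sleep(100)
        taskMemoryManager.synchronized {    // ...then waits for the memory manager
          println("feed acquired both")
        }
      }
    }
    val worker = inThread {
      taskMemoryManager.synchronized {      // holds the memory manager...
        Thread.sleep(100)
        mapIterator.synchronized {          // ...then waits for the iterator
          println("worker acquired both")
        }
      }
    }
    feed.join(); worker.join()              // hangs: neither thread can proceed
  }
}
{noformat}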

[jira] [Updated] (SPARK-26288) add initRegisteredExecutorsDB in ExternalShuffleService

2018-12-06 Thread weixiuli (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weixiuli updated SPARK-26288:
-
Description: 
As we all know, Spark on YARN uses a DB to record RegisteredExecutors 
information, which can be reloaded and used again when the 
ExternalShuffleService is restarted.

The RegisteredExecutors information is not recorded in either Spark standalone 
mode or Spark on K8s, which causes the RegisteredExecutors 
information to be lost when the ExternalShuffleService is restarted.

To solve the problem above, a method is proposed and committed.

  was:
As we all know that spark on Yarn uses DB to record RegisteredExecutors 
information, when the ExternalShuffleService restart and it can be reloaded, 
which will be used as well .

While neither spark's standalone nor spark on k8s can record it's 
RegisteredExecutors information by db or others ,so when ExternalShuffleService 
restart ,which RegisteredExecutors information will be lost,it is't what we 
looking forward to .

This commit add initRegisteredExecutorsDB which can be used either spark 
standalone or spark on k8s to record RegisteredExecutors information , when the 
ExternalShuffleService restart and it can be reloaded, which will be used as 
well .


> add initRegisteredExecutorsDB in ExternalShuffleService
> ---
>
> Key: SPARK-26288
> URL: https://issues.apache.org/jira/browse/SPARK-26288
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Shuffle
>Affects Versions: 2.4.0
>Reporter: weixiuli
>Priority: Major
> Fix For: 2.4.0
>
>
> As we all know, Spark on YARN uses a DB to record RegisteredExecutors 
> information, which can be reloaded and used again when the 
> ExternalShuffleService is restarted.
> The RegisteredExecutors information is not recorded in either Spark standalone 
> mode or Spark on K8s, which causes the 
> RegisteredExecutors information to be lost when the ExternalShuffleService 
> is restarted.
> To solve the problem above, a method is proposed and committed.
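
As a deliberately simplified sketch of the idea (hypothetical names, not the actual 
ExternalShuffleService code): executor registrations are written to a small on-disk 
store so that a restarted shuffle service can reload them instead of losing them.

{noformat}
import java.io.{File, FileInputStream, FileOutputStream}
import java.util.Properties
import scala.collection.JavaConverters._

object RegisteredExecutorsStore {
  // Persist executorId -> shuffle directory mappings to a properties file.
  def save(dbFile: File, executors: Map[String, String]): Unit = {
    val props = new Properties()
    executors.foreach { case (execId, shuffleDir) => props.setProperty(execId, shuffleDir) }
    val out = new FileOutputStream(dbFile)
    try props.store(out, "registered executors") finally out.close()
  }

  // Reload whatever was registered before the (simulated) restart.
  def load(dbFile: File): Map[String, String] = {
    if (!dbFile.exists()) return Map.empty
    val props = new Properties()
    val in = new FileInputStream(dbFile)
    try props.load(in) finally in.close()
    props.asScala.toMap
  }

  def main(args: Array[String]): Unit = {
    val db = new File("/tmp/registeredExecutors.db")
    save(db, Map("exec-1" -> "/data/shuffle/exec-1"))
    println(load(db))   // the registration survives a restart of the process
  }
}
{noformat}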



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26296) Base Spark over Java and not over Scala, offering Scala as an option over Spark

2018-12-06 Thread M. Le Bihan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

M. Le Bihan updated SPARK-26296:

Description: 
I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
believe those using _PySpark_ no more. 

 

But _Spark_ has been build over _Scala_ instead of plain _Java_ and this is a 
cause of troubles, especially at the time of upgrading the JDK. We are awaiting 
to use JDK 11 and [Scala is still lowering Spark in the previous version of the 
JDK|https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html].

_Big Data_ programming shall not force developpers to get by with _Scala_ when 
its not the language they have choosen.

 

Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ without 
_Hadoop,_ would confort me : a cause of issues would disappear.

Provide an optional _spark-scala_ artifact would be fine as those who do not 
need it wouldn't download it and wouldn't be affected by it. 

  was:
I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
believe those using _PySpark_ no more. 

 

But _Spark_ has been build over _Scala_ instead of plain _Java_ and this is a 
cause of troubles, especially at the time of upgrading the JDK. We are awaiting 
to use JDK 11 and _Scala_ is still lowering _Spark_ in the previous version of 
the JDK.

_Big Data_ programming shall not force developpers to get by with _Scala_ when 
its not the language they have choosen.

 

Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ without 
_Hadoop,_ would confort me : a cause of issues would disappear.

Provide an optional _spark-scala_ artifact would be fine as those who do not 
need it wouldn't download it and wouldn't be affected by it. 


> Base Spark over Java and not over Scala, offering Scala as an option over 
> Spark
> ---
>
> Key: SPARK-26296
> URL: https://issues.apache.org/jira/browse/SPARK-26296
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: M. Le Bihan
>Priority: Minor
>
> I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
> believe those using _PySpark_ no more. 
>  
> But _Spark_ has been build over _Scala_ instead of plain _Java_ and this is a 
> cause of troubles, especially at the time of upgrading the JDK. We are 
> awaiting to use JDK 11 and [Scala is still lowering Spark in the previous 
> version of the 
> JDK|https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html].
> _Big Data_ programming shall not force developpers to get by with _Scala_ 
> when its not the language they have choosen.
>  
> Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ 
> without _Hadoop,_ would confort me : a cause of issues would disappear.
> Provide an optional _spark-scala_ artifact would be fine as those who do not 
> need it wouldn't download it and wouldn't be affected by it. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26296) Spark build over Java and not over Scala, offering Scala as an option over Spark

2018-12-06 Thread M. Le Bihan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

M. Le Bihan updated SPARK-26296:

Description: 
I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
believe those using _PySpark_ no more. 

 

But _Spark_ has been build over _Scala_ instead of plain _Java_ and this is a 
cause of troubles, especially at the time of upgrading the JDK. We are awaiting 
to use JDK 11 and _Scala_ is still lowering _Spark_ in the previous version of 
the JDK.

_Big Data_ programming shall not force developpers to get by with _Scala_ when 
its not the language they have choosen.

 

Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ without 
_Hadoop,_ would confort me : a cause of issues would disappear.

Provide an optional _spark-scala_ artifact would be fine as those who do not 
need it wouldn't download it and wouldn't be affected by it. 

  was:
I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
believe those using _PySpark_ no more. 

 

But _Spark_ has been build over _Scala_ instead of plain _Java_ and this is a 
cause of troubles, especially at the time of upgrading the JDK. We are awaiting 
to use JDK 11 and _Scala_ is still lowering _Spark_ in the previous version of 
the JDK.

_Big Data_ programming shall not force developpers to get by with _Scala_ when 
its not the language they have choosen.

 

Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ without 
_Hadoop,_ would confort me : a cause of issues would disappear.

Provide an optional _spark-scala_ artifact would be fine as those that do not 
need it won't download it and wouldn't be affected by it. 


> Spark build over Java and not over Scala, offering Scala as an option over 
> Spark
> 
>
> Key: SPARK-26296
> URL: https://issues.apache.org/jira/browse/SPARK-26296
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: M. Le Bihan
>Priority: Minor
>
> I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
> believe those using _PySpark_ no more. 
>  
> But _Spark_ has been build over _Scala_ instead of plain _Java_ and this is a 
> cause of troubles, especially at the time of upgrading the JDK. We are 
> awaiting to use JDK 11 and _Scala_ is still lowering _Spark_ in the previous 
> version of the JDK.
> _Big Data_ programming shall not force developpers to get by with _Scala_ 
> when its not the language they have choosen.
>  
> Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ 
> without _Hadoop,_ would confort me : a cause of issues would disappear.
> Provide an optional _spark-scala_ artifact would be fine as those who do not 
> need it wouldn't download it and wouldn't be affected by it. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26296) Spark build over Java and not over Scala, offering Scala as an option over Spark

2018-12-06 Thread M. Le Bihan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

M. Le Bihan updated SPARK-26296:

Description: 
I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
believe those using _PySpark_ no more. 

 

But _Spark_ as been build over _Scala_ instead of plain _Java_ and its a cause 
of troubles, especially at the time of upgrading the JDK. We are awaiting to 
use JDK 11 and _Scala_ is still lowering _Spark_ in the previous version of the 
JDK.

_Big Data_ programming shall not force developpers to get by with _Scala_ when 
its not the language they have choosen.

 

Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ without 
_Hadoop,_ would confort me : a cause of issues would disappear.

Provide an optional _spark-scala_ artifact would be fine as those that do not 
need it won't download it and wouldn't be affected by it. 

  was:
I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
believe those using _PySpark_ no more. 

 

But _Spark_ as been build over _Scala_ instead of plain _Java_ and its a cause 
of troubles, especially at the time of upgrading the JDK. We are awaiting to 
use JDK 11 and _Scala_ is still lowering _Spark_ in the previous version of the 
JDK.

_Big Data_ programming shall not force developpers to get by with _Scala_ when 
its not the language they have choosen.

 

Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ without 
_Hadoop,_ would confort me : a cause of issues would disappear.

Provide an optional _spark-scala_ artifact would be fine as those that do not 
need it won't download it. In the same move, you would to return to the 
generation of standard javadocs for Java classes documentation.

 


> Spark build over Java and not over Scala, offering Scala as an option over 
> Spark
> 
>
> Key: SPARK-26296
> URL: https://issues.apache.org/jira/browse/SPARK-26296
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: M. Le Bihan
>Priority: Minor
>
> I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
> believe those using _PySpark_ no more. 
>  
> But _Spark_ as been build over _Scala_ instead of plain _Java_ and its a 
> cause of troubles, especially at the time of upgrading the JDK. We are 
> awaiting to use JDK 11 and _Scala_ is still lowering _Spark_ in the previous 
> version of the JDK.
> _Big Data_ programming shall not force developpers to get by with _Scala_ 
> when its not the language they have choosen.
>  
> Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ 
> without _Hadoop,_ would confort me : a cause of issues would disappear.
> Provide an optional _spark-scala_ artifact would be fine as those that do not 
> need it won't download it and wouldn't be affected by it. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26296) Spark build over Java and not over Scala, offering Scala as an option over Spark

2018-12-06 Thread M. Le Bihan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

M. Le Bihan updated SPARK-26296:

Description: 
I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
believe those using _PySpark_ no more. 

 

But _Spark_ as been build over _Scala_ instead of plain _Java_ and its a cause 
of troubles, especially at the time of upgrading the JDK. We are awaiting to 
use JDK 11 and _Scala_ is still lowering _Spark_ in the previous version of the 
JDK.

_Big Data_ programming shall not force developpers to get by with _Scala_ when 
its not the language they have choosen.

 

Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ without 
_Hadoop,_ would confort me : a cause of issues would disappear.

Provide an optional _spark-scala_ artifact would be fine as those that do not 
need it won't download it. In the same move, you would to return to the 
generation of standard javadocs for Java classes documentation.

 

  was:
I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
believe those using _PySpark_ no more. 

 

But _Spark_ as been build over _Scala_ instead of plain _Java_ and its a cause 
troubles, especially at the time of leveling JDK. We are awaiting to use JDK 11 
and _Scala_ is still lowering _Spark_ in the previous version of the JDK.

_Big Data_ programming shall not force developpers to get by with _Scala_ when 
its not the language they have choosen.

 

Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ without 
_Hadoop,_ would confort me : a cause of issues would disappear.

Provide an optional _spark-scala_ artifact would be fine as those that do not 
need it won't download it. In the same move, you would to return to the 
generation of standard javadocs for Java classes documentation.

 


> Spark build over Java and not over Scala, offering Scala as an option over 
> Spark
> 
>
> Key: SPARK-26296
> URL: https://issues.apache.org/jira/browse/SPARK-26296
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: M. Le Bihan
>Priority: Minor
>
> I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
> believe those using _PySpark_ no more. 
>  
> But _Spark_ as been build over _Scala_ instead of plain _Java_ and its a 
> cause of troubles, especially at the time of upgrading the JDK. We are 
> awaiting to use JDK 11 and _Scala_ is still lowering _Spark_ in the previous 
> version of the JDK.
> _Big Data_ programming shall not force developpers to get by with _Scala_ 
> when its not the language they have choosen.
>  
> Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ 
> without _Hadoop,_ would confort me : a cause of issues would disappear.
> Provide an optional _spark-scala_ artifact would be fine as those that do not 
> need it won't download it. In the same move, you would to return to the 
> generation of standard javadocs for Java classes documentation.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26296) Spark build over Java and not over Scala, offering Scala as an option over Spark

2018-12-06 Thread M. Le Bihan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

M. Le Bihan updated SPARK-26296:

Description: 
I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
believe those using _PySpark_ no more. 

 

But _Spark_ as been build over _Scala_ instead of plain _Java_ and its a cause 
troubles, especially at the time of leveling JDK. We are awaiting to use JDK 11 
and _Scala_ is still lowering _Spark_ in the previous version of the JDK.

_Big Data_ programming shall not force developpers to get by with _Scala_ when 
its not the language they have choosen.

 

Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ without 
_Hadoop,_ would confort me : a cause of issues would disappear.

Provide an optional _spark-scala_ artifact would be fine as those that do not 
need it won't download it. In the same move, you would to return to the 
generation of standard javadocs for Java classes documentation.

 

  was:
I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
believe those using _PySpark_ no more. 

 

But Spark as been build over _Scala_ instead of plain _Java_ and its a cause 
troubles, especially at the time of leveling JDK. We are awaiting to use JDK 11 
and _Scala_ is still lowering _Spark_ in the previous version of the JDK.



Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ without 
_Hadoop,_ would confort me : a cause of issues would disappear.


> Spark build over Java and not over Scala, offering Scala as an option over 
> Spark
> 
>
> Key: SPARK-26296
> URL: https://issues.apache.org/jira/browse/SPARK-26296
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: M. Le Bihan
>Priority: Minor
>
> I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
> believe those using _PySpark_ no more. 
>  
> But _Spark_ as been build over _Scala_ instead of plain _Java_ and its a 
> cause troubles, especially at the time of leveling JDK. We are awaiting to 
> use JDK 11 and _Scala_ is still lowering _Spark_ in the previous version of 
> the JDK.
> _Big Data_ programming shall not force developpers to get by with _Scala_ 
> when its not the language they have choosen.
>  
> Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ 
> without _Hadoop,_ would confort me : a cause of issues would disappear.
> Provide an optional _spark-scala_ artifact would be fine as those that do not 
> need it won't download it. In the same move, you would to return to the 
> generation of standard javadocs for Java classes documentation.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26296) Base Spark over Java and not over Scala, offering Scala as an option over Spark

2018-12-06 Thread M. Le Bihan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

M. Le Bihan updated SPARK-26296:

Summary: Base Spark over Java and not over Scala, offering Scala as an 
option over Spark  (was: Spark build over Java and not over Scala, offering 
Scala as an option over Spark)

> Base Spark over Java and not over Scala, offering Scala as an option over 
> Spark
> ---
>
> Key: SPARK-26296
> URL: https://issues.apache.org/jira/browse/SPARK-26296
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: M. Le Bihan
>Priority: Minor
>
> I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
> believe those using _PySpark_ no more. 
>  
> But _Spark_ has been build over _Scala_ instead of plain _Java_ and this is a 
> cause of troubles, especially at the time of upgrading the JDK. We are 
> awaiting to use JDK 11 and _Scala_ is still lowering _Spark_ in the previous 
> version of the JDK.
> _Big Data_ programming shall not force developpers to get by with _Scala_ 
> when its not the language they have choosen.
>  
> Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ 
> without _Hadoop,_ would confort me : a cause of issues would disappear.
> Provide an optional _spark-scala_ artifact would be fine as those who do not 
> need it wouldn't download it and wouldn't be affected by it. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26295) [K8S] serviceAccountName is not set in client mode

2018-12-06 Thread Adrian Tanase (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711464#comment-16711464
 ] 

Adrian Tanase commented on SPARK-26295:
---

ping [~vanzin], 

> [K8S] serviceAccountName is not set in client mode
> --
>
> Key: SPARK-26295
> URL: https://issues.apache.org/jira/browse/SPARK-26295
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Adrian Tanase
>Priority: Major
>
> When deploying spark apps in client mode (in my case from inside the driver 
> pod), one can't specify the service account in accordance with the docs 
> ([https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac).]
> The property {{spark.kubernetes.authenticate.driver.serviceAccountName}} is 
> most likely added in cluster mode only, which would be consistent with 
> spark.kubernetes.authenticate.driver being the cluster mode prefix.
> We should either inject the service account specified by this property in the 
> client mode pods, or specify an equivalent config: 
> spark.kubernetes.authenticate.serviceAccountName
>  This is the exception:
> {noformat}
> Message: Forbidden!Configured service account doesn't have access. Service 
> account may have been revoked. pods "..." is forbidden: User 
> "system:serviceaccount:mynamespace:default" cannot get pods in the namespace 
> "mynamespace"{noformat}
> The expectation was to see the user `mynamespace:spark` based on my submit 
> command.
> My current workaround is to create a clusterrolebinding with edit rights for 
> the mynamespace:default account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26295) [K8S] serviceAccountName is not set in client mode

2018-12-06 Thread Adrian Tanase (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711462#comment-16711462
 ] 

Adrian Tanase commented on SPARK-26295:
---

I would be happy to give it a shot with a bit of guidance, as I can't easily 
figure out where this should slot in, given the many classes under the 
*org.apache.spark.deploy.k8s* package.

Also, I've just seen the docs calling out this configuration 
(*spark.kubernetes.authenticate.serviceAccountName*) but I don't think it's 
implemented, or we have a bug.

 

> [K8S] serviceAccountName is not set in client mode
> --
>
> Key: SPARK-26295
> URL: https://issues.apache.org/jira/browse/SPARK-26295
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Adrian Tanase
>Priority: Major
>
> When deploying spark apps in client mode (in my case from inside the driver 
> pod), one can't specify the service account in accordance with the docs 
> ([https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac).]
> The property {{spark.kubernetes.authenticate.driver.serviceAccountName}} is 
> most likely added in cluster mode only, which would be consistent with 
> spark.kubernetes.authenticate.driver being the cluster mode prefix.
> We should either inject the service account specified by this property in the 
> client mode pods, or specify an equivalent config: 
> spark.kubernetes.authenticate.serviceAccountName
>  This is the exception:
> {noformat}
> Message: Forbidden!Configured service account doesn't have access. Service 
> account may have been revoked. pods "..." is forbidden: User 
> "system:serviceaccount:mynamespace:default" cannot get pods in the namespace 
> "mynamespace"{noformat}
> The expectation was to see the user `mynamespace:spark` based on my submit 
> command.
> My current workaround is to create a clusterrolebinding with edit rights for 
> the mynamespace:default account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26296) Spark build over Java and not over Scala, offering Scala as an option over Spark

2018-12-06 Thread M. Le Bihan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

M. Le Bihan updated SPARK-26296:

Description: 
I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
believe those using _PySpark_ no more. 

 

But _Spark_ has been build over _Scala_ instead of plain _Java_ and this is a 
cause of troubles, especially at the time of upgrading the JDK. We are awaiting 
to use JDK 11 and _Scala_ is still lowering _Spark_ in the previous version of 
the JDK.

_Big Data_ programming shall not force developpers to get by with _Scala_ when 
its not the language they have choosen.

 

Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ without 
_Hadoop,_ would confort me : a cause of issues would disappear.

Provide an optional _spark-scala_ artifact would be fine as those that do not 
need it won't download it and wouldn't be affected by it. 

  was:
I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
believe those using _PySpark_ no more. 

 

But _Spark_ as been build over _Scala_ instead of plain _Java_ and its a cause 
of troubles, especially at the time of upgrading the JDK. We are awaiting to 
use JDK 11 and _Scala_ is still lowering _Spark_ in the previous version of the 
JDK.

_Big Data_ programming shall not force developpers to get by with _Scala_ when 
its not the language they have choosen.

 

Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ without 
_Hadoop,_ would confort me : a cause of issues would disappear.

Provide an optional _spark-scala_ artifact would be fine as those that do not 
need it won't download it and wouldn't be affected by it. 


> Spark build over Java and not over Scala, offering Scala as an option over 
> Spark
> 
>
> Key: SPARK-26296
> URL: https://issues.apache.org/jira/browse/SPARK-26296
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: M. Le Bihan
>Priority: Minor
>
> I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
> believe those using _PySpark_ no more. 
>  
> But _Spark_ has been build over _Scala_ instead of plain _Java_ and this is a 
> cause of troubles, especially at the time of upgrading the JDK. We are 
> awaiting to use JDK 11 and _Scala_ is still lowering _Spark_ in the previous 
> version of the JDK.
> _Big Data_ programming shall not force developpers to get by with _Scala_ 
> when its not the language they have choosen.
>  
> Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ 
> without _Hadoop,_ would confort me : a cause of issues would disappear.
> Provide an optional _spark-scala_ artifact would be fine as those that do not 
> need it won't download it and wouldn't be affected by it. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26296) Spark build over Java and not over Scala, offering Scala as an option over Spark

2018-12-06 Thread M. Le Bihan (JIRA)
M. Le Bihan created SPARK-26296:
---

 Summary: Spark build over Java and not over Scala, offering Scala 
as an option over Spark
 Key: SPARK-26296
 URL: https://issues.apache.org/jira/browse/SPARK-26296
 Project: Spark
  Issue Type: Wish
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: M. Le Bihan


I am not using _Scala_ when I am programming _Spark_ but plain _Java_. I 
believe those using _PySpark_ no more. 

 

But Spark as been build over _Scala_ instead of plain _Java_ and its a cause 
troubles, especially at the time of leveling JDK. We are awaiting to use JDK 11 
and _Scala_ is still lowering _Spark_ in the previous version of the JDK.



Having a _Spark_ without _Scala_, like it is possible to have a _Spark_ without 
_Hadoop,_ would confort me : a cause of issues would disappear.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24417) Build and Run Spark on JDK11

2018-12-06 Thread M. Le Bihan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710091#comment-16710091
 ] 

M. Le Bihan edited comment on SPARK-24417 at 12/6/18 1:15 PM:
--

Hello, 

Unaware of the problem with JDK 11, I used it with _Spark 2.3.x_ without 
trouble for months, calling mostly _lookup()_ functions on RDDs.

But when I attempted a _collect()_, I had a failure (an 
_IllegalArgumentException_). I upgraded to _Spark 2.4.0_ and a message from a 
class in _org.apache.xbean_ explained: "_Unsupported minor major version 55._".
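
For context, class-file major version 55 is the Java 11 bytecode level, which is 
what that message is complaining about; it can be checked with a few lines of 
plain Java (an illustrative helper, not part of Spark):
{noformat}
// Reads the class-file version of a compiled .class file passed as argument.
// Major version 52 = Java 8, 55 = Java 11.
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class ClassVersionCheck {
    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            in.readInt();                        // magic number 0xCAFEBABE
            int minor = in.readUnsignedShort();  // minor_version
            int major = in.readUnsignedShort();  // major_version
            System.out.println("major=" + major + " minor=" + minor);
        }
    }
}
{noformat}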

Is this a problem coming from memory management or from the _Scala_ language?

If, eventually, _Spark 2.x_ cannot support _JDK 11_ and we have to wait 
for _Spark 3.0,_ when is that version planned to be released?

 

Regards,


was (Author: mlebihan):
Hello, 

Unaware of the problem with JDK 11, I used it with _Spark 2.3.x_ without 
trouble for months, calling mostly _lookup()_ functions on RDDs.

But when I attempted a _collect()_, I had a failure (an 
_IllegalArgumentException_). I upgraded to _Spark 2.4.0_, and a message from a 
class in _org.apache.xbean_ explained: "_Unsupported minor major version 55._".

Is this a problem coming from memory management or from the _Scala_ language?

If, eventually, _Spark 2.x_ cannot support _JDK 11_ and we have to wait 
for _Spark 3.0,_ when is that version planned to be released?

 

Sorry if this is off topic, but:

Will this next major version still be built over _Scala_ (meaning that it has 
to wait until the _Scala_ project can follow the _Java_ JDK versions), or only 
over _Java_, with _Scala_ offered as an independent option?

Because it seems to me, as someone who does not use _Scala_ for programming 
_Spark_ but plain _Java_ only, that _Scala_ is a cause of underlying trouble. 
Having a _Spark_ without _Scala_, just as it is possible to have a _Spark_ 
without _Hadoop_, would comfort me: a source of issues would disappear.

 

Regards,

> Build and Run Spark on JDK11
> 
>
> Key: SPARK-24417
> URL: https://issues.apache.org/jira/browse/SPARK-24417
> Project: Spark
>  Issue Type: New Feature
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: DB Tsai
>Priority: Major
>
> This is an umbrella JIRA for Apache Spark to support JDK11
> As JDK8 is reaching EOL, and JDK9 and 10 are already end of life, per 
> community discussion, we will skip JDK9 and 10 to support JDK 11 directly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26295) [K8S] serviceAccountName is not set in client mode

2018-12-06 Thread Adrian Tanase (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Tanase updated SPARK-26295:
--
Description: 
When deploying Spark apps in client mode (in my case from inside the driver 
pod), one can't specify the service account in accordance with the docs 
([https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac]).

The property {{spark.kubernetes.authenticate.driver.serviceAccountName}} is 
most likely applied in cluster mode only, which would be consistent with 
spark.kubernetes.authenticate.driver being the cluster mode prefix.

We should either inject the service account specified by this property into the 
client mode pods, or introduce an equivalent config: 
spark.kubernetes.authenticate.serviceAccountName

 This is the exception:
{noformat}
Message: Forbidden!Configured service account doesn't have access. Service 
account may have been revoked. pods "..." is forbidden: User 
"system:serviceaccount:mynamespace:default" cannot get pods in the namespace 
"mynamespace"{noformat}
The expectation was to see the user `mynamespace:spark` based on my submit 
command.

My current workaround is to create a clusterrolebinding with edit rights for 
the mynamespace:default account.
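
A minimal sketch of that workaround (the binding name below is made up; the 
namespace and service account follow the example above), granting the default 
service account the built-in edit cluster role:
{noformat}
# Hypothetical binding name; namespace and service account as in the example above.
kubectl create clusterrolebinding spark-client-mode-edit \
  --clusterrole=edit \
  --serviceaccount=mynamespace:default
{noformat}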

  was:
When deploying Spark apps in client mode (in my case from inside the driver 
pod), one can't specify the service account in accordance with the docs 
([https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac]).

The property {{spark.kubernetes.authenticate.driver.serviceAccountName}} is 
most likely applied in cluster mode only, which would be consistent with 
spark.kubernetes.authenticate.driver being the cluster mode prefix.

We should either inject the service account specified by this property into the 
client mode pods, or introduce an equivalent config: 
spark.kubernetes.authenticate.serviceAccountName

 This is the exception:
{noformat}
Message: Forbidden!Configured service account doesn't have access. Service 
account may have been revoked. pods "..." is forbidden: User 
"system:serviceaccount:mynamespace:default" cannot get pods in the namespace 
"mynamespace"{noformat}
My current workaround is to create a clusterrolebinding with edit rights for 
the mynamespace:default account.


> [K8S] serviceAccountName is not set in client mode
> --
>
> Key: SPARK-26295
> URL: https://issues.apache.org/jira/browse/SPARK-26295
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Adrian Tanase
>Priority: Major
>
> When deploying Spark apps in client mode (in my case from inside the driver 
> pod), one can't specify the service account in accordance with the docs 
> ([https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac]).
> The property {{spark.kubernetes.authenticate.driver.serviceAccountName}} is 
> most likely applied in cluster mode only, which would be consistent with 
> spark.kubernetes.authenticate.driver being the cluster mode prefix.
> We should either inject the service account specified by this property into the 
> client mode pods, or introduce an equivalent config: 
> spark.kubernetes.authenticate.serviceAccountName
>  This is the exception:
> {noformat}
> Message: Forbidden!Configured service account doesn't have access. Service 
> account may have been revoked. pods "..." is forbidden: User 
> "system:serviceaccount:mynamespace:default" cannot get pods in the namespace 
> "mynamespace"{noformat}
> The expectation was to see the user `mynamespace:spark` based on my submit 
> command.
> My current workaround is to create a clusterrolebinding with edit rights for 
> the mynamespace:default account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26295) [K8S] serviceAccountName is not set in client mode

2018-12-06 Thread Adrian Tanase (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Tanase updated SPARK-26295:
--
Description: 
When deploying Spark apps in client mode (in my case from inside the driver 
pod), one can't specify the service account in accordance with the docs 
([https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac]).

The property {{spark.kubernetes.authenticate.driver.serviceAccountName}} is 
most likely applied in cluster mode only, which would be consistent with 
spark.kubernetes.authenticate.driver being the cluster mode prefix.

We should either inject the service account specified by this property into the 
client mode pods, or introduce an equivalent config: 
spark.kubernetes.authenticate.serviceAccountName

 This is the exception:
{noformat}
Message: Forbidden!Configured service account doesn't have access. Service 
account may have been revoked. pods "..." is forbidden: User 
"system:serviceaccount:mynamespace:default" cannot get pods in the namespace 
"mynamespace"{noformat}
My current workaround is to create a clusterrolebinding with edit rights for 
the mynamespace:default account.

  was:
When deploying Spark apps in client mode (in my case from inside the driver 
pod), one can't specify the service account in accordance with the docs 
([https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac]).

The property {{spark.kubernetes.authenticate.driver.serviceAccountName}} is 
most likely applied in cluster mode only, which would be consistent with 
spark.kubernetes.authenticate.driver being the cluster mode prefix.

We should either inject the service account specified by this property into the 
client mode pods, or introduce an equivalent config: 
spark.kubernetes.authenticate.serviceAccountName

 This is the exception:

{{Message: Forbidden!Configured service account doesn't have access. Service 
account may have been revoked. pods "..." is forbidden: User 
"system:serviceaccount:mynamespace:default" cannot get pods in the namespace 
"mynamespace"}}

My current workaround is to create a clusterrolebinding with edit rights for 
the mynamespace:default account.


> [K8S] serviceAccountName is not set in client mode
> --
>
> Key: SPARK-26295
> URL: https://issues.apache.org/jira/browse/SPARK-26295
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Adrian Tanase
>Priority: Major
>
> When deploying Spark apps in client mode (in my case from inside the driver 
> pod), one can't specify the service account in accordance with the docs 
> ([https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac]).
> The property {{spark.kubernetes.authenticate.driver.serviceAccountName}} is 
> most likely applied in cluster mode only, which would be consistent with 
> spark.kubernetes.authenticate.driver being the cluster mode prefix.
> We should either inject the service account specified by this property into the 
> client mode pods, or introduce an equivalent config: 
> spark.kubernetes.authenticate.serviceAccountName
>  This is the exception:
> {noformat}
> Message: Forbidden!Configured service account doesn't have access. Service 
> account may have been revoked. pods "..." is forbidden: User 
> "system:serviceaccount:mynamespace:default" cannot get pods in the namespace 
> "mynamespace"{noformat}
> My current workaround is to create a clusterrolebinding with edit rights for 
> the mynamespace:default account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26295) [K8S] serviceAccountName is not set in client mode

2018-12-06 Thread Adrian Tanase (JIRA)
Adrian Tanase created SPARK-26295:
-

 Summary: [K8S] serviceAccountName is not set in client mode
 Key: SPARK-26295
 URL: https://issues.apache.org/jira/browse/SPARK-26295
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Adrian Tanase


When deploying Spark apps in client mode (in my case from inside the driver 
pod), one can't specify the service account in accordance with the docs 
([https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac]).

The property {{spark.kubernetes.authenticate.driver.serviceAccountName}} is 
most likely applied in cluster mode only, which would be consistent with 
spark.kubernetes.authenticate.driver being the cluster mode prefix.

We should either inject the service account specified by this property into the 
client mode pods, or introduce an equivalent config: 
spark.kubernetes.authenticate.serviceAccountName

 This is the exception:

{{Message: Forbidden!Configured service account doesn't have access. Service 
account may have been revoked. pods "..." is forbidden: User 
"system:serviceaccount:mynamespace:default" cannot get pods in the namespace 
"mynamespace"}}

My current workaround is to create a clusterrolebinding with edit rights for 
the mynamespace:default account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26293) Cast exception when having python udf in subquery

2018-12-06 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711360#comment-16711360
 ] 

Apache Spark commented on SPARK-26293:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/23248

> Cast exception when having python udf in subquery
> -
>
> Key: SPARK-26293
> URL: https://issues.apache.org/jira/browse/SPARK-26293
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26293) Cast exception when having python udf in subquery

2018-12-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26293:


Assignee: Apache Spark  (was: Wenchen Fan)

> Cast exception when having python udf in subquery
> -
>
> Key: SPARK-26293
> URL: https://issues.apache.org/jira/browse/SPARK-26293
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26293) Cast exception when having python udf in subquery

2018-12-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26293:


Assignee: Wenchen Fan  (was: Apache Spark)

> Cast exception when having python udf in subquery
> -
>
> Key: SPARK-26293
> URL: https://issues.apache.org/jira/browse/SPARK-26293
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26294) Delete Unnecessary If statement

2018-12-06 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711341#comment-16711341
 ] 

Apache Spark commented on SPARK-26294:
--

User 'wangjiaochun' has created a pull request for this issue:
https://github.com/apache/spark/pull/23247

> Delete Unnecessary If statement
> ---
>
> Key: SPARK-26294
> URL: https://issues.apache.org/jira/browse/SPARK-26294
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: wangjiaochun
>Priority: Minor
>
> Delete the unnecessary if statement: the guarded branch can never execute when 
> records is less than or equal to zero, because the code is only reached when 
> records is greater than zero.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26294) Delete Unnecessary If statement

2018-12-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26294:


Assignee: Apache Spark

> Delete Unnecessary If statement
> ---
>
> Key: SPARK-26294
> URL: https://issues.apache.org/jira/browse/SPARK-26294
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: wangjiaochun
>Assignee: Apache Spark
>Priority: Minor
>
> Delete the unnecessary if statement: the guarded branch can never execute when 
> records is less than or equal to zero, because the code is only reached when 
> records is greater than zero.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26294) Delete Unnecessary If statement

2018-12-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26294:


Assignee: (was: Apache Spark)

> Delete Unnecessary If statement
> ---
>
> Key: SPARK-26294
> URL: https://issues.apache.org/jira/browse/SPARK-26294
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: wangjiaochun
>Priority: Minor
>
> Delete the unnecessary if statement: the guarded branch can never execute when 
> records is less than or equal to zero, because the code is only reached when 
> records is greater than zero.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26294) Delete Unnecessary If statement

2018-12-06 Thread wangjiaochun (JIRA)
wangjiaochun created SPARK-26294:


 Summary: Delete Unnecessary If statement
 Key: SPARK-26294
 URL: https://issues.apache.org/jira/browse/SPARK-26294
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: wangjiaochun


Delete the unnecessary if statement: the guarded branch can never execute when 
records is less than or equal to zero, because the code is only reached when 
records is greater than zero.
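
As an illustration only (the names below are hypothetical, not the actual Spark 
source), the pattern being described is an inner guard that repeats a condition 
already ruled out by the enclosing check:
{noformat}
// Hypothetical sketch: the outer check already guarantees records > 0, so the
// inner branch testing records <= 0 can never execute and can be deleted.
public class RedundantGuardExample {
    static void write(long records) {
        if (records > 0) {
            if (records <= 0) {      // unreachable given the enclosing condition
                return;
            }
            System.out.println("writing " + records + " records");
        }
    }

    public static void main(String[] args) {
        write(3);   // writes; the inner guard is dead code
        write(0);   // already filtered out by the outer check
    }
}
{noformat}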

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26292) Assert statement of currentPage may be not in right place

2018-12-06 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711309#comment-16711309
 ] 

Apache Spark commented on SPARK-26292:
--

User 'wangjiaochun' has created a pull request for this issue:
https://github.com/apache/spark/pull/23246

> Assert  statement of currentPage may be not in right place
> --
>
> Key: SPARK-26292
> URL: https://issues.apache.org/jira/browse/SPARK-26292
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: wangjiaochun
>Priority: Minor
> Fix For: 2.4.0
>
>
> The assert statement on currentPage is not in the right place; it should be 
> moved to just after the allocatePage call in the acquireNewPageIfNecessary 
> function.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26292) Assert statement of currentPage may be not in right place

2018-12-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26292:


Assignee: (was: Apache Spark)

> Assert  statement of currentPage may be not in right place
> --
>
> Key: SPARK-26292
> URL: https://issues.apache.org/jira/browse/SPARK-26292
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: wangjiaochun
>Priority: Minor
> Fix For: 2.4.0
>
>
> The assert statement on currentPage is not in the right place; it should be 
> moved to just after the allocatePage call in the acquireNewPageIfNecessary 
> function.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming

2018-12-06 Thread Vikram Agrawal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711319#comment-16711319
 ] 

Vikram Agrawal commented on SPARK-18165:


Hi [~danielil] - right now it is available at 
https://github.com/qubole/kinesis-sql

> Kinesis support in Structured Streaming
> ---
>
> Key: SPARK-18165
> URL: https://issues.apache.org/jira/browse/SPARK-18165
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Reporter: Lauren Moos
>Priority: Major
>
> Implement Kinesis based sources and sinks for Structured Streaming



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26292) Assert statement of currentPage may be not in right place

2018-12-06 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711317#comment-16711317
 ] 

Apache Spark commented on SPARK-26292:
--

User 'wangjiaochun' has created a pull request for this issue:
https://github.com/apache/spark/pull/23246

> Assert  statement of currentPage may be not in right place
> --
>
> Key: SPARK-26292
> URL: https://issues.apache.org/jira/browse/SPARK-26292
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: wangjiaochun
>Priority: Minor
> Fix For: 2.4.0
>
>
> The assert statement on currentPage is not in the right place; it should be 
> moved to just after the allocatePage call in the acquireNewPageIfNecessary 
> function.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18147) Broken Spark SQL Codegen

2018-12-06 Thread Eli (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711316#comment-16711316
 ] 

Eli commented on SPARK-18147:
-

[~mgaido] it works well on 2.3.2. It would be a good idea to upgrade to this 
stable version. Thanks.

> Broken Spark SQL Codegen
> 
>
> Key: SPARK-18147
> URL: https://issues.apache.org/jira/browse/SPARK-18147
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: koert kuipers
>Assignee: Liang-Chi Hsieh
>Priority: Critical
> Fix For: 2.1.0
>
>
> this is me on purpose trying to break spark sql codegen to uncover potential 
> issues, by creating arbitrarily complex data structures using primitives, 
> strings, basic collections (map, seq, option), tuples, and case classes.
> first example: nested case classes
> code:
> {noformat}
> class ComplexResultAgg[B: TypeTag, C: TypeTag](val zero: B, result: C) 
> extends Aggregator[Row, B, C] {
>   override def reduce(b: B, input: Row): B = b
>   override def merge(b1: B, b2: B): B = b1
>   override def finish(reduction: B): C = result
>   override def bufferEncoder: Encoder[B] = ExpressionEncoder[B]()
>   override def outputEncoder: Encoder[C] = ExpressionEncoder[C]()
> }
> case class Struct2(d: Double = 0.0, s1: Seq[Double] = Seq.empty, s2: 
> Seq[Long] = Seq.empty)
> case class Struct3(a: Struct2 = Struct2(), b: Struct2 = Struct2())
> val df1 = Seq(("a", "aa"), ("a", "aa"), ("b", "b"), ("b", null)).toDF("x", 
> "y").groupBy("x").agg(
>   new ComplexResultAgg("boo", Struct3()).toColumn
> )
> df1.printSchema
> df1.show
> {noformat}
> the result is:
> {noformat}
> [info]   Cause: java.util.concurrent.ExecutionException: java.lang.Exception: 
> failed to compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 33, Column 12: Expression "isNull1" is not an rvalue
> [info] /* 001 */ public java.lang.Object generate(Object[] references) {
> [info] /* 002 */   return new SpecificMutableProjection(references);
> [info] /* 003 */ }
> [info] /* 004 */
> [info] /* 005 */ class SpecificMutableProjection extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseMutableProjection {
> [info] /* 006 */
> [info] /* 007 */   private Object[] references;
> [info] /* 008 */   private MutableRow mutableRow;
> [info] /* 009 */   private Object[] values;
> [info] /* 010 */   private java.lang.String errMsg;
> [info] /* 011 */   private Object[] values1;
> [info] /* 012 */   private java.lang.String errMsg1;
> [info] /* 013 */   private boolean[] argIsNulls;
> [info] /* 014 */   private scala.collection.Seq argValue;
> [info] /* 015 */   private java.lang.String errMsg2;
> [info] /* 016 */   private boolean[] argIsNulls1;
> [info] /* 017 */   private scala.collection.Seq argValue1;
> [info] /* 018 */   private java.lang.String errMsg3;
> [info] /* 019 */   private java.lang.String errMsg4;
> [info] /* 020 */   private Object[] values2;
> [info] /* 021 */   private java.lang.String errMsg5;
> [info] /* 022 */   private boolean[] argIsNulls2;
> [info] /* 023 */   private scala.collection.Seq argValue2;
> [info] /* 024 */   private java.lang.String errMsg6;
> [info] /* 025 */   private boolean[] argIsNulls3;
> [info] /* 026 */   private scala.collection.Seq argValue3;
> [info] /* 027 */   private java.lang.String errMsg7;
> [info] /* 028 */   private boolean isNull_0;
> [info] /* 029 */   private InternalRow value_0;
> [info] /* 030 */
> [info] /* 031 */   private void apply_1(InternalRow i) {
> [info] /* 032 */
> [info] /* 033 */ if (isNull1) {
> [info] /* 034 */   throw new RuntimeException(errMsg3);
> [info] /* 035 */ }
> [info] /* 036 */
> [info] /* 037 */ boolean isNull24 = false;
> [info] /* 038 */ final com.tresata.spark.sql.Struct2 value24 = isNull24 ? 
> null : (com.tresata.spark.sql.Struct2) value1.a();
> [info] /* 039 */ isNull24 = value24 == null;
> [info] /* 040 */
> [info] /* 041 */ boolean isNull23 = isNull24;
> [info] /* 042 */ final scala.collection.Seq value23 = isNull23 ? null : 
> (scala.collection.Seq) value24.s2();
> [info] /* 043 */ isNull23 = value23 == null;
> [info] /* 044 */ argIsNulls1[0] = isNull23;
> [info] /* 045 */ argValue1 = value23;
> [info] /* 046 */
> [info] /* 047 */
> [info] /* 048 */
> [info] /* 049 */ boolean isNull22 = false;
> [info] /* 050 */ for (int idx = 0; idx < 1; idx++) {
> [info] /* 051 */   if (argIsNulls1[idx]) { isNull22 = true; break; }
> [info] /* 052 */ }
> [info] /* 053 */
> [info] /* 054 */ final ArrayData value22 = isNull22 ? null : new 
> org.apache.spark.sql.catalyst.util.GenericArrayData(argValue1);
> [info] /* 055 */ if (isNull22) {
> [info] /* 056 */   values1[2] = null;
> [info] /* 057 */ } else {
> [info] /* 058 */   

[jira] [Assigned] (SPARK-26292) Assert statement of currentPage may be not in right place

2018-12-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26292:


Assignee: Apache Spark

> Assert  statement of currentPage may be not in right place
> --
>
> Key: SPARK-26292
> URL: https://issues.apache.org/jira/browse/SPARK-26292
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: wangjiaochun
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 2.4.0
>
>
> The assert statement on currentPage is not in the right place; it should be 
> moved to just after the allocatePage call in the acquireNewPageIfNecessary 
> function.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26293) Cast exception when having python udf in subquery

2018-12-06 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-26293:
---

 Summary: Cast exception when having python udf in subquery
 Key: SPARK-26293
 URL: https://issues.apache.org/jira/browse/SPARK-26293
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26292) Assert statement of currentPage may be not in right place

2018-12-06 Thread wangjiaochun (JIRA)
wangjiaochun created SPARK-26292:


 Summary: Assert  statement of currentPage may be not in right place
 Key: SPARK-26292
 URL: https://issues.apache.org/jira/browse/SPARK-26292
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: wangjiaochun
 Fix For: 2.4.0


The assert statement on currentPage is not in the right place; it should be 
moved to just after the allocatePage call in the acquireNewPageIfNecessary 
function.
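
A paraphrased sketch of the proposed placement (this is not the actual Spark 
source; the method and field names are taken from the description above):
{noformat}
// Sketch only: assert on currentPage right after allocatePage returns, so a
// failed allocation is caught at the point where the page is (re)acquired.
private void acquireNewPageIfNecessary(int required) {
    if (currentPage == null ||
        pageCursor + required > currentPage.getBaseOffset() + currentPage.size()) {
        currentPage = allocatePage(required);
        assert (currentPage != null);  // proposed: directly after allocatePage
        pageCursor = currentPage.getBaseOffset();
    }
}
{noformat}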



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


