Re: Spark UTF-8 encoding

2018-11-09 Thread Sean Owen
That doesn't necessarily look like a Spark-related issue. Your terminal seems to be displaying the glyph as a question mark, perhaps because the font lacks that symbol.
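A quick way to test that hypothesis (a minimal sketch, assuming the Dataset[String] named ds from the original post):

  // Print Unicode code points instead of glyphs; if U+00F8 ('ø') appears,
  // the data is intact and the '?' is purely a rendering problem.
  ds.head().foreach(c => println(f"'$c' -> U+${c.toInt}%04X"))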

Spark UTF-8 encoding

2018-11-09 Thread lsn24
Hello, Per the documentation, the default character encoding of Spark is UTF-8. But when I try to read non-ASCII characters, Spark tends to read them as question marks. What am I doing wrong? Below is my syntax: val ds = spark.read.textFile("a .bz2 file from hdfs"); ds.show(); The string "KøBENHAVN"
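If the bytes really were mis-decoded (rather than mis-displayed), a common workaround is to decode explicitly instead of relying on the platform default charset — a sketch, not from the thread, with an illustrative path:

  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapred.TextInputFormat

  // Read raw Text records and decode the backing bytes as UTF-8 by hand.
  val lines = spark.sparkContext
    .hadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///path/to/file.bz2")
    .map { case (_, text) => new String(text.getBytes, 0, text.getLength, "UTF-8") }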

Re: DataSourceV2 capability API

2018-11-09 Thread Reynold Xin
"If there is no way to report a feature (e.g., able to read missing as null) then there is no way for Spark to take advantage of it in the first place" Consider this (just a hypothetical scenario): We added "supports-decimal" in the future, because we see a lot of data sources don't support

Re: DataSourceV2 capability API

2018-11-09 Thread Ryan Blue
For that case, I think we would have a property that defines whether supports-decimal is assumed or checked with the capability. Wouldn't we have this problem no matter what the capability API is? If we used a trait to signal decimal support, then we would have to deal with sources that were

Re: DataSourceV2 capability API

2018-11-09 Thread Ryan Blue
Do you have an example in mind where we might add a capability and break old versions of data sources? These are really for being able to tell what features a data source has. If there is no way to report a feature (e.g., able to read missing as null) then there is no way for Spark to take

Re: DataSourceV2 capability API

2018-11-09 Thread Ryan Blue
Another solution to the decimal case is using the capability API: use a capability to signal that the table knows about `supports-decimal`. So before the decimal support check, it would check `table.isSupported("type-capabilities")`.
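A hypothetical sketch of how that meta-capability could reconcile old sources with new capability strings (the interface and names are illustrative, not the actual DataSourceV2 API):

  trait Table {
    def isSupported(capability: String): Boolean
  }

  // Old sources predate the capability strings, so a missing string must not
  // be read as a missing feature; the meta-capability disambiguates.
  def supportsDecimal(table: Table): Boolean =
    if (table.isSupported("type-capabilities")) table.isSupported("supports-decimal")
    else true // legacy source: assume support, as before the check existed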

Re: Arrow optimization in conversion from R DataFrame to Spark DataFrame

2018-11-09 Thread Bryan Cutler
Great work Hyukjin! I'm not too familiar with R, but I'll take a look at the PR. Bryan

[Structured Streaming] Kafka group.id is fixed

2018-11-09 Thread Anastasios Zouzias
Hi all, I ran into the following situation with Spark Structured Streaming (SS) using Kafka. In a project I work on, there is already a secured Kafka setup where ops can issue an SSL certificate per "group.id", which must be predefined (or at least its prefix must be). On the
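A sketch of the conflict (broker address and topic are illustrative): the Structured Streaming Kafka source assigns its own group.id at runtime, roughly of the form "spark-kafka-source-<uuid>-<hash>-driver", so a certificate issued for a predefined group.id (or prefix) cannot be made to match.

  val stream = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9093")
    .option("subscribe", "events")
    .load() // the group.id is generated internally; no option to set it as of 2.4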

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-09 Thread purna pradeep
Thanks, this is great news. Can you please let me know if dynamic resource allocation is available in Spark 2.4? I'm using Spark 2.3.2 on Kubernetes; do I still need to provide executor memory options as part of the spark-submit command, or will Spark manage the required executor memory based on the Spark job
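For what it's worth, a minimal sketch under the assumption that, as of 2.4, dynamic allocation is not yet supported on Kubernetes (it needs an external shuffle service, which the K8s backend does not provide), so executor sizing still has to be set explicitly (values illustrative):

  val spark = org.apache.spark.sql.SparkSession.builder()
    .config("spark.executor.instances", "4")
    .config("spark.executor.memory", "4g")
    .getOrCreate()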

Can Spark avoid "Container killed by YARN"?

2018-11-09 Thread Yang Zhang
My Spark SQL jobs keep failing with the error "Container exited with a non-zero exit code 143". I know this is caused by the memory used exceeding the limit set by spark.yarn.executor.memoryOverhead. As shown below, a memory allocation request failed at 18/11/08 17:36:05, then it
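The usual mitigation is to give YARN more off-heap headroom so the container limit is not breached — a sketch with illustrative values (spark.yarn.executor.memoryOverhead is the older name; Spark 2.3+ reads spark.executor.memoryOverhead, defaulting to max(384m, 10% of executor memory)):

  val conf = new org.apache.spark.SparkConf()
    .set("spark.executor.memory", "8g")
    .set("spark.executor.memoryOverhead", "2g") // raise if tasks use much off-heap/native memory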

Re: Arrow optimization in conversion from R DataFrame to Spark DataFrame

2018-11-09 Thread Felix Cheung
Very cool! > Hi all, I am trying to introduce R Arrow optimization by reusing PySpark Arrow optimization. It
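For reference, the PySpark Arrow path that this PR reuses is gated by the flag below; the SparkR-specific flag the PR introduces may be named differently (a sketch, not confirmed by the thread):

  spark.conf.set("spark.sql.execution.arrow.enabled", "true")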

Re: DataSourceV2 capability API

2018-11-09 Thread Felix Cheung
One question: where will the list of capability strings be defined? > Yes, we currently use traits that have methods. Something

Re: DataSourceV2 capability API

2018-11-09 Thread Reynold Xin
How do we deal with forward compatibility? Consider: Spark adds a new "property". In the past the data source supported that property, but since it was not explicitly declared, in the new version of Spark that data source would be considered not to support that property, thus throwing an

Re: DataSourceV2 capability API

2018-11-09 Thread Ryan Blue
I'd have two places. First, a class that defines properties supported and identified by Spark, like the SQLConf definitions. Second, in documentation for the v2 table API.
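A hypothetical sketch of such a class, analogous to SQLConf's config definitions (the names are illustrative, not actual Spark constants):

  object TableCapabilities {
    // Central registry of capability strings recognized by Spark.
    val TYPE_CAPABILITIES    = "type-capabilities"
    val SUPPORTS_DECIMAL     = "supports-decimal"
    val READ_MISSING_AS_NULL = "read-missing-as-null"
  }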

Re: Behavior of SaveMode.Append when table is not present

2018-11-09 Thread Ryan Blue
Right now, it is up to the source implementation to decide what to do. I think path-based tables (with no metastore component) treat an append as an implicit create. If you're thinking that relying on sources to interpret SaveMode is bad for consistent behavior, I agree. That's why the community
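A sketch of the behavior described, assuming a DataFrame df and an illustrative path — for a path-based source with no metastore entry, an append to a missing path acts as a create:

  import org.apache.spark.sql.SaveMode

  // If the path does not exist yet, built-in file sources create it rather
  // than failing; a metastore-backed table may behave differently.
  df.write.mode(SaveMode.Append).parquet("hdfs:///warehouse/events")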

Re: [Structured Streaming] Kafka group.id is fixed

2018-11-09 Thread Cody Koeninger
That sounds reasonable to me.

Re: Arrow optimization in conversion from R DataFrame to Spark DataFrame

2018-11-09 Thread Shivaram Venkataraman
Thanks Hyukjin! Very cool results Shivaram On Fri, Nov 9, 2018 at 10:58 AM Felix Cheung wrote: > > Very cool! > > > > From: Hyukjin Kwon > Sent: Thursday, November 8, 2018 10:29 AM > To: dev > Subject: Arrow optimization in conversion from R DataFrame to Spark