Re: [DISCUSS] Supporting hive on DataSourceV2

2020-03-23 Thread Ryan Blue
Hi Jacky,

We’ve internally released support for Hive tables (and Spark FileFormat
tables) using DataSourceV2 so that we can switch between catalogs; sounds
like that’s what you are planning to build as well. It would be great to
work with the broader community on a Hive connector.

I will get a branch of our connectors published so that you can take a
look. I think it should be fairly close to what you’re talking about
building, with a few exceptions:

   - Our implementation always uses our S3 committers, but it should be
   easy to change this
   - It supports per-partition formats, like Hive

Do you have an idea about where the connector should be developed? I don’t
think it makes sense for it to be part of Spark. That would keep complexity
in the main project and require updating Hive versions slowly. Using a
separate project would mean less code in Spark specific to one source, and
could more easily support multiple Hive versions. Maybe we should create a
project for catalog plug-ins?

rb

On Mon, Mar 23, 2020 at 4:20 AM JackyLee  wrote:

> Hi devs,
> I’d like to start a discussion about Supporting Hive on DatasourceV2. We’re
> now working on a project using DataSourceV2 to provide multiple source
> support and it works with the data lake solution very well, yet it does not
> yet support HiveTable.
>
> There are 3 reasons why we need to support Hive on DataSourceV2.
> 1. Hive itself is one of Spark data sources.
> 2. HiveTable is essentially a FileTable with its own input and output
> formats; it works fine as a FileTable.
> 3. HiveTable should be stateless, and users can freely read or write Hive
> using batch or microbatch.
>
> We implemented stateless Hive on DataSourceV1; it supports users writing
> into Hive in streaming or batch mode, and it is widely used in our company.
> Recently, we have been working to support Hive on DataSourceV2; multiple
> Hive catalogs and DDL commands are already supported.
>
> Looking forward to more discussions on this.
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>

-- 
Ryan Blue
Software Engineer
Netflix


Re: \r\n in csv output

2020-03-23 Thread Vipul Rajan
You can use newAPIHadoopFile:

import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

val conf = new Configuration

// Treat only \r\n (not a bare \n) as the record delimiter
conf.set("textinputformat.record.delimiter", "\r\n")

val df = sc.newAPIHadoopFile("path/to/file", classOf[TextInputFormat],
  classOf[LongWritable], classOf[Text], conf).map(_._2.toString).toDF()
// toDF() needs spark.implicits._ in scope (imported automatically in spark-shell)

You would get a dataframe with just a single string column. You'd have to
split that column and turn it into columnar format, but it can be done. If
you need help, feel free to ping back.
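For the follow-up split, here is a minimal sketch in plain Java, independent of Spark, assuming simple comma-separated records with no quoted fields (in Spark itself you would express the same thing with a split on the column):

```java
import java.util.Arrays;
import java.util.List;

public class SplitSketch {
    // Split one raw line into its fields; the -1 limit keeps trailing
    // empty fields, mirroring a faithful per-record CSV split.
    static List<String> splitRecord(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    public static void main(String[] args) {
        System.out.println(splitRecord("a,b,,d"));  // [a, b, , d]
    }
}
```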

Regards

On Tue, Mar 24, 2020 at 1:23 AM Steven Parkes  wrote:

> SPARK-26108  / PR#23080
>  added a require on
> CSVOptions#lineSeparator to be a single character.
>
> AFAICT, this keeps us from writing CSV files with \r\n line terminators.
>
> Wondering if this was intended or a bug? Is there an alternative mechanism
> or something else I'm missing?
>


Re: \r\n in csv output

2020-03-23 Thread Steven Parkes
Hrm ... looks like we were setting this in the past although it looks like
it was being ignored ...

On Mon, Mar 23, 2020 at 12:53 PM Steven Parkes 
wrote:

> SPARK-26108  / PR#23080
>  added a require on
> CSVOptions#lineSeparator to be a single character.
>
> AFAICT, this keeps us from writing CSV files with \r\n line terminators.
>
> Wondering if this was intended or a bug? Is there an alternative mechanism
> or something else I'm missing?
>


\r\n in csv output

2020-03-23 Thread Steven Parkes
SPARK-26108  / PR#23080
 added a require on
CSVOptions#lineSeparator to be a single character.

AFAICT, this keeps us from writing CSV files with \r\n line terminators.

Wondering if this was intended or a bug? Is there an alternative mechanism
or something else I'm missing?


Re: Spark Thrift Server java vm problem need help

2020-03-23 Thread Sean Owen
No, as I say, it seems to just generate a warning. Compressed oops can't be
used with a >= 32GB heap, so it just isn't. That's why I am asking what the
problem is.
Spark doesn't set this value as far as I can tell; maybe your env does.
This is in any event not a Spark issue per se.
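For reference, the default heuristic being discussed can be sketched as follows; this is a simplified illustration, not HotSpot's actual code, and the real cutoff depends on object alignment and heap base placement:

```java
// With the default 8-byte object alignment, a 32-bit compressed reference
// shifted by 3 bits can address at most 4 GB * 8 = 32 GB of heap, so the
// JVM only enables compressed oops by default below that limit.
public class OopsHeuristic {
    static final long COMPRESSED_OOPS_LIMIT = 32L * 1024 * 1024 * 1024;

    static boolean useCompressedOopsByDefault(long maxHeapBytes) {
        return maxHeapBytes < COMPRESSED_OOPS_LIMIT;
    }

    public static void main(String[] args) {
        System.out.println(useCompressedOopsByDefault(6L << 30));   // true  (-Xmx6g)
        System.out.println(useCompressedOopsByDefault(64L << 30));  // false (-Xmx64g)
    }
}
```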

On Mon, Mar 23, 2020 at 9:40 AM angers.zhu  wrote:

> If -Xmx is bigger than 32g, the VM will not use UseCompressedOops by
> default.
> We can see a case:
> If we set spark.driver.memory to 64g, set -XX:+UseCompressedOops in
> spark.executor.extraJavaOptions, and set SPARK_DAEMON_MEMORY = 6g,
> then with the current code the VM gets a command line with -Xmx6g and
> -XX:+UseCompressedOops, so the VM runs with compressed oops enabled.
>
> But since we set spark.driver.memory=64g, our JVM's max heap size will be
> 64g, yet we will still use compressed oops. Wouldn't that be a problem?
>


Re: Spark Thrift Server java vm problem need help

2020-03-23 Thread angers . zhu






If -Xmx is bigger than 32g, the VM will not use UseCompressedOops by
default.

We can see a case: if we set spark.driver.memory to 64g, set
-XX:+UseCompressedOops in spark.executor.extraJavaOptions, and set
SPARK_DAEMON_MEMORY = 6g, then with the current code the VM gets a command
line with -Xmx6g and -XX:+UseCompressedOops, so the VM runs with
compressed oops enabled.

But since we set spark.driver.memory=64g, our JVM's max heap size will be
64g, yet we will still use compressed oops. Wouldn't that be a problem?

angers.zhu
angers@gmail.com

On 03/23/2020 22:32, Sean Owen wrote:

> I'm still not sure if you are trying to enable it or disable it, and what
> the issue is?
> There is no logic in Spark that sets or disables this flag that I can see.

Re: Spark Thrift Server java vm problem need help

2020-03-23 Thread Sean Owen
I'm still not sure if you are trying to enable it or disable it, and what
the issue is?
There is no logic in Spark that sets or disables this flag that I can see.

On Mon, Mar 23, 2020 at 9:27 AM angers.zhu  wrote:

> Hi Sean,
>
> Yea, I set -XX:+UseCompressedOops in the driver (you can see it in the
> command line), and these days we have more users, so I set
> spark.driver.memory to 64g. In the Non-default VM flags it should now be
> -XX:-UseCompressedOops, but it's still -XX:+UseCompressedOops.
>
> I have found the reason: SparkSubmitCommandBuilder.buildSparkSubmitCommand
> has logic like the below:
>
> if (isClientMode) {
>   // Figuring out where the memory value come from is a little tricky due to 
> precedence.
>   // Precedence is observed in the following order:
>   // - explicit configuration (setConf()), which also covers --driver-memory 
> cli argument.
>   // - properties file.
>   // - SPARK_DRIVER_MEMORY env variable
>   // - SPARK_MEM env variable
>   // - default value (1g)
>   // Take Thrift Server as daemon
>   String tsMemory =
> isThriftServer(mainClass) ? System.getenv("SPARK_DAEMON_MEMORY") : null;
>   String memory = firstNonEmpty(tsMemory, 
> config.get(SparkLauncher.DRIVER_MEMORY),
> System.getenv("SPARK_DRIVER_MEMORY"), System.getenv("SPARK_MEM"), 
> DEFAULT_MEM);
>   cmd.add("-Xmx" + memory);
>   addOptionString(cmd, driverDefaultJavaOptions);
>   addOptionString(cmd, driverExtraJavaOptions);
>   mergeEnvPathList(env, getLibPathEnvName(),
> config.get(SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH));
> }
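The firstNonEmpty cascade above can be sketched stand-alone; this is an illustrative Java sketch, not Spark's actual launcher code, and the values are hypothetical:

```java
// The first non-empty candidate wins, so for the Thrift Server
// SPARK_DAEMON_MEMORY shadows spark.driver.memory when choosing -Xmx.
public class MemoryPrecedence {
    static String firstNonEmpty(String... candidates) {
        for (String c : candidates) {
            if (c != null && !c.isEmpty()) return c;
        }
        throw new IllegalArgumentException("no non-empty value");
    }

    public static void main(String[] args) {
        // Hypothetical setup: SPARK_DAEMON_MEMORY=6g, spark.driver.memory=64g
        String memory = firstNonEmpty("6g", "64g", null, null, "1g");
        System.out.println("-Xmx" + memory);  // -Xmx6g, not -Xmx64g
    }
}
```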
>
>
> For Spark Thrift Server, SPARK_DAEMON_MEMORY is used first, which is
> really reasonable, but I am confused: if spark.driver.memory is bigger
> than 32g and SPARK_DAEMON_MEMORY is less than 32g, UseCompressedOops will
> still be enabled, right?
>
> Do we need to modify this logic for the >32g case?
>
>
> By the way, I met a problem like
> https://issues.apache.org/jira/browse/SPARK-27097, caused by this strange
> case.
>
> Thanks
>
>
> angers.zhu
> angers@gmail.com
>
> 
> 签名由 网易邮箱大师  定制
>
> On 03/23/2020 21:43,Sean Owen  wrote:
>
> I don't think Spark sets UseCompressedOops in any defaults; are you
> setting it?
> It can't be used with heaps >= 32GB. It doesn't seem to cause an error if
> you set it with large heaps, just a warning.
> What's the problem?
>
> On Mon, Mar 23, 2020 at 6:21 AM angers.zhu  wrote:
>
>> Hi developers,
>>
>> These days I meet a strange problem and I can’t find why.
>>
>> When I start a Spark Thrift Server with spark.driver.memory 64g, then
>> use jdk8/bin/jinfo <pid> to see the VM flags, I get the information below.
>> In a 64g VM, UseCompressedOops should be disabled by default, so why does
>> the Spark Thrift Server show -XX:+UseCompressedOops?
>>
>> Non-default VM flags: -XX:CICompilerCount=15 -XX:-CMSClassUnloadingEnabled 
>> -XX:CMSFullGCsBeforeCompaction=0 -XX:CMSInitiatingOccupancyFraction=75 
>> -XX:+CMSParallelRemarkEnabled -XX:-ClassUnloading -XX:+DisableExplicitGC 
>> -XX:ErrorFile=null -XX:-ExplicitGCInvokesConcurrentAndUnloadsClasses 
>> -XX:InitialHeapSize=2116026368 -XX:+ManagementServer 
>> -XX:MaxDirectMemorySize=8589934592 -XX:MaxHeapSize=6442450944 
>> -XX:MaxNewSize=2147483648 -XX:MaxTenuringThreshold=6 
>> -XX:MinHeapDeltaBytes=196608 -XX:NewSize=705298432 -XX:OldPLABSize=16 
>> -XX:OldSize=1410727936 -XX:+PrintGC -XX:+PrintGCDateStamps 
>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution 
>> -XX:-TraceClassUnloading -XX:+UseCMSCompactAtFullCollection 
>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers 
>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC 
>> -XX:+UseFastUnorderedTimeStamps -XX:+UseParNewGC
>>
>> Command line:  -Xmx6g 
>> -Djava.library.path=/home/hadoop/hadoop/lib/native 
>> -Djavax.security.auth.useSubjectCredsOnly=false 
>> -Dcom.sun.management.jmxremote.port=9021 
>> -Dcom.sun.management.jmxremote.authenticate=false 
>> -Dcom.sun.management.jmxremote.ssl=false -XX:MaxPermSize=1024m 
>> -XX:PermSize=256m -XX:MaxDirectMemorySize=8192m -XX:-TraceClassUnloading 
>> -XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
>> -XX:+CMSClassUnloadingEnabled -XX:+UseCMSCompactAtFullCollection 
>> -XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSParallelRemarkEnabled 
>> -XX:+DisableExplicitGC -XX:+PrintTenuringDistribution 
>> -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75 
>> -Xnoclassgc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
>>
>>
>> Since I am not an expert in VMs, I hope for some help.
>>
>>
>> angers.zhu
>> angers@gmail.com
>>
>> 

Re: Spark Thrift Server java vm problem need help

2020-03-23 Thread angers . zhu






Hi Sean,
Yea, I set -XX:+UseCompressedOops in the driver (you can see it in the
command line), and these days we have more users, so I set
spark.driver.memory to 64g. In the Non-default VM flags it should now be
-XX:-UseCompressedOops, but it's still -XX:+UseCompressedOops.

I have found the reason: SparkSubmitCommandBuilder.buildSparkSubmitCommand
has logic like the below:

if (isClientMode) {
  // Figuring out where the memory value come from is a little tricky due to precedence.
  // Precedence is observed in the following order:
  // - explicit configuration (setConf()), which also covers --driver-memory cli argument.
  // - properties file.
  // - SPARK_DRIVER_MEMORY env variable
  // - SPARK_MEM env variable
  // - default value (1g)
  // Take Thrift Server as daemon
  String tsMemory =
    isThriftServer(mainClass) ? System.getenv("SPARK_DAEMON_MEMORY") : null;
  String memory = firstNonEmpty(tsMemory, config.get(SparkLauncher.DRIVER_MEMORY),
    System.getenv("SPARK_DRIVER_MEMORY"), System.getenv("SPARK_MEM"), DEFAULT_MEM);
  cmd.add("-Xmx" + memory);
  addOptionString(cmd, driverDefaultJavaOptions);
  addOptionString(cmd, driverExtraJavaOptions);
  mergeEnvPathList(env, getLibPathEnvName(),
    config.get(SparkLauncher.DRIVER_EXTRA_LIBRARY_PATH));
}

For Spark Thrift Server, SPARK_DAEMON_MEMORY is used first, which is
really reasonable, but I am confused: if spark.driver.memory is bigger
than 32g and SPARK_DAEMON_MEMORY is less than 32g, UseCompressedOops will
still be enabled, right?

Do we need to modify this logic for the >32g case?

By the way, I met a problem like
https://issues.apache.org/jira/browse/SPARK-27097, caused by this strange
case.

Thanks

angers.zhu
angers@gmail.com
On 03/23/2020 21:43,Sean Owen wrote: 


> I don't think Spark sets UseCompressedOops in any defaults; are you
> setting it?
> It can't be used with heaps >= 32GB. It doesn't seem to cause an error if
> you set it with large heaps, just a warning.
> What's the problem?

Re: Spark Thrift Server java vm problem need help

2020-03-23 Thread Sean Owen
I don't think Spark sets UseCompressedOops in any defaults; are you setting
it?
It can't be used with heaps >= 32GB. It doesn't seem to cause an error if
you set it with large heaps, just a warning.
What's the problem?

On Mon, Mar 23, 2020 at 6:21 AM angers.zhu  wrote:

> Hi developers,
>
> These days I meet a strange problem and I can’t find why.
>
> When I start a Spark Thrift Server with spark.driver.memory 64g, then
> use jdk8/bin/jinfo <pid> to see the VM flags, I get the information below.
> In a 64g VM, UseCompressedOops should be disabled by default, so why does
> the Spark Thrift Server show -XX:+UseCompressedOops?
>
> Non-default VM flags: -XX:CICompilerCount=15 -XX:-CMSClassUnloadingEnabled 
> -XX:CMSFullGCsBeforeCompaction=0 -XX:CMSInitiatingOccupancyFraction=75 
> -XX:+CMSParallelRemarkEnabled -XX:-ClassUnloading -XX:+DisableExplicitGC 
> -XX:ErrorFile=null -XX:-ExplicitGCInvokesConcurrentAndUnloadsClasses 
> -XX:InitialHeapSize=2116026368 -XX:+ManagementServer 
> -XX:MaxDirectMemorySize=8589934592 -XX:MaxHeapSize=6442450944 
> -XX:MaxNewSize=2147483648 -XX:MaxTenuringThreshold=6 
> -XX:MinHeapDeltaBytes=196608 -XX:NewSize=705298432 -XX:OldPLABSize=16 
> -XX:OldSize=1410727936 -XX:+PrintGC -XX:+PrintGCDateStamps 
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution 
> -XX:-TraceClassUnloading -XX:+UseCMSCompactAtFullCollection 
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers 
> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC 
> -XX:+UseFastUnorderedTimeStamps -XX:+UseParNewGC
>
> Command line:  -Xmx6g 
> -Djava.library.path=/home/hadoop/hadoop/lib/native 
> -Djavax.security.auth.useSubjectCredsOnly=false 
> -Dcom.sun.management.jmxremote.port=9021 
> -Dcom.sun.management.jmxremote.authenticate=false 
> -Dcom.sun.management.jmxremote.ssl=false -XX:MaxPermSize=1024m 
> -XX:PermSize=256m -XX:MaxDirectMemorySize=8192m -XX:-TraceClassUnloading 
> -XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
> -XX:+CMSClassUnloadingEnabled -XX:+UseCMSCompactAtFullCollection 
> -XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSParallelRemarkEnabled 
> -XX:+DisableExplicitGC -XX:+PrintTenuringDistribution 
> -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75 
> -Xnoclassgc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
>
>
> Since I am not an expert in VMs, I hope for some help.
>
>
> angers.zhu
> angers@gmail.com
>
> 
> 签名由 网易邮箱大师  定制
>
>


Partition by Custom Wrapping

2020-03-23 Thread nirmit jain
Hi Developer,

Can someone help me write custom partitioning, where instead of writing
data in the hierarchical format like this:

Root-
Data=A-
Data=B-

To something like this

Data-
A-
   ROOT-
B-
   ROOT-

Could you tell me which classes take care of this kind of
requirement?
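If it helps frame the question, the two layouts differ only in how the directory path for each partition value is assembled. A plain-Java sketch of the mapping, with hypothetical names and no Spark APIs:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class InvertedLayout {
    // Built-in Hive-style layout: <root>/<column>=<value>/
    static Path hiveStyle(String root, String column, String value) {
        return Paths.get(root, column + "=" + value);
    }

    // Requested layout: <column>/<value>/<root>/
    static Path invertedStyle(String root, String column, String value) {
        return Paths.get(column, value, root);
    }

    public static void main(String[] args) {
        System.out.println(hiveStyle("Root", "Data", "A"));     // e.g. Root/Data=A
        System.out.println(invertedStyle("Root", "Data", "A")); // e.g. Data/A/Root
    }
}
```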

Thanks and Regards
Nirmit Jain


Spark Thrift Server java vm problem need help

2020-03-23 Thread angers . zhu







Hi developers,

These days I meet a strange problem and I can’t find why.

When I start a Spark Thrift Server with spark.driver.memory 64g, then use
jdk8/bin/jinfo <pid> to see the VM flags, I get the information below.
In a 64g VM, UseCompressedOops should be disabled by default, so why does
the Spark Thrift Server show -XX:+UseCompressedOops?

Non-default VM flags: -XX:CICompilerCount=15 -XX:-CMSClassUnloadingEnabled -XX:CMSFullGCsBeforeCompaction=0 -XX:CMSInitiatingOccupancyFraction=75 -XX:+CMSParallelRemarkEnabled -XX:-ClassUnloading -XX:+DisableExplicitGC -XX:ErrorFile=null -XX:-ExplicitGCInvokesConcurrentAndUnloadsClasses -XX:InitialHeapSize=2116026368 -XX:+ManagementServer -XX:MaxDirectMemorySize=8589934592 -XX:MaxHeapSize=6442450944 -XX:MaxNewSize=2147483648 -XX:MaxTenuringThreshold=6 -XX:MinHeapDeltaBytes=196608 -XX:NewSize=705298432 -XX:OldPLABSize=16 -XX:OldSize=1410727936 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:-TraceClassUnloading -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseFastUnorderedTimeStamps -XX:+UseParNewGC

Command line: -Xmx6g -Djava.library.path=/home/hadoop/hadoop/lib/native -Djavax.security.auth.useSubjectCredsOnly=false -Dcom.sun.management.jmxremote.port=9021 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -XX:MaxPermSize=1024m -XX:PermSize=256m -XX:MaxDirectMemorySize=8192m -XX:-TraceClassUnloading -XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -XX:+PrintTenuringDistribution -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75 -Xnoclassgc -XX:+PrintGCDetails -XX:+PrintGCDateStamps

Since I am not an expert in VMs, I hope for some help.

angers.zhu
angers@gmail.com






[DISCUSS] Supporting hive on DataSourceV2

2020-03-23 Thread JackyLee
Hi devs,
I’d like to start a discussion about Supporting Hive on DatasourceV2. We’re
now working on a project using DataSourceV2 to provide multiple source
support and it works with the data lake solution very well, yet it does not
yet support HiveTable.

There are 3 reasons why we need to support Hive on DataSourceV2.
1. Hive itself is one of Spark data sources.
2. HiveTable is essentially a FileTable with its own input and output
formats; it works fine as a FileTable.
3. HiveTable should be stateless, and users can freely read or write Hive
using batch or microbatch.

We implemented stateless Hive on DataSourceV1; it supports users writing
into Hive in streaming or batch mode, and it is widely used in our company.
Recently, we have been working to support Hive on DataSourceV2; multiple
Hive catalogs and DDL commands are already supported.

Looking forward to more discussions on this.


