Hi,
In our custom datasource implementation, we want to inject some query-level
information.
For example -
scala> val df = spark.sql("some query") // uses custom datasource under
the hood through Session Extensions.
scala> df.count // here we want some kind of pre-execution hook just
before the action actually executes
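One possibility (a sketch only, not a DSV2-specific hook): register a
SparkListener and use onJobStart, which fires on the driver when a job is
submitted. The listener API below is real; the wiring around it is
illustrative.

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

// Note: listener events are delivered asynchronously, so this is "around
// job start" rather than a strictly synchronous pre-execution hook.
spark.sparkContext.addSparkListener(new SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    // For SQL-triggered jobs the execution id travels as a local property;
    // it is null for jobs that don't come from the SQL layer.
    val execId =
      Option(jobStart.properties).map(_.getProperty("spark.sql.execution.id")).orNull
    println(s"Job ${jobStart.jobId} starting, sql execution id = $execId")
  }
})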
Hi,
Consider the following statements:
1)
> scala> val df = spark.read.format("com.shubham.MyDataSource").load
> scala> df.show
> +---+---+
> | i| j|
> +---+---+
> | 0| 0|
> | 1| -1|
> | 2| -2|
> | 3| -3|
> | 4| -4|
> +---+---+
2)
> scala> val df1 = df.filter("i < 3")
> scala> df1.show
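One thing worth checking with a filtered scan like df1 is whether the
source sees the `i < 3` predicate at all. A rough sketch (trait and field
names invented) of how a DSV2 reader can opt in via SupportsPushDownFilters:

import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.sources.v2.reader.SupportsPushDownFilters

trait MyFilterPushdown extends SupportsPushDownFilters {
  private var pushed: Array[Filter] = Array.empty

  // Spark offers the query's filters; return the ones it must still apply.
  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
    pushed = filters   // pretend the source can evaluate all of them
    Array.empty        // no residual filters left for Spark
  }

  override def pushedFilters(): Array[Filter] = pushed
}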
Hi,
I am using spark v2.3.2. I have an implementation of DSV2. Here is what is
happening:
1) Obtained a dataframe using MyDataSource
scala> val df1 = spark.read.format("com.shubham.MyDataSource").load
> MyDataSource.MyDataSource
> MyDataSource.createReader: Going to create a new MyDataSourceReader
I explored SparkListener#onJobEnd, but then how do I propagate some state
from DataSourceReader to SparkListener?
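One workable pattern, sketched below with invented names: since both the
reader and the listener live on the driver, they can share state through a
driver-side singleton, keyed by something both sides can see (e.g. an
option passed to the source).

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd}
import scala.collection.concurrent.TrieMap

// Invented helper: MyDataSourceReader registers its driver-side resources
// here under some id it derives from its options.
object ReaderStateRegistry {
  val resources = TrieMap.empty[String, AutoCloseable]
}

spark.sparkContext.addSparkListener(new SparkListener {
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
    // Clean up whatever the reader registered; a real implementation would
    // key this by query/execution id rather than wiping everything.
    ReaderStateRegistry.resources.values.foreach(_.close())
    ReaderStateRegistry.resources.clear()
  }
})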
On Wed, Jun 12, 2019 at 2:22 PM Shubham Chaurasia
wrote:
> Hi All,
>
> Is there any way to receive an event when a DataSourceReader is
> finished?
> I want to do some clean up
Hi All,
Is there any way to receive an event when a DataSourceReader is finished?
I want to do some clean up after all the DataReaders are finished reading,
and hence need some kind of cleanUp() mechanism at the DataSourceReader
(driver) level.
How to achieve this?
For instance, in DataSourceWriter we have commit() / abort(), which act as
end-of-write hooks on the driver, but there is nothing equivalent on the
read path.
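For contrast, a Scala sketch of those writer-side hooks (only the lifecycle
methods are shown; createWriterFactory is left abstract, and its row type
parameter differs between 2.3.x and 2.4.x):

import org.apache.spark.sql.sources.v2.writer.{DataSourceWriter, WriterCommitMessage}

abstract class MyDataSourceWriter extends DataSourceWriter {
  override def commit(messages: Array[WriterCommitMessage]): Unit = {
    // runs once on the driver after all DataWriters have committed;
    // a natural place for driver-level cleanup on the write path
  }
  override def abort(messages: Array[WriterCommitMessage]): Unit = {
    // driver-side hook when the write fails
  }
}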
> *Sent:* Tuesday, May 7, 2019 9:35:10 AM
> *To:* Shubham Chaurasia
> *Cc:* dev; user@spark.apache.org
> *Subject:* Re: Static partitioning in partitionBy()
>
> It depends on the data source. Delta Lake (https://delta.io) allows you
> to do it with the .option("replaceWhere",
Hi All,
Is there a way I can provide static partitions in partitionBy()?
Like:
df.write.mode("overwrite").format("MyDataSource").partitionBy("c=c1").save
The above code gives the following error, as it tries to find a column
named `c=c1` in df:
org.apache.spark.sql.AnalysisException: Partition column `c=c1` not found
in schema ...
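For what it's worth, a sketch of the usual workaround: partitionBy() takes
column names only, so pin the value by adding it as a literal column and
partitioning on the name.

import org.apache.spark.sql.functions.lit

// Illustrative: every row lands in partition c=c1.
df.withColumn("c", lit("c1"))
  .write
  .mode("overwrite")
  .format("MyDataSource")
  .partitionBy("c")
  .save()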
On Wed, Apr 24, 2019 at 6:30 PM Shubham Chaurasia <
> shubh.chaura...@gmail.com> wrote:
>
>> Hi All,
>>
>> Consider the following(spark v2.4.0):
>>
> Basically I change values of `spark.sql.session.timeZone` and perform an
> ORC write. Here are 3 samples:
>>
Hi All,
Consider the following(spark v2.4.0):
Basically I change values of `spark.sql.session.timeZone` and perform an
ORC write. Here are 3 samples:
1)
scala> spark.conf.set("spark.sql.session.timeZone", "Asia/Kolkata")
scala> val df = sc.parallelize(Seq("2019-04-23
09:15:04.0")).toDF("ts").withColumn("ts", col("ts").cast("timestamp"))
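For anyone skimming, the effect being probed is roughly this (illustrative
snippet; the string is parsed in the session time zone, and timestamps are
also rendered in it):

import org.apache.spark.sql.functions.col
import spark.implicits._

spark.conf.set("spark.sql.session.timeZone", "Asia/Kolkata")
// Parsed as 09:15:04 IST, stored internally as microseconds since the epoch.
val ts = Seq("2019-04-23 09:15:04.0").toDF("ts")
  .withColumn("ts", col("ts").cast("timestamp"))
ts.show(false)   // 2019-04-23 09:15:04

spark.conf.set("spark.sql.session.timeZone", "UTC")
ts.show(false)   // 2019-04-23 03:45:04 (same instant, rendered in UTC)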
> "internal" indicates that Spark doesn't need to convert the values.
>
> Spark's internal representation for a date is the ordinal from the unix
> epoch date, 1970-01-01 = 0.
>
> rb
>
> On Tue, Feb 5, 2019 at 4:46 AM Shubham Chaurasia <
> shu
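A quick worked check of that ordinal representation (plain JDK, nothing
Spark-specific):

import java.time.LocalDate

// DateType's internal value is just days since 1970-01-01.
LocalDate.ofEpochDay(0)               // 1970-01-01
LocalDate.of(2019, 2, 5).toEpochDay   // 17932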
Hi All,
I am using a custom DataSourceV2 implementation (*Spark version 2.3.2*).
Here is how I am trying to pass in a *date type* from spark-shell.
scala> val df =
> sc.parallelize(Seq("2019-02-05")).toDF("datetype").withColumn("datetype",
> col("datetype").cast("date"))
> scala> df.write.format("com
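On the writer side, a sketch of what that date looks like once it reaches
the source, assuming the InternalRow-based DSV2 path (in some 2.3.x code
paths the writer receives Row with java.sql.Date instead):

import java.time.LocalDate
import org.apache.spark.sql.catalyst.InternalRow

// Hypothetical helper: a DateType column arrives as an Int of epoch days.
def dateFromRow(row: InternalRow, ordinal: Int): LocalDate =
  LocalDate.ofEpochDay(row.getInt(ordinal).toLong)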
?
>
>
>
> When I do:
>
> val df = spark.read.format(mypackage).load().show()
>
> I am getting a single creation; how are you creating the reader?
>
>
>
> Thanks,
>
> Assaf
>
>
>
> *From:* Shubham Chaurasia [mailto:shubh.chaura...@gmail.c
> I tried to reproduce it in my
> environment and I am getting just one instance of the reader…
>
>
>
> Thanks,
>
> Assaf
>
>
>
> *From:* Shubham Chaurasia [mailto:shubh.chaura...@gmail.com]
> *Sent:* Tuesday, October 9, 2018 9:31 AM
> *To:* user@spark.apache.org
>
Hi All,
--Spark built with *tags/v2.4.0-rc2*
Consider the following DataSourceReader implementation:

public class MyDataSourceReader implements DataSourceReader, SupportsScanColumnarBatch {
  StructType schema = null;
  Map<String, String> options;

  public MyDataSourceReader(Map<String, String> options) {
    System.out.println
Hi All,
I built spark with *tags/v2.4.0-rc2* using
./build/mvn -DskipTests -Phadoop-2.7 -Dhadoop.version=3.1.0 clean install
Now from spark-shell, whenever I call any static method residing in an
interface, it shows me an error like:
:28: error: Static methods in interface require -target:jvm-1.8