Agreed on delta.io; I am exploring both options.

On Wed, May 1, 2019 at 2:50 PM Vitaliy Pisarev <vitaliy.pisa...@biocatch.com> wrote:
> Ankit, you should take a look at delta.io, which was recently open sourced
> by Databricks.
>
> Full DML support is on the way.
>
> From: "Khare, Ankit" <ankit.kh...@eon.com>
> Date: Tuesday, 23 April 2019 at 11:35
> To: Chetan Khatri <chetan.opensou...@gmail.com>, Jason Nerothin <jasonnerot...@gmail.com>
> Cc: user <user@spark.apache.org>
> Subject: Re: Update / Delete records in Parquet
>
> Hi Chetan,
>
> I also agree that for this use case Parquet would not be the best option.
> I had a similar use case:
>
> 50 different tables to be downloaded from MSSQL.
>
> Source: MSSQL
> Destination: Apache KUDU (since it supports change data capture use cases
> very well)
>
> We used the StreamSets CDC module to connect to MSSQL and then get the CDC
> data into Apache KUDU.
>
> Total records: 3 B
>
> Thanks
> Ankit
>
> From: Chetan Khatri <chetan.opensou...@gmail.com>
> Date: Tuesday, 23. April 2019 at 05:58
> To: Jason Nerothin <jasonnerot...@gmail.com>
> Cc: user <user@spark.apache.org>
> Subject: Re: Update / Delete records in Parquet
>
> Hello Jason, thank you for the reply. My use case is this: the first time,
> I do a full load with transformations/aggregations/joins and write to
> Parquet (as staging). From then on, my source is MSSQL Server, and I want
> to pull only the records that were changed / updated and apply those
> updates to the Parquet data as well, if possible without side effects.
>
> https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/work-with-change-tracking-sql-server?view=sql-server-2017
>
> On Tue, Apr 23, 2019 at 3:02 AM Jason Nerothin <jasonnerot...@gmail.com> wrote:
>
> Hi Chetan,
>
> Do you have to use Parquet?
>
> It just feels like it might be the wrong sink for a high-frequency change
> scenario.
>
> What are you trying to accomplish?
>
> Thanks,
> Jason
>
> On Mon, Apr 22, 2019 at 2:09 PM Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>
> Hello All,
>
> If I am doing an incremental load / delta and would like to update / delete
> records in Parquet, I understand that Parquet is immutable, so in theory
> records can't be deleted / updated; only append / overwrite can be done.
> But I can see utility tools which claim to add value for that.
>
> https://github.com/Factual/parquet-rewriter
>
> Please shed some light on this.
>
> Thanks
>
> --
> Thanks,
> Jason
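
Because Parquet files are immutable, an "update" or "delete" in practice means reading the existing rows, merging in the changed rows, and writing out a new file (or partition). The merge step can be sketched in plain Python as below. This is only an illustration under stated assumptions, not how delta.io, KUDU, or parquet-rewriter work internally: the primary-key column `id`, the `op` field with values "upsert" / "delete", and the function name `apply_changes` are all hypothetical and do not come from the thread.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def apply_changes(snapshot: Iterable[Row], changes: Iterable[Row], key: str = "id") -> List[Row]:
    """Return a new snapshot with the changes applied; the inputs are not mutated."""
    # Index the existing rows by primary key (hypothetical column name "id").
    merged = {row[key]: dict(row) for row in snapshot}
    for change in changes:
        # Copy so the caller's change records are left untouched.
        record = dict(change)
        op = record.pop("op")  # hypothetical change-type field
        if op == "delete":
            merged.pop(record[key], None)
        elif op == "upsert":
            merged[record[key]] = record
        else:
            raise ValueError(f"unknown op: {op!r}")
    # The result would then be written out as a fresh file / partition.
    return [merged[k] for k in sorted(merged)]


# Existing staging data (what the full load produced).
staging = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]

# Changed records pulled from the source since the last load.
delta = [
    {"op": "upsert", "id": 2, "name": "B"},
    {"op": "delete", "id": 3},
    {"op": "upsert", "id": 4, "name": "d"},
]

new_staging = apply_changes(staging, delta)
```

With real data, the same idea is usually applied per partition, so that only the partitions touched by the change set are rewritten.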