Re: Unable to handle bignumeric datatype in spark/pyspark

2023-03-04 Thread Atheeth SH
Hi Rajnil,

Sorry for the multiple emails. Since you are getting the ModuleNotFoundError,
I was curious: have you tried the solution mentioned in the README file?

Below is the link:
https://github.com/GoogleCloudDataproc/spark-bigquery-connector#bignumeric-support

Please also find the code block solution below.

If the code throws a ModuleNotFoundError, please add the following code
before reading the BigNumeric data.

try:
    import pkg_resources
    pkg_resources.declare_namespace(__name__)
except ImportError:
    import pkgutil
    __path__ = pkgutil.extend_path(__path__, __name__)
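For a quick end-to-end check, a minimal sketch of reading such a table back
and inspecting the schema could look like the following (assuming the
spark-bigquery-connector is on the classpath; "project.dataset.table" and
"big_col" are placeholder names, not from this thread):

# Sketch only: read a BigQuery table containing a BIGNUMERIC column and
# inspect how the column surfaces in Spark.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("bignumeric-check").getOrCreate()

df = (spark.read.format("bigquery")
      .option("table", "project.dataset.table")
      .load())

df.printSchema()                          # check the type the BIGNUMERIC column maps to
df.select(col("big_col")).show(5, False)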

Thanks,

Atheeth


On Fri, 3 Mar 2023 at 16:25, Atheeth SH  wrote:

> Hi Rajnil,
>
> Just curious, what version of spark-bigquery-connector are you using?
>
> Thanks,
> Atheeth
>
> On Sat, 25 Feb 2023 at 23:48, Mich Talebzadeh 
> wrote:
>
>> Sounds like it is cosmetic. The important point is whether the data
>> stored in GBQ is valid.
>>
>>
>> HTH
>>
>>
>>
>>
>>
>>
>> On Sat, 25 Feb 2023 at 18:12, Rajnil Guha 
>> wrote:
>>
>>> Hi All,
>>>
>>> I created an issue on Stack Overflow (linked below) a few months back
>>> about handling BigNumeric type values of BigQuery in Spark.
>>>
>>> link
>>> 
>>>
>>> On Fri, Feb 24, 2023 at 3:54 PM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 Hi Nidhi,

 Can you create a BigQuery table with bignumeric and numeric column
 types, add a few rows, and try to read it into Spark through a DataFrame,
 and do

 df.printSchema()

 df.show(5, False)


 HTH






 On Fri, 24 Feb 2023 at 02:47, nidhi kher  wrote:

> Hello,
>
>
> I am facing the below issue in PySpark code:
>
> We are running Spark code using Dataproc Serverless batch in Google
> Cloud Platform. The Spark code is causing an issue while writing the data to
> a BigQuery table. In the BigQuery table, a few of the columns have datatype
> bignumeric, and the Spark code changes the datatype from bignumeric to
> numeric while writing the data. We need the datatype to be kept as bignumeric
> because we need data of (38,20) precision.
>
>
> Can we cast a column to bignumeric in a Spark SQL DataFrame, like the below
> code for decimal:
>
>
> df = spark.sql("""SELECT cast(col1 as decimal(38,20)) as col1 from table1""")
>
> Spark version: 3.3
>
> PySpark version: 1.1
>
>
> Regards,
>
> Nidhi
>



Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
This might help

https://docs.databricks.com/structured-streaming/foreach.html

streamingDF.writeStream.foreachBatch(...) allows you to specify a function
that is executed on the output data of every micro-batch of the streaming
query. It takes two parameters: a DataFrame or Dataset that has the output
data of a micro-batch and the unique ID of the micro-batch.
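In its simplest form that looks something like the sketch below (the name
process_batch is hypothetical; it is only meant to show the two parameters):

def process_batch(batch_df, batch_id):
    # batch_df is the micro-batch as a DataFrame; batch_id is its unique id
    print(f"processing batch {batch_id} with {batch_df.count()} rows")

query = (streamingDF.writeStream
         .foreachBatch(process_batch)
         .start())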


So there are two different function calls in my case. I cannot put them
together in one function.


    newtopicResult = streamingNewtopic.select( \
          col("newtopic_value.uuid").alias("uuid") \
        , col("newtopic_value.timeissued").alias("timeissued") \
        , col("newtopic_value.queue").alias("queue") \
        , col("newtopic_value.status").alias("status")). \
        writeStream. \
        outputMode('append'). \
        option("truncate", "false"). \
        foreachBatch(sendToControl). \
        trigger(processingTime='30 seconds'). \
        option('checkpointLocation', checkpoint_path_newtopic). \
        queryName(config['MDVariables']['newtopic']). \
        start()

    #print(newtopicResult)


    result = streamingDataFrame.select( \
          col("parsed_value.rowkey").alias("rowkey") \
        , col("parsed_value.ticker").alias("ticker") \
        , col("parsed_value.timeissued").alias("timeissued") \
        , col("parsed_value.price").alias("price")). \
        writeStream. \
        outputMode('append'). \
        option("truncate", "false"). \
        foreachBatch(sendToSink). \
        trigger(processingTime='30 seconds'). \
        option('checkpointLocation', checkpoint_path). \
        queryName(config['MDVariables']['topic']). \
        start()








On Sat, 4 Mar 2023 at 21:51, Mich Talebzadeh 
wrote:

> I am aware of your point that globals don't work in a distributed
> environment.
> With regard to your other point, these are two different topics with their
> own streams. The point of the second stream is to set the status to false, so
> it can gracefully shut down the main stream (the one called "md") here.
>
> For example, the second stream has this row
>
>
> +------------------------------------+-------------------+-----+------+
> |uuid                                |timeissued         |queue|status|
> +------------------------------------+-------------------+-----+------+
> |ac74d419-58aa-4879-945d-a2a41bb64873|2023-03-04 21:29:18|md   |true  |
> +------------------------------------+-------------------+-----+------+
>
> so every 30 seconds, it checks the status and if status = false, it shuts
> down the main stream gracefully. It works OK.
>
> def sendToControl(dfnewtopic, batchId2):
>     if(len(dfnewtopic.take(1))) > 0:
>         print(f"""From sendToControl, newtopic batchId is {batchId2}""")
>         dfnewtopic.show(100, False)
>         queue = dfnewtopic.first()[2]
>         status = dfnewtopic.first()[3]
>         print(f"""testing queue is {queue}, and status is {status}""")
>         if((queue == config['MDVariables']['topic']) & (status == 'false')):
>             spark_session = s.spark_session(config['common']['appName'])
>             active = spark_session.streams.active
>             for e in active:
>                 name = e.name
>                 if(name == config['MDVariables']['topic']):
>                     print(f"""\n==> Request terminating streaming process for topic {name} at {datetime.now()}\n """)
>                     e.stop()
>     else:
>         print("DataFrame newtopic is empty")
>
> and so when the status is set to false in the second stream, it does as below
>
> From sendToControl, newtopic batchId is 93
> +------------------------------------+-------------------+-----+------+
> |uuid                                |timeissued         |queue|status|
> +------------------------------------+-------------------+-----+------+
> |c4736bc7-bee7-4dce-b67a-3b1d674b243a|2023-03-04 21:36:52|md   |false |
> +------------------------------------+-------------------+-----+------+
>
> testing queue is md, and status is false
>
> ==> Request terminating streaming process for topic md at 2023-03-04
> 21:36:55.590162
>
> and shuts down
>
> I want to state this
>
>   print(f"""\n==> Request termi

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
I am aware of your point that globals don't work in a distributed
environment.
With regard to your other point, these are two different topics with their
own streams. The point of the second stream is to set the status to false, so
it can gracefully shut down the main stream (the one called "md") here.

For example, the second stream has this row


+------------------------------------+-------------------+-----+------+
|uuid                                |timeissued         |queue|status|
+------------------------------------+-------------------+-----+------+
|ac74d419-58aa-4879-945d-a2a41bb64873|2023-03-04 21:29:18|md   |true  |
+------------------------------------+-------------------+-----+------+

so every 30 seconds, it checks the status and if status = false, it shuts
down the main stream gracefully. It works OK.

def sendToControl(dfnewtopic, batchId2):
    if(len(dfnewtopic.take(1))) > 0:
        print(f"""From sendToControl, newtopic batchId is {batchId2}""")
        dfnewtopic.show(100, False)
        queue = dfnewtopic.first()[2]
        status = dfnewtopic.first()[3]
        print(f"""testing queue is {queue}, and status is {status}""")
        if((queue == config['MDVariables']['topic']) & (status == 'false')):
            spark_session = s.spark_session(config['common']['appName'])
            active = spark_session.streams.active
            for e in active:
                name = e.name
                if(name == config['MDVariables']['topic']):
                    print(f"""\n==> Request terminating streaming process for topic {name} at {datetime.now()}\n """)
                    e.stop()
    else:
        print("DataFrame newtopic is empty")

and so when the status is set to false in the second stream, it does as below

From sendToControl, newtopic batchId is 93
+------------------------------------+-------------------+-----+------+
|uuid                                |timeissued         |queue|status|
+------------------------------------+-------------------+-----+------+
|c4736bc7-bee7-4dce-b67a-3b1d674b243a|2023-03-04 21:36:52|md   |false |
+------------------------------------+-------------------+-----+------+

testing queue is md, and status is false

==> Request terminating streaming process for topic md at 2023-03-04
21:36:55.590162

and shuts down

I want to print this:

  print(f"""\n==> Request terminating streaming process for topic {name}
and batch {BatchId for md} at {datetime.now()}\n """)

That {BatchId for md} should come from this one

def sendToSink(df, batchId):
    if(len(df.take(1))) > 0:
        print(f"""From sendToSink, md, batchId is {batchId}, at {datetime.now()} """)
        #df.show(100,False)
        df.persist()
        # write to BigQuery batch table
        #s.writeTableToBQ(df, "append", config['MDVariables']['targetDataset'], config['MDVariables']['targetTable'])
        df.unpersist()
        #print(f"""wrote to DB""")
        batchidMD = batchId
        print(batchidMD)
    else:
        print("DataFrame md is empty")

I trust I explained it adequately

cheers






On Sat, 4 Mar 2023 at 21:22, Sean Owen  wrote:

> I don't quite get it - aren't you applying to the same stream, and
> batches? worst case why not apply these as one function?
> Otherwise, how do you mean to associate one call to another?
> globals don't help here. They aren't global beyond the driver, and, which
> one would be which batch?
>
> On Sat, Mar 4, 2023 at 3:02 PM Mich Talebzadeh 
> wrote:
>
>> Thanks. they are different batchIds
>>
>> From sendToControl, newtopic batchId is 76
>> From sendToSink, md, batchId is 563
>>
>> As a matter of interest, why does a global variable not work?
>>
>>
>>
>>
>>
>>
>>
>> On Sat, 4 Mar 2023 at 20:13, Sean Owen  wrote:
>>
>>> It's the same batch ID already, no?
>>> Or why not simply put the logic of both in one function? or write one
>>> function that calls both?
>>>
>>> On Sat, Mar 4, 2023 at 2:07 PM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>

 This is probably pretty  straight forward but somehow is does 

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Sean Owen
I don't quite get it - aren't you applying to the same stream, and batches?
Worst case, why not apply these as one function?
Otherwise, how do you mean to associate one call to another?
Globals don't help here. They aren't global beyond the driver, and which
one would be which batch?

On Sat, Mar 4, 2023 at 3:02 PM Mich Talebzadeh 
wrote:

> Thanks. they are different batchIds
>
> From sendToControl, newtopic batchId is 76
> From sendToSink, md, batchId is 563
>
> As a matter of interest, why does a global variable not work?
>
>
>
>
>
>
>
> On Sat, 4 Mar 2023 at 20:13, Sean Owen  wrote:
>
>> It's the same batch ID already, no?
>> Or why not simply put the logic of both in one function? or write one
>> function that calls both?
>>
>> On Sat, Mar 4, 2023 at 2:07 PM Mich Talebzadeh 
>> wrote:
>>
>>>
>>> This is probably pretty straightforward, but somehow it does not look
>>> that way.
>>>
>>>
>>>
>>> On Spark Structured Streaming,  "foreachBatch" performs custom write
>>> logic on each micro-batch through a call function. Example,
>>>
>>> foreachBatch(sendToSink) expects 2 parameters, first: micro-batch as
>>> DataFrame or Dataset and second: unique id for each batch
>>>
>>>
>>>
>>> In my case I simultaneously read two topics through two separate
>>> functions
>>>
>>>
>>>
>>>1. foreachBatch(sendToSink). \
>>>2. foreachBatch(sendToControl). \
>>>
>>> This is  the code
>>>
>>> def sendToSink(df, batchId):
>>> if(len(df.take(1))) > 0:
>>> print(f"""From sendToSink, md, batchId is {batchId}, at
>>> {datetime.now()} """)
>>> #df.show(100,False)
>>> df. persist()
>>> # write to BigQuery batch table
>>> #s.writeTableToBQ(df, "append",
>>> config['MDVariables']['targetDataset'],config['MDVariables']['targetTable'])
>>> df.unpersist()
>>> #print(f"""wrote to DB""")
>>>else:
>>> print("DataFrame md is empty")
>>>
>>> def sendToControl(dfnewtopic, batchId2):
>>> if(len(dfnewtopic.take(1))) > 0:
>>> print(f"""From sendToControl, newtopic batchId is {batchId2}""")
>>> dfnewtopic.show(100,False)
>>> queue = dfnewtopic.first()[2]
>>> status = dfnewtopic.first()[3]
>>> print(f"""testing queue is {queue}, and status is {status}""")
>>> if((queue == config['MDVariables']['topic']) & (status ==
>>> 'false')):
>>>   spark_session = s.spark_session(config['common']['appName'])
>>>   active = spark_session.streams.active
>>>   for e in active:
>>>  name = e.name
>>>  if(name == config['MDVariables']['topic']):
>>> print(f"""\n==> Request terminating streaming process
>>> for topic {name} at {datetime.now()}\n """)
>>> e.stop()
>>> else:
>>> print("DataFrame newtopic is empty")
>>>
>>>
>>> The problem I have is to share batchID from the first function in the
>>> second function sendToControl(dfnewtopic, batchId2) so I can print it
>>> out.
>>>
>>>
>>> Defining a global did not work.. So it sounds like I am missing
>>> something rudimentary here!
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>


Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
Thanks. they are different batchIds

From sendToControl, newtopic batchId is 76
From sendToSink, md, batchId is 563

As a matter of interest, why does a global variable not work?







On Sat, 4 Mar 2023 at 20:13, Sean Owen  wrote:

> It's the same batch ID already, no?
> Or why not simply put the logic of both in one function? or write one
> function that calls both?
>
> On Sat, Mar 4, 2023 at 2:07 PM Mich Talebzadeh 
> wrote:
>
>>
>> This is probably pretty straightforward, but somehow it does not look
>> that way.
>>
>>
>>
>> On Spark Structured Streaming,  "foreachBatch" performs custom write
>> logic on each micro-batch through a call function. Example,
>>
>> foreachBatch(sendToSink) expects 2 parameters, first: micro-batch as
>> DataFrame or Dataset and second: unique id for each batch
>>
>>
>>
>> In my case I simultaneously read two topics through two separate functions
>>
>>
>>
>>1. foreachBatch(sendToSink). \
>>2. foreachBatch(sendToControl). \
>>
>> This is  the code
>>
>> def sendToSink(df, batchId):
>> if(len(df.take(1))) > 0:
>> print(f"""From sendToSink, md, batchId is {batchId}, at
>> {datetime.now()} """)
>> #df.show(100,False)
>> df. persist()
>> # write to BigQuery batch table
>> #s.writeTableToBQ(df, "append",
>> config['MDVariables']['targetDataset'],config['MDVariables']['targetTable'])
>> df.unpersist()
>> #print(f"""wrote to DB""")
>>else:
>> print("DataFrame md is empty")
>>
>> def sendToControl(dfnewtopic, batchId2):
>> if(len(dfnewtopic.take(1))) > 0:
>> print(f"""From sendToControl, newtopic batchId is {batchId2}""")
>> dfnewtopic.show(100,False)
>> queue = dfnewtopic.first()[2]
>> status = dfnewtopic.first()[3]
>> print(f"""testing queue is {queue}, and status is {status}""")
>> if((queue == config['MDVariables']['topic']) & (status ==
>> 'false')):
>>   spark_session = s.spark_session(config['common']['appName'])
>>   active = spark_session.streams.active
>>   for e in active:
>>  name = e.name
>>  if(name == config['MDVariables']['topic']):
>> print(f"""\n==> Request terminating streaming process for
>> topic {name} at {datetime.now()}\n """)
>> e.stop()
>> else:
>> print("DataFrame newtopic is empty")
>>
>>
>> The problem I have is to share batchID from the first function in the
>> second function sendToControl(dfnewtopic, batchId2) so I can print it
>> out.
>>
>>
>> Defining a global did not work.. So it sounds like I am missing something
>> rudimentary here!
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>


Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Sean Owen
It's the same batch ID already, no?
Or why not simply put the logic of both in one function? or write one
function that calls both?
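For instance, a sketch of the second option (only meaningful if both callbacks
are attached to the same stream):

def sendToBoth(df, batchId):
    # one callback that delegates to both existing functions for the same micro-batch
    sendToSink(df, batchId)
    sendToControl(df, batchId)

# and then attach a single foreachBatch(sendToBoth) instead of two separate ones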

On Sat, Mar 4, 2023 at 2:07 PM Mich Talebzadeh 
wrote:

>
> This is probably pretty straightforward, but somehow it does not look
> that way.
>
>
>
> On Spark Structured Streaming,  "foreachBatch" performs custom write logic
> on each micro-batch through a call function. Example,
>
> foreachBatch(sendToSink) expects 2 parameters, first: micro-batch as
> DataFrame or Dataset and second: unique id for each batch
>
>
>
> In my case I simultaneously read two topics through two separate functions
>
>
>
>1. foreachBatch(sendToSink). \
>2. foreachBatch(sendToControl). \
>
> This is  the code
>
> def sendToSink(df, batchId):
> if(len(df.take(1))) > 0:
> print(f"""From sendToSink, md, batchId is {batchId}, at
> {datetime.now()} """)
> #df.show(100,False)
> df. persist()
> # write to BigQuery batch table
> #s.writeTableToBQ(df, "append",
> config['MDVariables']['targetDataset'],config['MDVariables']['targetTable'])
> df.unpersist()
> #print(f"""wrote to DB""")
>else:
> print("DataFrame md is empty")
>
> def sendToControl(dfnewtopic, batchId2):
> if(len(dfnewtopic.take(1))) > 0:
> print(f"""From sendToControl, newtopic batchId is {batchId2}""")
> dfnewtopic.show(100,False)
> queue = dfnewtopic.first()[2]
> status = dfnewtopic.first()[3]
> print(f"""testing queue is {queue}, and status is {status}""")
> if((queue == config['MDVariables']['topic']) & (status ==
> 'false')):
>   spark_session = s.spark_session(config['common']['appName'])
>   active = spark_session.streams.active
>   for e in active:
>  name = e.name
>  if(name == config['MDVariables']['topic']):
> print(f"""\n==> Request terminating streaming process for
> topic {name} at {datetime.now()}\n """)
> e.stop()
> else:
> print("DataFrame newtopic is empty")
>
>
> The problem I have is to share batchID from the first function in the
> second function sendToControl(dfnewtopic, batchId2) so I can print it
> out.
>
>
> Defining a global did not work.. So it sounds like I am missing something
> rudimentary here!
>
>
> Thanks
>
>
>
>
>


How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
This is probably pretty straightforward, but somehow it does not look that
way.



In Spark Structured Streaming, "foreachBatch" performs custom write logic
on each micro-batch through a callback function. For example,

foreachBatch(sendToSink) expects two parameters: first, the micro-batch as a
DataFrame or Dataset, and second, a unique id for each batch.



In my case I simultaneously read two topics through two separate functions



   1. foreachBatch(sendToSink). \
   2. foreachBatch(sendToControl). \

This is the code:

def sendToSink(df, batchId):
    if(len(df.take(1))) > 0:
        print(f"""From sendToSink, md, batchId is {batchId}, at {datetime.now()} """)
        #df.show(100,False)
        df.persist()
        # write to BigQuery batch table
        #s.writeTableToBQ(df, "append", config['MDVariables']['targetDataset'], config['MDVariables']['targetTable'])
        df.unpersist()
        #print(f"""wrote to DB""")
    else:
        print("DataFrame md is empty")

def sendToControl(dfnewtopic, batchId2):
    if(len(dfnewtopic.take(1))) > 0:
        print(f"""From sendToControl, newtopic batchId is {batchId2}""")
        dfnewtopic.show(100, False)
        queue = dfnewtopic.first()[2]
        status = dfnewtopic.first()[3]
        print(f"""testing queue is {queue}, and status is {status}""")
        if((queue == config['MDVariables']['topic']) & (status == 'false')):
            spark_session = s.spark_session(config['common']['appName'])
            active = spark_session.streams.active
            for e in active:
                name = e.name
                if(name == config['MDVariables']['topic']):
                    print(f"""\n==> Request terminating streaming process for topic {name} at {datetime.now()}\n """)
                    e.stop()
    else:
        print("DataFrame newtopic is empty")


The problem I have is how to share the batchId from the first function in the
second function, sendToControl(dfnewtopic, batchId2), so I can print it out.


Defining a global did not work, so it sounds like I am missing something
rudimentary here!


Thanks




Re: SPIP architecture diagrams

2023-03-04 Thread Mich Talebzadeh
OK, I decided to bite the bullet and use a Visio diagram for my SPIP "Shutting
down spark structured streaming when the streaming process completed the
current process". Details here:
https://issues.apache.org/jira/browse/SPARK-42485


This is not meant to be complete; it is just an indication. I have tried to
make it generic; however, trademarks are acknowledged. I have tried not to
use color, but I guess pointers are fair.


Let me know your thoughts.


Regards







On Fri, 24 Feb 2023 at 20:12, Mich Talebzadeh 
wrote:

>
> Sounds like I have to decide for myself what to use. A correction: Vision
> should read Visio.
>
>
> Ideally the SPIP guide https://spark.apache.org/improvement-proposals.html
>  should include this topic. Additionally, there should be a repository for
> the original diagrams as well. From the said guide:
>
>
> *Appendix B. Optional Design Sketch: How are the goals going to be
> accomplished? Give sufficient technical detail to allow a contributor to
> judge whether it’s likely to be feasible. Note that this is not a full
> design document.*
>
> *Appendix C. Optional Rejected Designs: What alternatives were considered?
> Why were they rejected? If no alternatives have been considered, the
> problem needs more thought.*
>
>
> HTH
>
>
>
>
>
>
> On Mon, 20 Feb 2023 at 15:11, Mich Talebzadeh 
> wrote:
>
>> Hi,
>>
>> Can someone advise me what architecture tools I can use to create
>> diagrams for SPIP document purposes?
>>
>>
>> For example, Vision, Excalidraw, Draw IO, etc. Or does it matter, as I just
>> need to create a PNG file from whatever?
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org