h or can the records also
> be split into multiple batches?
>
>
> Best,
>
> Rico.
>
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
> --
--
Roland Johann
Data Architect/Data Engineer
phenetic GmbH
Lütticher Straße 10, 50674 Köln, Germany
Mobil: +49 172 365 26 46
Mail: roland.joh...@phenetic.io
Web: phenetic.io
Handelsregister: Amtsgericht Köln (HRB 92595)
Geschäftsführer: Roland Johann, Uwe Reimann
>>> import scala.collection.mutable.ListBuffer
>>>
>>> var atrb = ListBuffer[(String, String, String)]()
>>>
>>> for ((key, value) <- aMap) {
>>>   atrb += ((key, value._1, value._2))
>>> }
>>>
>>> var newCol = atrb.head.productIterator.toList
>>>
>>> Please someone help me
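For reference, the quoted Scala above flattens a map of value pairs into (key, v1, v2) triples. A minimal plain-Python sketch of the same idea (hypothetical data, not the poster's actual map):

```python
# Flatten a map of (v1, v2) pairs into (key, v1, v2) triples,
# mirroring what the quoted Scala ListBuffer loop does.
a_map = {"k1": ("a", "b"), "k2": ("c", "d")}

atrb = [(key, v1, v2) for key, (v1, v2) in a_map.items()]

# The first triple's fields, analogous to atrb.head.productIterator.toList:
new_col = list(atrb[0])
```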
Hi all,
don’t want to interrupt the conversation, but I am keen to know where I can find
information regarding dynamic allocation on Kubernetes. As far as I know, the
docs just point to future work.
Thanks a lot,
Roland
> On 12.05.2020 at 09:25, Steven Stetzler wrote:
>
> Hi all,
>
> I am
> Software Developer IV
> Customer Knowledge Platform
> From: Roland Johann
> Sent: Thursday, April 30, 2020 8:30:05 AM
> To: randy clinton
> Cc: Roland Johann ; user
>
> Subject: Re: Left Join at SQL query gets planned as inner join
>
> s_DF = s_DF.filter(year = 2020 and month = 4 and day = 29)
> p_DF = p_DF.filter(year = 2020 and month = 4 and day = 29 and event_id is null)
>
> output = s_DF.join(p_DF, event_id == source_event_id, left)
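As background for the thread subject: the usual reason a LEFT JOIN gets planned as an inner join is a post-join predicate on right-side columns, which can never hold for null-extended rows. A plain-Python sketch of that semantics (illustrative data and column names, not Spark itself):

```python
def left_join(left, right, key_l, key_r):
    """Naive left join over lists of dicts; unmatched left rows pair with None."""
    out = []
    for l in left:
        matches = [r for r in right if r[key_r] == l[key_l]]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))
    return out

s_rows = [{"source_event_id": 1}, {"source_event_id": 2}]
p_rows = [{"event_id": 1, "day": 29}]

joined = left_join(s_rows, p_rows, "source_event_id", "event_id")

# A filter on right-side columns discards every null-extended row,
# so the result is exactly the inner join:
filtered = [(l, r) for (l, r) in joined if r is not None and r["day"] == 29]
inner = [(l, r) for (l, r) in joined if r is not None]
```

This is why filtering the right side before the join (as in the quoted snippet) behaves differently from the same predicate in a WHERE clause applied after a left join.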
>
>
>
> On Thu, Apr 30, 2020 at 11:06 AM Roland Johann
> wrote:
> Hi All,
Scala DSL lead to the same execution plan. Can someone point me to docs about
the internals of this part of Spark? The official docs about SQL in general are
not that verbose.
Thanks in advance and stay safe!
Roland Johann
'somefile'))
>> lines = spark.sparkContext.textFile("log_file")
>> converted_lines_rdd = lines.map(lambda l: process_logline(l, tree_val))
>> log_line_rdd = spark.createDataFrame(converted_lines_rdd)
>> log_line_rdd.show()
>>
>> Basically
Hi Adnan,
coalesce merges existing partitions and, unlike repartition, avoids a full
network shuffle, though it can concentrate work on fewer executors. How many
executors are configured for that job?
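For context: in Spark, coalesce reduces the partition count by merging existing partitions, while repartition re-distributes every row via a full shuffle. A plain-Python sketch of the merging idea (illustration only, not Spark's actual partitioner):

```python
def coalesce(partitions, n):
    """Merge existing partitions into n buckets without touching individual rows."""
    buckets = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        buckets[i % n].extend(part)
    return buckets

parts = [[1, 2], [3], [4, 5], [6]]
merged = coalesce(parts, 2)
```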
Best regards
Roland Johann
ent?
>4. Since the pipeline is going to run on Kubernetes, I am trying to
>avoid InfluxDB as the time-series database and move to Prometheus. Is
>this approach correct?
>
> Thanks,
> Ani
--
Roland Johann
Hi All,
changing maxOffsetsPerTrigger and restarting the job doesn’t apply to the batch
size. This is unfortunate, as we currently use a trigger duration of 5 minutes,
which consumes only 100k messages while the offset lag is in the billions.
Decreasing the trigger duration also reduces the micro-batch size -
If the dataset contains a column like changed_at/created_at, you can use it as a
watermark and filter out rows whose changed_at/created_at lies before the
watermark.
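The suggested watermark filter can be sketched as follows (plain Python with a hypothetical changed_at column; in Spark the same predicate would be a column filter):

```python
from datetime import datetime

rows = [
    {"id": 1, "changed_at": datetime(2020, 5, 1)},
    {"id": 2, "changed_at": datetime(2020, 5, 10)},
]
watermark = datetime(2020, 5, 5)

# Keep only rows at or after the watermark; older rows are dropped.
fresh = [r for r in rows if r["changed_at"] >= watermark]
```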
Best Regards
Roland Johann
itten
> ? I have checked the YARN logs but couldn't find the messages I wrote in
> the Java file.
> Please help, as I am a little confused and know that there is something
> very silly which I am missing.
>
> Thanks in advance !
>
> Debu
>
e default security groups, ran my job again but the same
> exception pops up :-( ...
> All traffic is open on the security groups now.
>
> Jochen
>
> On Fri, 4 Oct 2019 at 17:37, Roland Johann <
> roland.joh...@phenetic.io> wrote:
>
>> These are dynamic port ranges an
> We have indeed custom security groups. Can you tell me where exactly I
> need to be able to access what?
> For example, is it from the master instance to the driver instance? And
> which port should be open?
>
> Jochen
>
> On Fri, 4 Oct 2019 at 17:14, Roland Johann wrote:
.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
>>> {code}
>>>
>>> It actually goes wrong at this line:
>>> https://github.com/ap
I want to add that the major Hadoop distributions also offer additional
encryption options (for example, Apache Ranger from Hortonworks).
Roland Johann
tps://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html
- obviously if you don’t have to use PGP. Using encryption at the storage
layer simplifies your application and architecture and you don’t need to
reinvent the wheel.
Kind Regards
Roland Johann
ng you use hadoop 2.7.7.
Best Regards
Roland Johann
Hi Krishna,
there seems to be no attachment.
In addition, you should never post private credentials to a public forum.
Please rotate the credentials of your storage account as soon as possible!
Best Regards
Roland Johann