This seems to be a Cloudera environment issue, and you might get a faster
and more reliable answer in Cloudera forums.
On Fri, May 4, 2018 at 3:39 PM, Fawze Abujaber wrote:
Hi Yulia,
Thanks for your response.
I see only LZO, and only for Impala:
[root@xxx ~]# locate *lzo*.so*
/opt/cloudera/parcels/GPLEXTRAS-5.13.0-1.cdh5.13.0.p0.29/lib/impala/lib/libimpalalzo.so
/usr/lib64/liblzo2.so.2
/usr/lib64/liblzo2.so.2.0.0
the
/opt/cloudera/parcels/GPLEXTRAS-5.13.0-1.cdh5.13
Is this a recommended way of reading data in the long run? I think it might
be better to write, or look for, an InputFormat that supports the need.
By the way, a Block is designed to be an HDFS-internal representation, to
enable certain features. It would be interesting to understand the use case
where a client app
A jar is not enough; you need the native library (*.so). See if your "native"
directory contains it:
drwxr-xr-x 2 cloudera-scm cloudera-scm 4096 Oct 4 2017 native
and check whether java.library.path or LD_LIBRARY_PATH points to, or includes,
the directory where your *.so library resides.
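As a concrete (hedged) sketch: on CDH the GPL Extras parcel typically ships the native codec libraries, and Spark can be pointed at that directory through its extra library path settings in spark-defaults.conf. The parcel path below is illustrative and should be checked against your installation:

```
spark.driver.extraLibraryPath    /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native
spark.executor.extraLibraryPath  /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native
```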
On Thursday, May 3, 201
Yes, you can usually use a broadcast join to avoid skew problems.
On Wed, May 2, 2018 at 8:57 PM, Pralabh Kumar
wrote:
> I am performing a join operation. If I convert the reduce-side join to a
> map-side join (no shuffle will happen), I assume that in that case this
> error shouldn't come. Let me know if th
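To make the idea concrete, here is a minimal pure-Python sketch of what a map-side (broadcast) join does: the small table is shipped whole to every task, so the large side never needs to be shuffled. In Spark SQL itself you would write `large_df.join(broadcast(small_df), "key")` using `pyspark.sql.functions.broadcast`; the tables below are made up for illustration.

```python
# The small ("build") side fits in memory and is broadcast to every task.
small = {1: "apple", 2: "banana"}

# The large ("probe") side is streamed through; no shuffle is needed
# because every task already holds the whole small table.
large = [(1, 10), (2, 20), (3, 30)]

joined = [(key, value, small[key]) for key, value in large if key in small]
print(joined)  # [(1, 10, 'apple'), (2, 20, 'banana')]
```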
Version: 2.3, DataSourceV2, ContinuousReader
Hi,
We're creating a new data source to fetch data from a streaming source that
requires committing received data. We would like to commit data once in a
while, after it has been retrieved and correctly processed, and then fetch
more.
One option could b
I think you need to group by a tumbling window and define watermarks (use a
very low watermark, or even 0) to discard the state. Here the window duration
becomes your logical batch.
- Arun
From: kant kodali
Date: Thursday, May 3, 2018 at 1:52 AM
To: "user @spark"
Subject: Re: question on
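A rough sketch of the bucketing a tumbling window performs (the 10-second width is arbitrary): in Spark this is what `groupBy(window(col("ts"), "10 seconds"))` does, with `withWatermark` controlling when a window's state is dropped.

```python
def tumbling_window(epoch_seconds, width_seconds=10):
    """Map an event time to its tumbling-window [start, end) bounds."""
    start = (epoch_seconds // width_seconds) * width_seconds
    return (start, start + width_seconds)

# Events in the same window are aggregated together; once the watermark
# passes a window's end, its state can be discarded.
print(tumbling_window(1525312345))  # (1525312340, 1525312350)
```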
Hello Madhav,
What I did is pretty straightforward. Let's say that your HDFS block
size is 128 MB and you store a 256 MB file in HDFS, named Test.csv.
First use the command: `hdfs fsck Test.csv -locations -blocks -files`.
It will return some very useful information, including the list of
blocks.
I would like to create a Spark UDF which returns a prediction made with a
trained Keras model. Keras models are not typically pickle-able; however, I
have used the monkey-patch approach to make Keras models pickle-able, as
described here: http://zachmoshe.com/2017/04/03/pickling-keras-models.h
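The gist of that monkey-patch approach, sketched here with a stand-in class rather than a real Keras model (so this is illustrative only): an object holding unpicklable state (locks, sessions, graph handles) can be made pickle-able by serializing just its recoverable state.

```python
import pickle
import threading

class FakeModel:
    """Stand-in for a Keras model: holds an unpicklable attribute."""
    def __init__(self, weights):
        self.weights = weights
        self._lock = threading.Lock()  # locks cannot be pickled

    def __getstate__(self):
        # Keep only the state we can rebuild from (the monkey patch does
        # this for Keras by round-tripping the model through HDF5 bytes).
        return {"weights": self.weights}

    def __setstate__(self, state):
        self.__init__(state["weights"])

clone = pickle.loads(pickle.dumps(FakeModel([0.1, 0.2])))
print(clone.weights)  # [0.1, 0.2]
```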
Hi Sergey,
Thanks for your valuable feedback!
For 1: yes, this is definitely a bug, and I have sent a PR to fix it.
For 2: I have left my comments on the JIRA ticket.
For 3: I don't quite understand it; can you give some concrete examples?
For 4: yes, this is a problem, but I think it's not a big de
Hi Guys,
I'm running into an issue where my Spark jobs are failing with the error below.
I'm using Spark 1.6.0 with CDH 5.13.0.
I tried to figure it out with no success.
I will appreciate any help, or a direction on how to attack this issue.
User class threw exception: org.apache.spark.SparkException: Job
After doing some more research using Google, it's clear that aggregations
are stateful by default in Structured Streaming. So the question now is how
to do stateless aggregations (not storing the result from previous batches)
using Structured Streaming 2.3.0? I am trying to do it using raw Spark SQL
Hi All,
I was under the assumption that one needs to run groupBy(window(...)) to run
any stateful operations, but it looks like that is not the case, since any
aggregation-like query such as
"select count(*) from some_view" is also stateful, since it stores the
result of the count from the previous batch. Likew
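To illustrate the distinction with plain Python (the micro-batches below are invented): a stateful count carries a running total across batches, which is what a default streaming aggregation does, while a stateless count is recomputed per batch and keeps nothing.

```python
batches = [[1, 2, 3], [4, 5]]  # two made-up micro-batches

# Stateful: the running count survives between batches (state store).
running = 0
stateful_counts = []
for batch in batches:
    running += len(batch)
    stateful_counts.append(running)

# Stateless: each batch is counted on its own, nothing is carried over.
stateless_counts = [len(batch) for batch in batches]

print(stateful_counts)   # [3, 5]
print(stateless_counts)  # [3, 2]
```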