Hi,
When using spark.sql() to perform ALTER TABLE operations, I found that Spark
changes the table's owner property to the current execution user. I then dug
into the source code and found that in HiveClientImpl, the alterTable function
sets the owner of the table to the current execution user. Besides,
This does not look like a Spark error. It looks like YARN has not been able to
allocate resources for the Spark driver. If you check the Resource Manager UI,
you are likely to see the Spark application waiting for resources. Try
reducing the driver memory and/or addressing other bottlenecks based on what
you see
You have understood the problem correctly. However, note that your
interpretation of the output *(K, leftValue, null)*, *(K, leftValue,
rightValue1)*, *(K, leftValue, rightValue2)* depends on knowing the
semantics of the join. So if you are processing the output rows
*manually*, you are
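To make the point about join semantics concrete, here is a plain-Scala sketch of how the three output shapes above could be handled when processing rows by hand. The JoinRow type and all names are hypothetical illustrations, not Spark API; the null right side is modeled as an Option:

```scala
// Hypothetical row type: a stream-stream outer-join output, with the
// right side absent (null) until a match arrives.
case class JoinRow(key: String, left: String, right: Option[String])

def describe(row: JoinRow): String = row.right match {
  case None    => s"${row.key}: left side only, no right match seen yet"
  case Some(r) => s"${row.key}: left ${row.left} matched right $r"
}

val rows = Seq(
  JoinRow("K", "leftValue", None),                // (K, leftValue, null)
  JoinRow("K", "leftValue", Some("rightValue1")), // (K, leftValue, rightValue1)
  JoinRow("K", "leftValue", Some("rightValue2"))  // (K, leftValue, rightValue2)
)
rows.map(describe).foreach(println)
```

The point is that the first row is not wrong output; it reflects the state of the join at the time it was emitted, so manual consumers need to handle both shapes.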
Looks like you either have a misconfigured HDFS service, or you're
using the wrong configuration on the client.
BTW, as I said in the previous response, the message you saw initially
is *not* an error. If you're just trying things out, you don't need to
do anything and Spark should still work.
Hi,
I have read that doc several times now, and I am stuck with the below error
message when I run ./spark-shell --master yarn --deploy-mode client.
I have my HADOOP_CONF_DIR set to /usr/local/hadoop-2.7.3/etc/hadoop and
SPARK_HOME set to /usr/local/spark on all 3 machines (1 node for Resource
Manager
That's not an error, just a warning. The docs [1] have more info about
the config options mentioned in that message.
[1] http://spark.apache.org/docs/latest/running-on-yarn.html
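If the message in question is the common "Neither spark.yarn.jars nor spark.yarn.archive is set" warning, the docs above describe pre-staging the Spark jars on HDFS so they are not re-uploaded on every submission. A minimal sketch of the relevant setting (the HDFS path is a placeholder, not from the original thread):

```
# spark-defaults.conf -- placeholder path; point this at a directory
# on HDFS containing the jars from $SPARK_HOME/jars
spark.yarn.jars  hdfs:///spark/jars/*.jar
```

This is optional; as noted above, Spark works without it and the warning is harmless.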
On Mon, Mar 12, 2018 at 4:42 PM, kant kodali wrote:
> Hi All,
>
> I am trying to use YARN for the
Hi All,
I am trying to use YARN for the very first time. I believe I configured the
resource manager and name node correctly, and then I ran the below command:
./spark-shell --master yarn --deploy-mode client
I get the below output and it hangs there forever (I had been waiting
over 10 minutes)
I believe jmap is only showing you the Java heap usage, but the program is
running out of direct memory space; they are two different pools of memory.
I haven't had to diagnose a direct-memory problem before, but this blog
post has some suggestions on how to do it:
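As a quick illustration of the distinction (plain JVM code, not specific to the app above): direct buffers live in native memory outside the heap that jmap reports, and that pool is capped separately by -XX:MaxDirectMemorySize:

```scala
import java.nio.ByteBuffer

// Heap buffer: backed by a byte[] on the Java heap, so it shows up
// in jmap's heap statistics.
val heapBuf = ByteBuffer.allocate(1024 * 1024)

// Direct buffer: allocated in native memory outside the Java heap;
// invisible to jmap's heap numbers. The pool's total size is limited
// by -XX:MaxDirectMemorySize (roughly the max heap size by default).
val directBuf = ByteBuffer.allocateDirect(1024 * 1024)

println(s"heap isDirect=${heapBuf.isDirect}, direct isDirect=${directBuf.isDirect}")
```

So a process can throw an out-of-direct-memory error while jmap still shows plenty of free heap.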
Hello,
I am running a streaming app on Spark 2.1.2. The batch interval is set to
5000 ms, and when I go to the "Streaming" tab in the Spark UI, it correctly
reports a 5-second batch interval, but the list of batches below only shows one
batch every two minutes (i.e. the batch time for each batch
Hi All,
This is Li Jin. We (my fellow colleagues at Two Sigma and I) have been
using Spark for time series analysis for the past two years, and it has been
a success in scaling up our time series analysis.
Recently, we started a conversation with Reynold about potential
opportunities to collaborate
Thanks Yinan,
I'm able to get the kube-dns endpoints when I run this command:
kubectl get ep kube-dns --namespace=kube-system
Do I need to deploy under kube-system instead of the default namespace?
And please let me know if you have any insights on Error1?
On Sun, Mar 11, 2018 at 8:26 PM Yinan Li
I think I understand that in the second case the DataFrame is created as a
local object, so it lives in the memory of the driver and is serialized as
part of the Task that gets sent to each executor.
Though I think the implicit conversion here is something that others could
also misunderstand -
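A sketch of the distinction being described, assuming a SparkSession named spark (names are illustrative, and this is a paraphrase of the mechanism rather than the original poster's code):

```scala
import spark.implicits._

// Local collection: materialized on the driver, then serialized as part
// of the tasks shipped to each executor.
val localDf = (0 until 10).toDF("value")

// Distributed from the start: spark.range generates rows on the executors,
// so nothing large has to be serialized from the driver.
val rangeDf = spark.range(10).toDF("value")
```

For ten elements the difference is negligible, but for large local sequences the first form ships the whole collection inside the task closure.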
Hi,
Using Scala, Spark version 2.3.0 (also 2.2.0): I've come across two main
ways to create a DataFrame from a sequence. The more common:
(0 until 10).toDF("value")  *good*
and the less common (but still prevalent):
(0 until 10).toDF("value")  *bad*
The latter results in much worse performance