Matthias,
Can you check appending the jars to LAUNCH_CLASSPATH in
spark-1.4.1/bin/spark-class?
2016-03-02 21:39 GMT+05:30 Matthias Niehoff :
> no, not to driver and executor but to the master and worker instances of
> the spark standalone cluster
>
> On 2 March 2016 at 17:05, Igor Be
Hi All,
I am trying to enable DEBUG logging for the Spark ApplicationMaster, but it is not working.
On running the Spark job, I passed
-Dlog4j.configuration=file:/opt/mapr/spark/spark-1.4.1/conf/log4j.properties
The log4j.properties has log4j.rootCategory=DEBUG, console
The Spark executor containers have DEBUG logs but
Were all NodeManager services restarted after the change in yarn-site.xml?
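For reference, a minimal log4j.properties that turns on DEBUG; the appender lines follow Spark's bundled conf/log4j.properties.template, so adjust to your setup:

```properties
# Root logger at DEBUG, writing to the console
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```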
On Thu, Mar 3, 2016 at 6:00 AM, Jeff Zhang wrote:
> The executor may fail to start. You need to check the executor logs, if
> there's no executor log then you need to check node manager log.
>
> On Wed, Mar 2, 2016 at 4:26 P
def parse(line: String): (String, LogClass) = {
  val pieces = line.split(' ')
  val level = pieces(2)
  val one = pieces(0)
  val two = pieces(1)
  (level, LogClass(one, two))
}
val output = logData.map(x => parse(x))
val partitioned = output.partitionBy(new ExactPartitioner(5)).persist()
val groups = partitioned.groupByKey(new ExactPartitioner(5))
groups.count()
output.partitions.size
partitioned.partitions.size
  }
}
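ExactPartitioner is not defined in the thread; here is a minimal sketch of what it presumably looks like. A stand-in Partitioner base class is included only so the snippet compiles without Spark on the classpath; in the real job it would extend org.apache.spark.Partitioner:

```scala
// Stand-in for org.apache.spark.Partitioner, only for this self-contained sketch
abstract class Partitioner extends Serializable {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

// Hypothetical ExactPartitioner: hash each key into one of n partitions,
// forcing the modulus non-negative the way Spark's HashPartitioner does
class ExactPartitioner(n: Int) extends Partitioner {
  def numPartitions: Int = n
  def getPartition(key: Any): Int = {
    val mod = key.hashCode % n
    if (mod < 0) mod + n else mod
  }
}
```

With this, partitionBy(new ExactPartitioner(5)) and groupByKey(new ExactPartitioner(5)) use the same partitioner, so the groupByKey should not trigger an extra shuffle.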
Thanks,
Prabhu Joseph
Hi All,
What is the difference between the Spark Partitioner and the Spark Shuffle
Manager? The Spark partitioner is by default a hash partitioner, and the Spark shuffle
manager is sort-based by default; the others are Hash and Tungsten Sort.
Thanks,
Prabhu Joseph
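To make the distinction concrete: the partitioner is chosen per RDD in code (e.g. partitionBy(new HashPartitioner(n)) decides which partition each key lands in), while the shuffle manager is a job-level setting that controls how shuffle data is physically written and fetched. In Spark 1.x it is selected via configuration, for example in spark-defaults.conf:

```properties
# Spark 1.x: how shuffle data is physically written/fetched.
# Valid values in this era: sort (default since 1.2), hash, tungsten-sort
spark.shuffle.manager  sort
```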
shuffle files from an external service instead of
from each other, which offloads load from the Spark executors.
We want to check whether a similar external service is
implemented for transferring cached partitions to other executors.
Thanks, Prabhu Joseph
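For the shuffle-file case described above, the external shuffle service is enabled with the settings below (the YARN NodeManager aux-service registration is omitted here):

```properties
# Executors fetch shuffle files from the external service, not from each other
spark.shuffle.service.enabled  true
# Commonly paired with dynamic allocation, which requires the service
spark.dynamicAllocation.enabled  true
```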
cate hot cached blocks right?
>
>
> On Tuesday, March 8, 2016, Prabhu Joseph
> wrote:
>
>> Hi All,
>>
>> When a Spark job is running, one of the Spark executors on Node A
>> has some partitions cached. Later, for some other stage, the scheduler tries to
Thanks,
Prabhu Joseph
On Fri, Mar 11, 2016 at 3:45 AM, Ashok Kumar
wrote:
>
> Hi,
>
> We intend to use 5 servers which will be utilized for building Bigdata
> Hadoop data warehouse system (not using any proprietary distribution like
> Hortonworks or Cloudera or others).
> All server
Looking at ExternalSorter.scala line 192
189   while (records.hasNext) {
190     addElementsRead()
191     kv = records.next()
192     map.changeValue((getPartition(kv._1), kv._1), update)
193     maybeSpillCollection(usingMap = true)
194   }
On Sat, Mar 12, 2016 at 12:31 PM, Saurabh Guru
wrote:
> I am seeing the following exception
Looking at ExternalSorter.scala line 192, I suspect some input record has a
null key.
189   while (records.hasNext) {
190     addElementsRead()
191     kv = records.next()
192     map.changeValue((getPartition(kv._1), kv._1), update)
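If the input really does contain null keys, a defensive filter before the shuffle avoids the NPE. Sketched here with plain Scala collections (in the actual job this would be an rdd.filter(...) before partitionBy/reduceByKey):

```scala
// Records about to be shuffled; one has a null key
val records = Seq(("INFO", 1), (null: String, 2), ("WARN", 3))

// Drop null-keyed records before partitioning/aggregation:
// hashing a null key inside the partitioner would otherwise NPE
val cleaned = records.filter { case (k, _) => k != null }
```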
On Sat, Mar 12, 2016 at 12:48 PM, Prabhu Joseph
wrote:
> Look
ata through kafka.
>
> On Sat, 12 Mar 2016 at 20:28, Ted Yu wrote:
>
>> Interesting.
>> If kv._1 was null, shouldn't the NPE have come from getPartition() (line
>> 105) ?
>>
>> Was it possible that records.next() returned null ?
>>
>>
of memory for cache. So, when a Spark executor has a lot of memory available
for cache and does not use it, but there is a need to do a lot of
shuffle, will executors use only the shuffle fraction that is set for
doing shuffle, or will they use
the free memory available for cache as well?
Thanks,
Prabhu Joseph
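Under Spark 1.x static memory management (before the unified memory manager introduced in 1.6), shuffle and storage each get a fixed slice of the heap, and shuffle does not borrow unused cache memory. The default fractions work out as follows (the heap size is just an example):

```scala
// Spark 1.x static memory management, default fractions
val executorHeapMB  = 4096
val storageFraction = 0.6 // spark.storage.memoryFraction (cache)
val shuffleFraction = 0.2 // spark.shuffle.memoryFraction (shuffle)

// Each region is capped at its fraction regardless of what the other uses
val storageCapMB = (executorHeapMB * storageFraction).toInt
val shuffleCapMB = (executorHeapMB * shuffleFraction).toInt
```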
>
>>
>>
>> On 14 March 2016 at 08:06, Sabarish Sasidharan <
>> sabarish.sasidha...@manthan.com> wrote:
>>
>>> Which version of Spark are you using? The configuration varies by
>>> version.
>>>
>>> Regards
>>> Sab
our case.
>
> Regards
> Sab
>
> On Mon, Mar 14, 2016 at 2:20 PM, Prabhu Joseph wrote:
>
>> It is a Spark-SQL and the version used is Spark-1.2.1.
>>
>> On Mon, Mar 14, 2016 at 2:16 PM, Sabarish Sasidharan <
>> sabarish.sasidha...@manthan.com>
>
>
> On 14 March 2016 at 08:06, Sabarish Sasidharan <
> sabarish.sasidha...@manthan.com> wrote:
>
>> Which version of Spark are you using? The configuration varies by version.
>>
>> Regards
>> Sab
>>
>> On Mon, Mar 14, 2016 at 10:53 AM, Prabhu Jose
pyspark script.
DEFAULT_PYTHON="/ANACONDA/anaconda2/bin/python2.7"
Thanks,
Prabhu Joseph
On Tue, Mar 15, 2016 at 11:52 AM, Stuti Awasthi
wrote:
> Hi All,
>
>
>
> I have a Centos cluster (without any sudo permissions) which has by
> default Python 2.6. Now I hav
/14 15:35:32 1.4 min
164/164 (163 skipped) 19841/19788
(41405 skipped)
Thanks,
Prabhu Joseph
pped -- i.e. no need to recompute that stage.
>
> On Tue, Mar 15, 2016 at 5:50 PM, Jeff Zhang wrote:
>
>> If RDD is cached, this RDD is only computed once and the stages for
>> computing this RDD in the following jobs are skipped.
>>
>>
>> On Wed, Mar 16, 2016 at
Tasks in the
> 163 Stages that were skipped.
>
> I think -- but the Spark UI's accounting may not be 100% accurate and bug
> free.
>
> On Tue, Mar 15, 2016 at 6:34 PM, Prabhu Joseph wrote:
>
>> Okay, so out of 164 stages, 163 were skipped. And how 41405 tas
,
Prabhu Joseph
concurrency is affected
by the single driver. How can the concurrency be improved, and what are the best
practices?
Thanks,
Prabhu Joseph
nd not others?
>
> It sounds like an interesting problem…
>
> On Jun 23, 2016, at 5:21 AM, Prabhu Joseph
> wrote:
>
> Hi All,
>
> On submitting 20 parallel runs of the same SQL query to the Spark Thrift Server, the
> query execution time for some queries is less than a second, and for some a
Take a thread dump of the executor process several times within a short period
and check what each thread is doing at the different times; this will help
identify the expensive sections in the user code.
Thanks,
Prabhu Joseph
On Sat, Jan 2, 2016 at 3:28 AM, unk1102 wrote:
> Sorry please see attac
every 2 seconds, for a total of 1
minute. This will help identify the code where threads spend a lot
of time, and then try to tune it.
Thanks,
Prabhu Joseph
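The sampling described above (one dump every 2 seconds for a minute) can be scripted; the pid 12345 and the output file names below are made-up placeholders:

```scala
// Build the jstack invocations for periodic thread dumps:
// one sample every intervalSec seconds over totalSec seconds
def dumpCommands(pid: Int, intervalSec: Int = 2, totalSec: Int = 60): Seq[String] =
  (0 until totalSec by intervalSec).map(t => s"jstack -l $pid > dump-$t.txt")

val cmds = dumpCommands(12345) // 30 commands; run one every 2 seconds
```

Comparing which stack frames recur across the 30 dumps points at the hot sections of user code.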
On Sat, Jan 2, 2016 at 1:28 PM, Umesh Kacha wrote:
> Hi thanks I did that and I have attached thread dump images. That was
machine, and jps -l will list all Java
processes; jstack -l <pid> will give the stack trace.
Thanks,
Prabhu Joseph
On Mon, Jan 11, 2016 at 7:56 PM, Umesh Kacha wrote:
> Hi Prabhu thanks for the response. How do I find pid of a slow running
> task. Task is running in yarn cluster node. When I
application attempt, there are many
finishApplicationMaster requests causing the ERROR.
I need your help to understand in what scenario the above happens.
Related JIRAs:
https://issues.apache.org/jira/browse/SPARK-1032
https://issues.apache.org/jira/browse/SPARK-3072
Thanks,
Prabhu Joseph
ores,
2.0 GB RAM
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated:
app-20160201065319-0014/2848 is now LOADING
16/02/01 06:54:28 INFO AppClient$ClientEndpoint: Executor updated:
app-20160201065319-0014/2848 is now RUNNING
....
Thanks,
Prabhu Joseph
Thanks Ted. My concern is how to avoid this kind of user error on a
production cluster; it would be better if Spark handled this instead of
creating an executor every second, failing, and overloading the Spark
Master. Shall I file a Spark JIRA to handle this?
Thanks,
Prabhu Joseph
On
, saveAsHadoopFile runs fine.
What could be the reason for the ExecutorLostFailure when the number of cores per
executor is high?
Error: ExecutorLostFailure (executor 3 lost)
16/02/02 04:22:40 WARN TaskSetManager: Lost task 1.3 in stage 15.0 (TID
1318, hdnprd-c01-r01-14):
Thanks,
Prabhu Joseph
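One plausible explanation, hedged: with a fixed executor heap, each additional concurrent task shrinks the memory available per task, so a high cores-per-executor setting can push tasks into OOM, surfacing as ExecutorLostFailure. The rough arithmetic (the heap size is illustrative):

```scala
val executorMemoryMB = 8192
// All concurrently running tasks share the one executor heap
def perTaskMB(cores: Int): Int = executorMemoryMB / cores

val fewCores  = perTaskMB(2)  // plenty of headroom per task
val manyCores = perTaskMB(16) // each task gets far less memory
```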
executor does not have enough heap.
Thanks,
Prabhu Joseph
On Thu, Feb 4, 2016 at 11:25 AM, fightf...@163.com
wrote:
> Hi,
>
> I want to make sure that the cache table indeed would accelerate sql
> queries. Here is one of my use case :
> impala table size : 24.59GB, no pa
up and launching it on a
less-local node.
So after setting it to 0, all tasks started in parallel. But I learned that it is
better not to reduce it to 0.
On Mon, Feb 1, 2016 at 2:02 PM, Prabhu Joseph
wrote:
> Hi All,
>
>
> Sample Spark application which reads a logfile from hadoop (1.2GB
> must be the process of putting ..."
> - Edsger Dijkstra
>
> "If you pay peanuts you get monkeys"
>
>
> 2016-02-04 11:33 GMT+01:00 Prabhu Joseph :
>
>> Okay, the reason for the task delay within the executor when some RDDs are in
>> memory and some in Hadoop i.
://issues.apache.org/jira/browse/SPARK-5342
spark.yarn.credentials.file
How can the AMRMToken be renewed for a long-running job on YARN?
Thanks,
Prabhu Joseph
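SPARK-5342 (cited above) added keytab-based login so a long-running application can re-acquire tokens itself. The submission shape, with made-up principal, paths, and class names:

```shell
spark-submit --master yarn-cluster \
  --principal user@EXAMPLE.COM \
  --keytab /path/to/user.keytab \
  --class com.example.LongRunningApp app.jar
```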
+ Spark-Dev
On Tue, Feb 9, 2016 at 10:04 AM, Prabhu Joseph
wrote:
> Hi All,
>
> A long-running Spark job on YARN throws the below exception after running
> for a few days.
>
> yarn.ApplicationMaster: Reporter thread fails 1 time(s) in a row.
> org.apache.hadoop.yarn.exceptio
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Thanks,
Prabhu Joseph
of hbase
client jars; when I checked launch_container.sh, the classpath does not have
$PWD/* and hence all the HBase client jars are ignored.
Is spark.yarn.dist.files not meant for adding jars to the executor classpath?
Thanks,
Prabhu Joseph
On Tue, Feb 9, 2016 at 1:42 PM, Prabhu Joseph
wrote:
>
hadoop-2.5.1, and hence
spark.yarn.dist.files does not work with hadoop-2.5.1.
spark.yarn.dist.files works fine on hadoop-2.7.0, as CWD/* is included in
the container classpath through a bug fix. Searching for the JIRA.
Thanks,
Prabhu Joseph
On Wed, Feb 10, 2016 at 4:04 PM, Ted Yu wrote:
> H
Worker nodes are
exactly the same as what Spark Master GUI shows.
Thanks,
Prabhu Joseph
On Mon, Feb 15, 2016 at 11:51 AM, Kartik Mathur wrote:
> on spark 1.5.2
> I have a spark standalone cluster with 6 workers , I left the cluster idle
> for 3 days and after 3 days I saw only 4 worke
wrong SPARK_MASTER_IP at the
worker nodes.
Check the logs of the other running workers to see which SPARK_MASTER_IP they
connected to; I don't think it is using a wrong master IP.
Thanks,
Prabhu Joseph
On Mon, Feb 15, 2016 at 12:34 PM, Kartik Mathur wrote:
> Thanks Prabhu ,
>
> I had
k.sql.DataFrame = [Prabhu: string, Joseph: string]
So is there any real need for HiveContext inside the Spark shell? Is everything
that can be done with HiveContext achievable with SqlContext inside the Spark
shell?
Thanks,
Prabhu Joseph
using SqlContext .
>>
>> scala> var df =
>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>> "true").option("inferSchema", "true").load("/SPARK/abc")
>> df: org.apache.spark.sql.DataFrame = [Prabhu: string, Joseph: string]
>>
>> So is there any real need for HiveContext inside the Spark shell? Is
>> everything that can be done with HiveContext achievable with SqlContext
>> inside the Spark shell?
>>
>>
>>
>> Thanks,
>> Prabhu Joseph
taking 2-3 times longer than A,
which shows that concurrency does not improve with a shared Spark context. [Spark
Job Server]
Thanks,
Prabhu Joseph
old Java threading is used somewhere.
On Friday, February 19, 2016, Jörn Franke wrote:
> How did you configure YARN queues? What scheduler? Preemption ?
>
> > On 19 Feb 2016, at 06:51, Prabhu Joseph wrote:
> >
> > Hi All,
> >
> >When running con