```
{$cid} {$pii} {$content-type} {$srctitle} {$document-type} {$document-subtype} {$publication-date} {$article-title} {$issn} {$isbn}
{$lang} {$tables} return
xml-to-json($retval)
```
Darin.
On Tuesday, March 13, 2018,
This question on Stack Overflow may help:
https://stackoverflow.com/questions/42641573/why-does-memory-usage-of-spark-worker-increases-with-time/42642233#42642233
I added this code in the foreachRDD block:
```
rdd.persist(StorageLevel.MEMORY_AND_DISK)
```
The exception no longer occurs, but many dead executors are showing in the Spark
Streaming UI.
```
User class threw exception: org.apache.spark.SparkException: Job aborted due
to stage failure: Task 21 in stage 1194.0 f
```
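For reference, a minimal sketch (not from the original post; `stream` and the storage level are assumptions) of pairing the persist above with an explicit unpersist at the end of each batch, so cached blocks do not accumulate across batches:

```
import org.apache.spark.storage.StorageLevel

stream.foreachRDD { rdd =>
  rdd.persist(StorageLevel.MEMORY_AND_DISK)
  // ... the actions for this batch run against rdd here ...
  rdd.unpersist()   // release the cached blocks before the next batch arrives
}
```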
Hi,
I got this exception after the streaming program had been running for some hours:
```
User class threw exception: org.apache.spark.SparkException: Job aborted
due to stage failure: Task 21 in stage 1194.0 failed 4 times, most recent
failure: Lost task 21.3 in stage 1194.0 (TID 2475, 2.dev3, executor 66):
ExecutorL
```
I think your settings are not taking effect.
Try adding `ulimit -n 10240` in spark-env.sh.
`d.partitioner`
I get
`Option[org.apache.spark.Partitioner] = None`
I would have thought it would be HashPartitioner.
Does anyone know why this would be None and not HashPartitioner?
Thanks.
Darin.
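For what it's worth, a small sketch (not from the original thread; assumes a SparkSession named `spark` in something like the spark-shell) showing that an explicitly partitioned RDD reports its partitioner, while round-tripping through a Dataset drops that information:

```
import org.apache.spark.HashPartitioner
import spark.implicits._

val rdd = spark.sparkContext
  .parallelize(Seq((1, "a"), (2, "b")))
  .partitionBy(new HashPartitioner(8))
println(rdd.partitioner)      // Some(org.apache.spark.HashPartitioner@...)

val ds = rdd.toDS()
println(ds.rdd.partitioner)   // None -- the Dataset round-trip does not expose the partitioner
```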
ld be appreciated.
Thanks.
Darin.
iated.
Thanks.
Darin.
I could of course increase the partitions (and the size of my cluster), but I also want
to clarify my understanding of whole-stage code generation.
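As an aside, a hedged sketch (not from the original message; assumes a SparkSession named `spark`) of one quick way to see whether whole-stage code generation kicked in: operators marked with `*` in the physical plan were fused into a generated whole-stage function.

```
val df = spark.range(0L, 1000000L)
  .filter("id % 2 = 0")
  .selectExpr("sum(id)")

df.explain()   // operators prefixed with '*' in the printed physical plan are whole-stage generated
```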
Any thought/suggestions would be appreciated. Also, if anyone has found good
resources that further explain the details
that it
returns a string. So, you have to be a little creative when returning multiple
values (such as delimiting the values with a special character and then
splitting on this delimiter).
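To illustrate the workaround being described, a hedged sketch (the delimiter and values are made up, not from the original post): the expression is written to return all the values concatenated with an unlikely delimiter, and the caller splits them back out.

```
// e.g. the expression returns "id|doc-type|pub-date" as a single delimited string
val delimited = "0123456|research-article|2016-09-01"   // what an evaluate call might hand back
val Array(id, docType, pubDate) = delimited.split('|')
```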
Darin.
From: Diwakar Dhanuskodi
To: Darin McBeath ; Hyukjin Kwon ;
Jörn Franke
Cc: Felix
xslt to transform, extract, or
filter.
As mentioned below, you want to initialize the parser in a mapPartitions call
(one of the examples shows this).
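For reference, a minimal sketch of that pattern (the variable names and XPathProcessor calls mirror the spark-xml-utils snippet quoted later in this digest; treat the exact signatures as illustrative): the processor is created once per partition inside mapPartitions and reused for every record in that partition.

```
val evaluated = xmlKeyPair.mapPartitions { recsIter =>
  // build the (non-serializable) processor once per partition, on the executor
  val proc = XPathProcessor.getInstance(xpath, namespaces)
  recsIter.map { case (key, xml) => (key, proc.evaluateString(xml)) }
}
```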
Hope this is helpful.
Darin.
From: Hyukjin Kwon
To: Jörn Franke
Cc: Diwakar Dhanuskodi ; Felix Cheung
) be faster in many use cases (such as the one I'm using above).
Am I doing something wrong or is this to be expected?
Thanks.
Darin.
tanding for how repartition
should work or if this is a bug. Thanks Jacek for starting to dig into this.
Darin.
----- Original Message -----
From: Darin McBeath
To: Jacek Laskowski
Cc: user
Sent: Friday, March 11, 2016 1:57 PM
Subject: Re: spark 1.6 foreachPartition only appears to be running o
fo("SimpleStorageServiceInit call arg1: "+ arg1);
log.info("SimpleStorageServiceInit call arg2:"+ arg2);
log.info("SimpleStorageServiceInit call arg3: "+ arg3);
SimpleStorageService.init(this.arg1, this.arg2, this.arg3);
}
}
From: Jacek L
st the job dies.
Darin.
From: Jacek Laskowski
To: Darin McBeath
Cc: user
Sent: Friday, March 11, 2016 1:24 PM
Subject: Re: spark 1.6 foreachPartition only appears to be running on one
executor
Hi,
How do you check which executor is used? Can you include a
sing spark-submit.
Thanks.
Darin.
wholeTextFiles (since some of my files can have embedded newlines). But I wonder how
either of these would behave if I passed literally a million (or more) 'filenames'.
Before I spend time exploring, I wanted to seek some input.
Any th
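For context, a hedged sketch of the two APIs being weighed (the paths are illustrative): `textFile` yields one record per line across all matched files, while `wholeTextFiles` yields one (path, content) pair per file, which is what preserves embedded newlines.

```
val lines = sc.textFile("s3n://mybucket/xml/part-*")        // RDD[String], one element per line
val files = sc.wholeTextFiles("s3n://mybucket/xml/part-*")  // RDD[(String, String)], one (path, content) pair per file
```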
Thanks.
I already set the following in spark-defaults.conf, so I don't think that is going to fix my problem:
```
spark.eventLog.dir       file:///root/spark/applicationHistory
spark.eventLog.enabled   true
```
I suspect my problem must be something else.
Darin.
From
le to view the application history
for both jobs.
Has anyone else noticed this issue? Any suggestions?
Thanks.
Darin.
ytes()));
}
}
Lastly, you will need to include the saxon9he jar file (open source version).
This would work for XPath, XQuery, and XSLT.
Hope this helps. When I get a chance, I will update the spark-xml-utils github
site with details on the new getInstance function and som
D that is completely empty) or you could have it find 'local' versions (on
the workers or in S3 and then cache them locally for performance).
I will post an update when the code has been adjusted.
Darin.
- Original Message -
From: Shivalik
To: user@spark.apache.org
Sent: T
ava)
```
val proc = XPathProcessor.getInstance(xpath, namespaces)
recsIter.map(rec => proc.evaluateString(rec._2).toInt)
}).sum
```
There is more documentation on the spark-xml-utils github site. Let me know if
the documentation is not clear or if you have any
http://www.meetup.com/Cincinnati-Apache-Spark-Meetup/
Thanks.
Darin.
```
tyId: Long, EntityType: String, CustomerId: String, EntityURI: String, NumDocs: Long)
val entities = sc.textFile("s3n://darin/Entities.csv")
val entitiesArr = entities.map(v => v.split('|'))
val dfEntity = entitiesArr.map(arr => Entity(arr(0).toLong, arr(1).toLong, arr(2),
```
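For completeness, a hedged sketch of the usual next step (Spark 1.3+ style; assumes a SQLContext named `sqlContext` and is not part of the original snippet): turn the RDD of Entity objects into a DataFrame and register it so it can be queried with SQL.

```
import sqlContext.implicits._

val entityDF = dfEntity.toDF()
entityDF.registerTempTable("entities")
sqlContext.sql("select EntityType, count(*) from entities group by EntityType").show()
```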
nted to confirm.
Thanks.
Darin.
s for hadoop2.0.0 (and I think Cloudera).
Is there a way I can force the same assembly that comes with the Spark 1.2 download to
be installed on the cluster?
Thanks.
Darin.
Thanks for your quick reply. Yes, that would be fine. I would rather wait/use
the optimal approach as opposed to hacking some one-off solution.
Darin.
From: Kostas Sakellis
To: Darin McBeath
Cc: User
Sent: Friday, February 27, 2015 12:19 PM
Subject: Re
'good' records as
well? I realize that if I lose a partition I might over-count, but
perhaps that is an acceptable trade-off.
I'm guessing that others have run into this before, so I would like to learn
from the exper
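A hedged sketch of one way to do this (not necessarily what the thread settled on; `input` is an assumed RDD of records and `isValid` an assumed helper): count the 'bad' records with an accumulator while keeping the 'good' ones, accepting that re-executed tasks can inflate the accumulator, which is the over-count trade-off mentioned above.

```
val badRecords = sc.accumulator(0L, "bad records")   // Spark 1.x accumulator API

val good = input.filter { rec =>
  val ok = isValid(rec)
  if (!ok) badRecords += 1L
  ok
}

// the action forces the filter to run before the accumulator is read
println("good: " + good.count() + ", bad (approximate): " + badRecords.value)
```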
ne has any tips for what I should look into it would be appreciated.
Thanks.
Darin.
Just to close the loop in case anyone runs into the same problem I had.
By setting --hadoop-major-version=2 when using the ec2 scripts, everything
worked fine.
Darin.
- Original Message -
From: Darin McBeath
To: Mingyu Kim ; Aaron Davidson
Cc: "user@spark.apache.org"
Se
I'll try it and post a response.
- Original Message -
From: Mingyu Kim
To: Darin McBeath ; Aaron Davidson
Cc: "user@spark.apache.org"
Sent: Monday, February 23, 2015 3:06 PM
Subject: Re: Which OutputCommitter to use for S3?
Cool, we will start from there. Thanks Aaron
156)
In my class, JobContext is an interface of type
org.apache.hadoop.mapred.JobContext.
Is there something obvious that I might be doing wrong (or messed up in the
translation from Scala to Java), or something I should look into? I'm using
Spark 1.2 with Hadoop 2.4.
Thanks.
Darin.
Spark 1.2 on a stand-alone cluster on EC2. To get the counts
for the records, I'm using .count() on the RDD.
Thanks.
Darin.
Thanks Imran. That's exactly what I needed to know.
Darin.
From: Imran Rashid
To: Darin McBeath
Cc: User
Sent: Tuesday, February 17, 2015 8:35 PM
Subject: Re: How do you get the partitioner for an RDD in Java?
a JavaRDD is just a wrapper aro
hod in Spark 1.2 or a way
of getting this information.
I'm curious if anyone has any suggestions for how I might go about finding how
an RDD is partitioned in a Java program.
Thanks.
Darin.
would really want to do in the first place.
Thanks again for your insights.
Darin.
From: Imran Rashid
To: Darin McBeath
Cc: User
Sent: Tuesday, February 17, 2015 3:29 PM
Subject: Re: MapValues and Shuffle Reads
Hi Darin,
When you say you "see 400GB
of baseline records: " + baselinePairRDD.count());
Both hsfBaselinePairRDD and baselinePairRDD have 1024 partitions.
Any insights would be appreciated.
I'm using Spark 1.2.0 in a stand-alone cluster.
Darin.
ased
from 4 to 16
.set("spark.task.maxFailures","64") // Didn't really matter as I had no
failures in this run
.set("spark.storage.blockManagerSlaveTimeoutMs","30");
From: Sven Krasser
To: Darin McBeath
Cc: User
Sent:
01/23 18:59:32
4.5 min 0.3 s 323.4 MB
If anyone has some suggestions please let me know. I've tried playing around
with various configuration options but I've found nothing yet that will fix the
underlying issue.
Thanks.
Darin.
ere is even any 'shuffle read' when
constructing the baselinePairRDD. If anyone could shed some light on this it
would be appreciated.
Thanks.
Darin.
Take a look at the O'Reilly Learning Spark (Early Release) book. I've found
this very useful.
Darin.
From: Saurabh Agrawal
To: "user@spark.apache.org"
Sent: Thursday, November 20, 2014 9:04 AM
Subject: Please help me get started on Apache Spark
Friends,
ave any thoughts/insights, it would be appreciated.
Thanks.
Darin.
Here is where I'm setting the 'timeout' in my Spark job:
```
SparkConf conf = new SparkConf().setAppName("SparkSync Application")
    .set("spark.serializer", "org.apache
```
n the workers, the directories for /mnt/spark and
/mnt2/spark do not exist.
Am I missing something? Has anyone else noticed this?
A colleague started a cluster (using the ec2 scripts), but for m3.xlarge
machines, and both /mnt/spark and /mnt2/spark directories were created.
Thanks.
Darin.
Assume the following, where updatePairRDD and deletePairRDD are both
hash-partitioned. Before the union, each of these has 512 partitions. The
newly created updateDeletePairRDD has 1024 partitions. Is this the
general/expected behavior for a union (the number of partitions doubles)?
J
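A small hedged sketch (not from the original message; the data and partition counts are toy values, and `sc` is an existing SparkContext) to observe this directly:

```
import org.apache.spark.HashPartitioner

val p = new HashPartitioner(512)
val updatePairRDD = sc.parallelize(Seq((1, "update"))).partitionBy(p)
val deletePairRDD = sc.parallelize(Seq((2, "delete"))).partitionBy(p)

val updateDeletePairRDD = updatePairRDD.union(deletePairRDD)
println(updateDeletePairRDD.partitions.length)  // 1024 if the union just concatenates the inputs' partitions
println(updateDeletePairRDD.partitioner)        // None in that case -- the hash partitioning is not carried over
```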
s an RDD (part) is being re-calculated
and is resulting in this error. But, based on other logging statements
throughout my application, I don't believe this is the case.
Thanks.
Darin.
14/11/10 22:35:27 INFO scheduler.DAGScheduler: Failed to run count at
SparkSyn
Let me know if you are interested in participating in a meet up in Cincinnati,
OH to discuss Apache Spark.
We currently have 4-5 different companies expressing interest but would like a
few more.
Darin.
OK. After reading some documentation, it would appear the issue is the default
number of partitions for a join (200).
After doing something like the following, I was able to change the value.
From: Darin McBeath
To: User
Sent: Wednesday, October 29, 2014 1:55 PM
Subject: Spark SQL
Sorry, hit the send key a bit too early.
Anyway, this is the code I set:
```
sqlContext.sql("set spark.sql.shuffle.partitions=10");
```
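For reference, a hedged alternative to the SQL statement above (same effect, assuming the Spark 1.x SQLContext API) is to set the property programmatically:

```
sqlContext.setConf("spark.sql.shuffle.partitions", "10")
```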
From: Darin McBeath
To: Darin McBeath ; User
Sent: Wednesday, October 29, 2014 2:47 PM
Subject: Re: Spark SQL and confused about number of partit
in local mode (local[1]).
Any insight would be appreciated.
Thanks.
Darin.
-shell as well as from a Java application. Feel free
to use, contribute, and/or let us know how this library can be improved. Let
me know if you have any questions.
Darin.
be called more than once with XPathProcessor.init. I have code in place to
make sure this is not an issue. But, I was wondering if there is a better way
to accomplish something like this.
Thanks.
Darin.
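A hedged sketch (not from the original post; `xpath` and `namespaces` are assumed to be in scope, as in the spark-shell) of one common way to guarantee one-time initialization per executor JVM: hold the processor in a singleton object, whose initialization the JVM performs at most once, instead of guarding XPathProcessor.init manually.

```
object ProcessorHolder {
  // the JVM initializes this at most once per executor, the first time it is touched
  lazy val proc = XPathProcessor.getInstance(xpath, namespaces)
}

// used inside a transformation running on the executors, e.g.:
// rdd.map(rec => ProcessorHolder.proc.evaluateString(rec._2))
```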
156MB in S3). I get that there could be some differences
between the serialized storage format and what is then used in memory, but I'm
curious as to whether I'm missing something and/or should be doing things
differently.
Thanks.
Darin.
ed
by the Spark provided ec2 startup script. But, that is purely a guess on my
part.
I'm wondering if anyone else has noticed this issue and if so has a workaround.
Thanks.
Darin.
ESS_KEY"));
But, because my SparkContext already exists within spark-shell, this really
isn't an option (unless I'm missing something).
Thanks.
Darin.
karound to this problem.
Thanks.
Darin.
```
java.lang.NoSuchMethodError:
org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
    at org.apache.http.impl.conn.PoolingClientConnectionManager.createCon
```
.
Darin.
```
ark-shell (and copy the jar file to executors)
root@ip-10-233-73-204 spark]$ ./bin/spark-shell --jars lib/uber-SparkUtils-0.1.jar

## Bring in the sequence file (2 million records)
scala> val xmlKeyPair = sc.sequenceFile[String,String]("s3n://darin/xml/part*").cache()

## Test values aga
```
00 --spot-price=.08 -z us-east-1e --worker-instances=2
my-cluster
From: Daniel Siegmann
To: Darin McBeath
Cc: Daniel Siegmann ; "user@spark.apache.org"
Sent: Thursday, July 31, 2014 10:04 AM
Subject: Re: Number of partitions and Number of concu
sed on what the documentation states). What
would I want that value to be based on my configuration below? Or, would I
leave that alone?
From: Daniel Siegmann
To: user@spark.apache.org; Darin McBeath
Sent: Wednesday, July 30, 2014 5:58 PM
Subject: Re: Numbe
the running application, but this had no effect. Perhaps this is ignored for a
'filter' and the default is the total number of cores available.
I'm fairly new to Spark, so maybe I'm just missing or misunderstanding
something fundamental. Any help would be appreciated.
Thanks.
Darin.