My bad, Gothos on IRC pointed me to the docs:
http://jhz.name/2016/01/10/spark-classpath.html
Thanks Gothos!
On Fri, Sep 9, 2016 at 9:23 PM, Colin Kincaid Williams <disc...@uw.edu> wrote:
I'm using the spark-shell v1.6.1. I have a classpath conflict, where I
have an external library ( not OSS either :( , can't rebuild it.)
using httpclient-4.5.2.jar. I use spark-shell --jars
file:/path/to/httpclient-4.5.2.jar
However spark is using httpclient-4.3 internally. Then when I try to
use
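For anyone landing here later: the linked doc aside, one common way to make a user-supplied jar win a conflict like this in Spark 1.6 is the (experimental) userClassPathFirst settings. A sketch only, with the jar path as a placeholder; this may not be exactly what the linked doc recommends:

```shell
# Sketch only: spark.{driver,executor}.userClassPathFirst are experimental
# Spark 1.6 settings that make jars passed via --jars take precedence over
# Spark's own bundled dependencies (here, its internal httpclient-4.3).
spark-shell \
  --jars file:/path/to/httpclient-4.5.2.jar \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true
```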
Streaming UI tab showing empty events and very different metrics than on 1.5.2
On Thu, Jun 23, 2016 at 5:06 AM, Colin Kincaid Williams <disc...@uw.edu> wrote:
> After a bit of effort I moved from a Spark cluster running 1.5.2, to a
> Yarn cluster running 1.6.1 jars. I'm still settin
related to running on the Spark
1.5.2 cluster. Also is the missing event count in the completed
batches a bug? Should I file an issue?
On Tue, Jun 21, 2016 at 9:04 PM, Colin Kincaid Williams <disc...@uw.edu> wrote:
> Thanks @Cody, I will try that out. In the interim, I tried to validate
> Take HBase out of the equation and just measure what your read
> performance is by doing something like
>
> createDirectStream(...).foreach(_.println)
>
> not take() or print()
>
> On Tue, Jun 21, 2016 at 3:19 PM, Colin Kincaid Williams <disc...@uw.edu>
> wrote:
>>
Topic Partitions / Streaming
Duration / maxRatePerPartition / any other spark settings or code
changes that I should make to try to get a better consumption rate.
Thanks for all the help so far, this is the first Spark application I
have written.
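For reference, the knobs discussed in this thread are passed as spark-submit/spark-shell conf flags. A sketch with placeholder values (not recommendations), assuming Spark 1.5+ for backpressure:

```shell
# Placeholder values, sketch only. maxRatePerPartition caps records/sec
# read from each Kafka partition by the direct stream; backpressure lets
# Spark auto-tune the ingest rate so processing keeps up with the batch
# interval instead of accumulating scheduling delay.
# Class and jar names below are placeholders.
spark-submit \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.streaming.kafka.maxRatePerPartition=10000 \
  --class MyStreamingJob kafka-hbase.jar
```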
On Mon, Jun 20, 2016 at 12:32 PM, Colin Kincaid Williams <disc...@uw.edu> wrote:
and your average processing time is
> 1.16 seconds, you're always going to be falling behind. That would
> explain why you've built up an hour of scheduling delay after eight
> hours of running.
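Cody's arithmetic can be checked directly. Assuming a 1-second batch interval (the interval isn't quoted in this excerpt, so that's an assumption), the delay accumulates like this:

```python
# Assumption: 1.0 s batch interval (not stated in the quoted mail).
# The 1.16 s average processing time is from the thread above.
batch_interval_s = 1.0
avg_processing_s = 1.16
run_hours = 8

batches = int(run_hours * 3600 / batch_interval_s)  # batches submitted in 8 h
delay_s = batches * (avg_processing_s - batch_interval_s)

print(round(delay_s / 3600, 2))  # accumulated scheduling delay, in hours
```

That works out to roughly 1.3 hours of backlog after eight hours, consistent with the hour of scheduling delay observed.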
>
> On Sat, Jun 18, 2016 at 4:40 PM, Colin Kincaid Williams <disc...@uw.edu> wrote:
for details including shuffles
>> etc?
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
I'm attaching a picture from the streaming UI.
On Sat, Jun 18, 2016 at 7:59 PM, Colin Kincaid Williams <disc...@uw.edu> wrote:
> There are 25 nodes in the spark cluster.
>
> On Sat, Jun 18, 2016 at 7:53 PM, Mich Talebzadeh
> <mich.talebza...@gmail.com> wrote:
>> how
>
>
> On 18 June 2016 at 20:40, Colin Kincaid Williams <disc...@uw.edu>
ib/hbase/lib/*
\
/home/colin.williams/kafka-hbase.jar "FromTable" "ToTable"
"broker1:9092,broker2:9092"
On Tue, May 3, 2016 at 8:20 PM, Colin Kincaid Williams <disc...@uw.edu> wrote:
> Thanks Cody, I can see that the partitions are well distributed...
> Then I'm in the
as producers are distributing across partitions evenly).
>
> On Tue, May 3, 2016 at 1:44 PM, Colin Kincaid Williams <disc...@uw.edu> wrote:
>> Thanks again Cody. Regarding the details 66 kafka partitions on 3
>> kafka servers, likely 8 core systems with 10 disks each. Maybe the
>
> Really though, I'd try to start with spark 1.6 and direct streams, or
> even just kafkacat, as a baseline.
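A minimal kafkacat baseline might look like this (broker and topic names are placeholders; assumes kafkacat is installed):

```shell
# Consume the whole topic as fast as possible and count messages;
# timing this gives a raw consumption rate independent of Spark/HBase.
# -C consume, -b brokers, -t topic, -o beginning start at earliest
# offset, -e exit at end of partition, -q quiet.
time kafkacat -C -q -b broker1:9092 -t my_topic -o beginning -e | wc -l
```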
>
>
>
> On Mon, May 2, 2016 at 7:01 PM, Colin Kincaid Williams <disc...@uw.edu> wrote:
>> Hello again. I searched for "backport kafka" in the list archive
> ted to using spark 1.2, or is upgrading possible? The
> kafka direct stream is available starting with 1.3. If you're stuck
> on 1.2, I believe there have been some attempts to backport it, search
> the mailing list archives.
>
> On Mon, May 2, 2016 at 12:54 PM, Colin Kincaid Williams <disc...@uw.edu> wrote:
Spark, at least
> to some extent.
>
> David Krieg | Enterprise Software Engineer
> Early Warning
> Direct: 480.426.2171 | Fax: 480.483.4628 | Mobile: 859.227.6173
>
>
> -----Original Message-----
> From: Colin Kincaid Williams [mailto:disc...@uw.edu]
> Sent: Monday, May 02, 2
I've written an application to get content from a kafka topic with 1.7
billion entries, get the protobuf serialized entries, and insert into
hbase. Currently the environment that I'm running in is Spark 1.2.
With 8 executors and 2 cores, and 2 jobs, I'm only getting between
0-2500 writes / second.
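Back-of-the-envelope, using the upper bound of that rate (a sketch; 2500/s is the best case quoted, so the real time would be longer):

```python
# Figures from the thread: ~1.7 billion Kafka entries, at most ~2500
# HBase writes per second observed. How long to drain the topic?
total_entries = 1.7e9
best_case_rate = 2500.0  # writes per second, upper bound observed

days = total_entries / best_case_rate / 86400
print(round(days, 1))  # ~7.9 days even at the best observed rate
```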
I launch around 30-60 of these jobs defined like start-job.sh in the
background from a wrapper script. I wait about 30 seconds between launches,
then the wrapper monitors yarn to determine when to launch more. There is a
limit defined at around 60 jobs, but even if I set it to 30, I run out of
/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
It may be slightly different for you if the resource manager and the
history server are not on the same machine.
Hope it will work for you as well!
Christophe.
On 24/02/2015 06:31, Colin Kincaid Williams wrote:
Hi,
I have been trying
the info in one place.
On Tue, Feb 24, 2015 at 12:36 PM, Colin Kincaid Williams <disc...@uw.edu>
wrote:
Looks like in my tired state, I didn't mention spark the whole time.
However, it might be implied by the application log above. Spark log
aggregation appears to be working, since I can run
Hi,
I have been trying to get my yarn logs to display in the spark
history-server or yarn history-server. I can see the log information
yarn logs -applicationId application_1424740955620_0009
15/02/23 22:15:14 INFO client.ConfiguredRMFailoverProxyProvider: Failing
over to
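Since `yarn logs` returns output, log aggregation is evidently enabled; for completeness, that behavior is governed by the standard YARN property below (in yarn-site.xml), shown as a reference sketch:

```
<!-- yarn-site.xml: log aggregation must be on for
     "yarn logs -applicationId ..." to return logs after an app finishes -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
```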