Reading files directly from Amazon S3 can be frustrating, especially if
you're dealing with a large number of input files. Could you please
elaborate on your use case? Does the S3 bucket in question already
contain a large number of files?
The implementation of the * wildcard operator in S3
as failover/retry logic etc.
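As a rough illustration of why a wildcard over a large bucket can be slow: S3's listing API only filters by key prefix (plus an optional delimiter), so a pattern such as `logs/2014-06-*/part-*` has to be expanded on the client by listing every key under the literal prefix and then matching each one. The sketch below assumes that behaviour and uses an in-memory key list in place of a real S3 listing; `GlobSketch`, `literalPrefix` and `expand` are hypothetical names, not part of the Hadoop or AWS APIs.

```scala
import scala.util.matching.Regex

object GlobSketch {
  // Convert a simple glob into an anchored regex:
  // '*' matches within one path segment, '?' matches one character,
  // every other character is quoted literally.
  def globToRegex(glob: String): Regex = {
    val sb = new StringBuilder("^")
    glob.foreach {
      case '*' => sb.append("[^/]*")
      case '?' => sb.append("[^/]")
      case c   => sb.append(Regex.quote(c.toString))
    }
    sb.append("$")
    sb.toString.r
  }

  // The literal prefix before the first wildcard is the only part
  // that can be pushed down to the server-side listing.
  def literalPrefix(glob: String): String =
    glob.takeWhile(c => c != '*' && c != '?')

  // Client-side expansion: "list" everything under the prefix, then filter.
  // Returns the matching keys and how many keys had to be listed.
  def expand(allKeys: Seq[String], glob: String): (Seq[String], Int) = {
    val prefix = literalPrefix(glob)
    val listed = allKeys.filter(_.startsWith(prefix)) // stands in for a paged listing call
    val re = globToRegex(glob)
    (listed.filter(k => re.pattern.matcher(k).matches()), listed.size)
  }
}
```

The number of keys listed grows with everything under the prefix, not just the matches, which is why a bucket that already holds many files makes the wildcard expensive.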
Best of luck!
MC
*Michael Cutler*
Founder, CTO
*Mobile: +44 789 990 7847 | Email: mich...@tumra.com | Web: tumra.com*
*Visit us at our offices in Chiswick Park http://goo.gl/maps/abBxq*
*Registered
is how to limit the external service call rate and manage the incoming
buffer size (enqueuing).
Could you give me some tips for that?
Thanks again,
Flavio
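One common way to handle both concerns raised above is a token bucket to cap the call rate to the external service, combined with a fixed-size queue whose non-blocking `offer` fails when full, so the incoming buffer cannot grow without bound. The following is a minimal sketch under the assumption that calls are made from a single JVM process; `TokenBucket` and `BoundedBuffer` are hypothetical helpers, and only `java.util.concurrent.ArrayBlockingQueue` is a real library class.

```scala
import java.util.concurrent.ArrayBlockingQueue

// Token bucket: on average at most `ratePerSec` acquisitions per second,
// with bursts of up to `capacity`. The caller passes the current time in
// nanoseconds (e.g. System.nanoTime()), which keeps the logic deterministic.
final class TokenBucket(ratePerSec: Double, capacity: Double, startNanos: Long) {
  private var tokens: Double = capacity
  private var lastNanos: Long = startNanos

  def tryAcquire(nowNanos: Long): Boolean = synchronized {
    // Refill in proportion to elapsed time, never beyond the burst capacity.
    val elapsedSec = (nowNanos - lastNanos) / 1e9
    tokens = math.min(capacity, tokens + elapsedSec * ratePerSec)
    lastNanos = nowNanos
    if (tokens >= 1.0) { tokens -= 1.0; true } else false
  }
}

// Bounded buffer: enqueue returns false instead of growing without limit,
// so the producer can drop the item, retry later, or slow down.
final class BoundedBuffer[A](maxSize: Int) {
  private val queue = new ArrayBlockingQueue[A](maxSize)
  def enqueue(item: A): Boolean = queue.offer(item)
  // Returns None when the buffer is empty.
  def dequeue(): Option[A] = Option(queue.poll())
  def size: Int = queue.size
}
```

A consumer loop would `dequeue()` an item and only call the external service once `tryAcquire(System.nanoTime())` returns true; what to do when `enqueue` returns false (drop, log, or push back on the source) is a policy decision for your use case.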
On Thu, Jun 19, 2014 at 10:19 AM, Michael Cutler <mich...@tumra.com> wrote:
Hello Flavio,
It sounds to me like the best solution
When you start seriously using Spark in production there are basically two
things everyone eventually needs:
1. Scheduled Jobs - recurring hourly/daily/weekly jobs.
2. Always-On Jobs - that require monitoring, restarting etc.
There are lots of ways to implement these requirements,
TAR.GZ
direct from HDFS, unpack it and launch the appropriate script.
Packaging everything required in one go makes for much cleaner
development, testing and deployment than relying on cluster-specific
classpath additions or any add-jars functionality.
On 19 June 2014 22:53, Michael Cutler
and have them managed using Mesos/Marathon
http://mesosphere.io/ to handle failures and restarts of long-running
processes.
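For the always-on jobs, the restart-on-failure behaviour described above can be pictured, very roughly, as the loop below. This is a hypothetical sketch (`Supervisor.runWithRestarts` is an invented name), not how Marathon is implemented: Marathon relaunches the job as a Mesos task rather than calling a function in-process.

```scala
import scala.util.{Failure, Success, Try}

object Supervisor {
  // Runs `job`, restarting it after a failure up to `maxRestarts` times,
  // sleeping a little longer after each failure (linear backoff).
  // Returns the number of attempts made and whether the last one succeeded.
  def runWithRestarts(maxRestarts: Int, backoffMillis: Long)(job: () => Unit): (Int, Boolean) = {
    var attempt = 0
    var succeeded = false
    while (!succeeded && attempt <= maxRestarts) {
      attempt += 1
      Try(job()) match {
        case Success(_) => succeeded = true
        case Failure(e) =>
          System.err.println(s"attempt $attempt failed: ${e.getMessage}")
          if (attempt <= maxRestarts) Thread.sleep(backoffMillis * attempt)
      }
    }
    (attempt, succeeded)
  }
}
```

A real supervisor also needs health checks and alerting when the restart budget runs out, which is much of the value of handing this to Mesos/Marathon instead of writing it yourself.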
Good luck!
MC
Hello Wei,
I speak from experience of writing many distributed HPC applications using
Open MPI (C/C++) on x86, PowerPC and Cell B.E. processors, and Parallel
Virtual Machine (PVM) well before that, back in the '90s. I can say with
absolute certainty:
*Any gains you believe there are because C++ is
in HDFS. Done right, you should
be able to achieve interactive (few-second) lookups.
Have fun!
MC
Hello,
You're absolutely right: the syntax you're using returns json4s value
objects, not native types like Int, Long, etc. Fix that problem and
everything else (the filters) will work as you expect. This is a short
snippet of a larger example: [1]
val lines =
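The pitfall is easy to reproduce without json4s itself. The toy `JsValue` hierarchy below is a hypothetical, simplified stand-in that mirrors the general shape of json4s's value wrappers (a number wrapped in an object rather than a bare `Int`). Comparing the wrapper against a native `Int` silently yields no matches, whereas unwrapping to an `Int` first makes the filter behave. In real json4s code the equivalent unwrapping step is usually `extract[Int]` with an implicit `DefaultFormats` in scope, but check that against the json4s version you are using.

```scala
object JsonValueSketch {
  // Hypothetical, simplified JSON value types (not the json4s classes).
  sealed trait JsValue
  final case class JsNum(value: BigInt) extends JsValue
  final case class JsStr(value: String) extends JsValue
  final case class JsObj(fields: Map[String, JsValue]) extends JsValue

  // Field lookup returns another wrapped value, never a native type.
  def field(obj: JsObj, name: String): Option[JsValue] = obj.fields.get(name)

  // Unwrap to a native Int so ordinary comparisons work.
  def asInt(v: JsValue): Option[Int] = v match {
    case JsNum(n) if n.isValidInt => Some(n.toInt)
    case _                        => None
  }

  val records: Seq[JsObj] = Seq(
    JsObj(Map("name" -> JsStr("a"), "age" -> JsNum(25))),
    JsObj(Map("name" -> JsStr("b"), "age" -> JsNum(40)))
  )

  // Wrong: compares a wrapped JsNum(40) with the Int 40, which is never equal.
  val naive: Seq[JsObj] = records.filter(r => field(r, "age").contains(40))

  // Right: unwrap to Int first, then filter on the native value.
  val fixed: Seq[JsObj] = records.filter(r => field(r, "age").flatMap(asInt).exists(_ > 30))
}
```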
Hey Nilesh,
Great to hear you're using Spark Streaming! In my opinion, the crux of your
question comes down to what you want to do with the data in the future,
and/or whether there is utility in using it from more than one
Spark/Streaming job.
1) *One-time-use, fire and forget* - as you rightly point
/cotdp/b5b8155bb85e254d2a3c
MC
and REGEXP so clearly some of the basics are in there.
As the saying goes... *Use the source, Luke!*
http://blog.codinghorror.com/learn-to-read-the-source-luke/ :o)
Eclipse.
Best,
Michael
https://issues.apache.org/jira/browse/HADOOP-8900, and it affects all Hadoop
releases prior to 1.2.x.
MC
this on a small sample of data you get results like this:
- female: average=114, count=15422
- male: average=104, count=14727
Which basically says that, in this sample, the average level achieved by
women is slightly higher than that achieved by men.
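Per-group averages like the ones above are usually computed by folding each key into a running (sum, count) pair and dividing only at the end, which keeps the combine step associative. The sketch below shows that shape with plain Scala collections on a few made-up (gender, level) pairs, not the data from this post; in Spark the same pattern is commonly written as `mapValues(v => (v, 1L))` followed by `reduceByKey`.

```scala
object AverageByKey {
  // Input: (key, value) pairs. Output: key -> (average, count).
  // Each value becomes a (sum, count) pair; pairs for the same key are
  // added component-wise, and the division happens once per key at the end.
  def averageByKey(rows: Seq[(String, Int)]): Map[String, (Double, Long)] =
    rows
      .map { case (k, v) => (k, (v.toLong, 1L)) }
      .groupBy(_._1)
      .map { case (k, keyed) =>
        val (sum, count) = keyed.map(_._2).reduce((a, b) => (a._1 + b._1, a._2 + b._2))
        (k, (sum.toDouble / count, count))
      }
}
```

For example, the hypothetical rows `("female", 10), ("female", 20), ("male", 5), ("male", 7), ("male", 9)` give `female -> (15.0, 2)` and `male -> (7.0, 3)`.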
Best of luck fishing through Facebook data!
MC