ngu...@gmail.com>
wrote:
> hi Ashish,
>
> I was just wondering if there is any particular reason why you are posting
> this to a SPARK group?
>
> Regards,
> Gourav
>
> On Thu, Dec 21, 2017 at 8:32 PM, ashish rawat <dceash...@gmail.com> wrote:
>
>> Hi,
Hi,
We are working on a streaming solution where multiple out-of-order streams
flow into the system and we need to join the streams based on a unique
id. We are planning to use Redis for this, where for every tuple, we will
look up whether the id exists; we join if it does, or else put the tuple
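
A minimal sketch of the lookup-then-join-or-store step described in this message, assuming exactly two streams and using an in-memory dict as a stand-in for Redis. The names `pending` and `join_or_store` are hypothetical, and the handling of a repeated id from the same stream is an assumption, since the message does not cover that case.

```python
from typing import Any, Dict, Optional, Tuple

# Stand-in for Redis: maps a unique id to the (stream, value) that arrived first.
# Assumption: a real deployment would use a Redis client, likely with a TTL
# so that tuples whose partner never arrives do not accumulate forever.
pending: Dict[str, Tuple[str, Any]] = {}


def join_or_store(stream: str, key: str, value: Any) -> Optional[Dict[str, Any]]:
    """Join with the waiting tuple for `key` if one exists, else store this one."""
    waiting = pending.get(key)
    if waiting is None:
        # First tuple seen for this id: park it until its partner arrives.
        pending[key] = (stream, value)
        return None

    other_stream, other_value = waiting
    if other_stream == stream:
        # Assumption: a repeat from the same stream replaces the parked tuple.
        pending[key] = (stream, value)
        return None

    # Partner found: emit the joined record and clear the parked tuple.
    del pending[key]
    return {"id": key, other_stream: other_value, stream: value}
```

Because the lookup and the store are separate steps here, a real Redis-backed version with concurrent writers would need an atomic check-and-set; this sketch is single-threaded and ignores that.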
r Hakobian, Ph.D.
Staff Data Scientist
Rally Health
nicholas.hakob...@rallyhealth.com
On Sun, Nov 26, 2017 at 8:19 AM, ashish rawat <dceash...@gmail.com> wrote:
> Thanks Holden and Chetan.
>
> Holden - Have you tried it out, do you know the right way to do it?
> Chetan - yes, if we
at 3:31 PM, Holden Karau <hol...@pigscanfly.ca>
> wrote:
>
>> So it’s certainly doable (it’s not super easy mind you), but until the
>> arrow udf release goes out it will be rather slow.
>>
>> On Sun, Nov 26, 2017 at 8:01 AM ashish rawat <dceash...@gmail.com>
Hi,
Has anyone tried running NLTK (Python) with Spark Streaming (Scala)? I was
wondering if this is a good idea and which Spark operators are the right ones
for this. The reason we want to try this combination is that we don't want
to run our transformations in Python (PySpark), but after the
erent jobs or groups. Within a single group, using Livy to
> create different spark contexts also works.
>
> - Affan
>
> On Tue, Nov 14, 2017 at 8:43 AM, ashish rawat <dceash...@gmail.com> wrote:
>
>> Thanks Sky Yin. This really helps.
>>
>> On Nov 14, 2017 12:
v 11, 2017 at 11:21 PM ashish rawat <dceash...@gmail.com> wrote:
> Hello Everyone,
>
> I was trying to understand if anyone here has tried a data warehouse
> solution using S3 and Spark SQL. Out of multiple possible options
> (redshift, presto, hive etc), we were planning to go
dim.seme...@datadoghq.com>
> *Date: *Sunday, November 12, 2017 at 1:06 PM
> *To: *Gourav Sengupta <gourav.sengu...@gmail.com>
> *Cc: *Phillip Henry <londonjava...@gmail.com>, ashish rawat <
> dceash...@gmail.com>, Jörn Franke <jornfra...@gmail.com>, Deepak S
ut you may
> need to train your data scientists. Some may know or prefer other tools.
>
> On 12. Nov 2017, at 08:32, Deepak Sharma <deepakmc...@gmail.com> wrote:
>
> I am looking for a similar solution, more aligned with the data scientist group.
> The concern I have is abou
Hello Everyone,
I was trying to understand if anyone here has tried a data warehouse
solution using S3 and Spark SQL. Out of multiple possible options
(redshift, presto, hive etc), we were planning to go with Spark SQL, for
our aggregates and processing requirements.
If anyone has tried it out,
-spark-and-netflix-recommendations
>> .
>>
>> btw, Confluent's distribution of Kafka does have a direct HTTP/REST API,
>> which is not recommended for production use but has worked well for me in
>> the past.
>>
>> these are some additional options to think about, an
Hi,
I have been evaluating Spark for analysing Application and Server Logs. I
believe there are some downsides of doing this:
1. No direct mechanism for collecting logs, so we need to introduce other tools
like Flume into the pipeline.
2. Need to write lots of code for parsing different patterns from
Hi Todd,
Could you please provide an example of doing this? Mazerunner seems to be doing
something similar with Neo4j but it goes via hdfs and updates only the graph
properties. Is there a direct way to do this with Neo4j or Titan?
Regards,
Ashish
From: SLiZn Liu
Hi,
We are observing a hung Spark application when one of the YARN datanodes
(running multiple Spark executors) goes down.
Setup details:
* Spark: 1.2.1
* Hadoop: 2.4.0
* Spark Application Mode: yarn-client
* 2 datanodes (DN1, DN2)
* 6 spark executors (initially 3 executors on
14 matches