I'm using Parquet in ADAM, and I can say that it works pretty well! Enjoy ;-)
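
To give you a concrete starting point, here's a minimal sketch of the write-then-query round trip, assuming Spark 1.1 (the paths, the table name, and the WordCount case class are just illustrative, untested):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Illustrative schema for the word-count output.
    case class WordCount(word: String, total: Long)

    object WordCountsToParquet {
      def main(args: Array[String]) {
        val sc = new SparkContext(new SparkConf().setAppName("WordCountsToParquet"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.createSchemaRDD // implicitly turns RDD[WordCount] into a SchemaRDD

        val counts = sc.textFile("hdfs:///input/docs")
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1L))
          .reduceByKey(_ + _)
          .map { case (word, n) => WordCount(word, n) }

        // Write once; any later job or SQL shell can read it back independently.
        counts.saveAsParquetFile("hdfs:///output/word_counts.parquet")

        // Read it back and query it with SparkSQL.
        val loaded = sqlContext.parquetFile("hdfs:///output/word_counts.parquet")
        loaded.registerTempTable("word_counts")
        sqlContext
          .sql("SELECT word, total FROM word_counts ORDER BY total DESC LIMIT 10")
          .collect()
          .foreach(println)

        sc.stop()
      }
    }

The Parquet file doubles as the handoff point: the batch job writes it once, and any other SparkSQL context (shell, JDBC server, dashboard app) can read it on its own.
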
aℕdy ℙetrella
about.me/noootsab <http://about.me/noootsab>

On Mon, Sep 15, 2014 at 1:41 PM, Marius Soutier <mps....@gmail.com> wrote:

> Thank you guys, I'll try Parquet, and if that's not quick enough I'll go
> the usual route with either a read-only or a normal database.
>
> On 13.09.2014, at 12:45, andy petrella <andy.petre...@gmail.com> wrote:
>
> However, the cache is not guaranteed to remain: if other jobs launched in
> the cluster require more memory than what's left of the overall caching
> memory, previously cached RDDs will be discarded.
>
> Using an off-heap cache like Tachyon as a dump repo can help.
>
> In general, I'd say that using a persistent sink (like Cassandra, for
> instance) is best.
>
> My .2¢
>
> aℕdy ℙetrella
> about.me/noootsab <http://about.me/noootsab>
>
> On Sat, Sep 13, 2014 at 9:20 AM, Mayur Rustagi <mayur.rust...@gmail.com>
> wrote:
>
>> You can cache data in memory and query it using Spark Job Server.
>> Most folks dump data down to a queue/db for retrieval.
>> You can also batch up data and store it into Parquet partitions, then
>> query it using another SparkSQL shell; the JDBC driver in SparkSQL is
>> part of 1.1, I believe.
>> --
>> Regards,
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoidanalytics.com
>> @mayur_rustagi
>>
>> On Fri, Sep 12, 2014 at 2:54 PM, Marius Soutier <mps....@gmail.com>
>> wrote:
>>
>>> Hi there,
>>>
>>> I'm pretty new to Spark, and so far I've written my jobs the same way I
>>> wrote Scalding jobs: one-off, read data from HDFS, count words, write
>>> counts back to HDFS.
>>>
>>> Now I want to display these counts in a dashboard. Since Spark can cache
>>> RDDs in memory and you have to explicitly terminate your app (and
>>> there's even a new JDBC server in 1.1), I'm assuming it's possible to
>>> keep an app running indefinitely and query an in-memory RDD from the
>>> outside (via SparkSQL, for example).
>>>
>>> Is this how others are using Spark? Or are you just dumping job results
>>> into message queues or databases?
>>>
>>> Thanks
>>> - Marius
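
FWIW, the long-running pattern Marius describes would look roughly like this; it's an untested sketch against Spark 1.1, and the paths, the table name, and the topWords helper are made up for illustration:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.{Row, SQLContext}

    object DashboardApp {
      def main(args: Array[String]) {
        val sc = new SparkContext(new SparkConf().setAppName("DashboardApp"))
        val sqlContext = new SQLContext(sc)

        // Load the batch output and pin it in SparkSQL's columnar in-memory
        // cache. As noted above, eviction can still happen under memory pressure.
        sqlContext.parquetFile("hdfs:///output/word_counts.parquet")
          .registerTempTable("word_counts")
        sqlContext.cacheTable("word_counts")

        // A dashboard backend (Spark Job Server, the 1.1 JDBC/Thrift server,
        // or your own HTTP layer) would run something like this per request.
        def topWords(n: Int): Array[Row] =
          sqlContext
            .sql(s"SELECT word, total FROM word_counts ORDER BY total DESC LIMIT $n")
            .collect()

        topWords(10).foreach(println)

        // Block forever so the cached table stays queryable while the app lives.
        Thread.currentThread().join()
      }
    }

Keep in mind the caveat above: cacheTable is best-effort, so a persistent copy (Parquet, Cassandra, a database) is still the safety net.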