, you can go
with mapPartitions.
Regards,
Kamal
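The mapPartitions route can be sketched in plain Scala around the partition's iterator. A minimal sketch, assuming a hypothetical `Update` type and `sendBatch` callback standing in for the real server call:

```scala
// Sketch: batching updates inside mapPartitions. `Update` and the `send`
// callback are hypothetical placeholders, not a real Spark API.
object BatchFlush {
  type Update = String

  // Consume a partition's iterator in groups of `batchSize`, flushing each
  // group via the provided callback; returns how many elements were flushed.
  def flushInBatches(iter: Iterator[Update],
                     batchSize: Int)(send: Seq[Update] => Unit): Int = {
    var flushed = 0
    iter.grouped(batchSize).foreach { batch =>
      send(batch)          // e.g. a POST of up to 100 updates to the server
      flushed += batch.size
    }
    flushed
  }
}

// Inside Spark this would run once per partition, e.g.:
//   rdd.mapPartitions { it =>
//     Iterator(BatchFlush.flushInBatches(it, 100)(sendBatch))
//   }
```

`Iterator.grouped` also emits the final, possibly smaller batch, so no trailing elements are lost.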
--
Flavio Pompermaier
Development Department
OKKAM Srl - www.okkam.it
Phone: +39 0461 283 702
Fax: +39 0461 186 6433
Email: pomperma...@okkam.it
Hi to all,
I'm trying to convert my old MapReduce job to a Spark one, but I have some
doubts.
My application basically buffers a batch of updates and, every 100 elements,
flushes the batch to a server. This is very easy in MapReduce, but I
don't know how to do that in Scala.
For example, if
Maybe you could implement something like this (I don't know if something
similar already exists in Spark):
http://www.cs.berkeley.edu/~jnwang/papers/icde14_massjoin.pdf
Best,
Flavio
On Oct 8, 2014 9:58 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
Multiple values may be different, yet
Hi to all, sorry for not being fully on topic, but I have 2 quick questions
about Parquet tables registered in Hive/Spark:
1) where are the created tables stored?
2) If I have multiple HiveContexts (one per application) using the same
Parquet table, is there any problem with inserting concurrently?
Isn't sqoop export meant for that?
http://hadooped.blogspot.it/2013/06/apache-sqoop-part-3-data-transfer.html?m=1
On Aug 7, 2014 7:59 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
Vida,
What kind of database are you trying to write to?
For example, I found that for loading into
Hi everybody,
I have a scenario where I would like to stream data to different
persistency types (i.e. sql db, graphdb ,hdfs, etc) and perform some
filtering and transformation as the data comes in.
The problem is to maintain consistency between all the datastores (some
operation could fail).
Hi folks,
I was looking at the benchmark provided by Cloudera at
http://blog.cloudera.com/blog/2014/05/new-sql-choices-in-the-apache-hadoop-ecosystem-why-impala-continues-to-lead/
.
Is it true that Shark cannot execute some queries if you don't have enough
memory?
And is it true/reliable that Impala
.
-Soumya
On Wed, Jun 18, 2014 at 6:50 PM, Flavio Pompermaier pomperma...@okkam.it
wrote:
Thanks for the quick reply, Soumya. Unfortunately I'm a newbie with
Spark... what do you mean? Is there any reference for how to do that?
On Thu, Jun 19, 2014 at 12:24 AM, Soumya Simanta
soumya.sima
On 19 June 2014 07:50, Flavio Pompermaier pomperma...@okkam.it wrote:
Yes, I need to call the external service for every event and the order
does not matter.
There's no time limit in which each event should
to
see how they manage it using several worker threads.
My suggestion would be to knock-up a basic custom receiver and give it a
shot!
MC
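One way to sketch the concurrency cap discussed above is a semaphore guard around the external call. This is a plain-JVM sketch, not a Spark API: `callService` is a hypothetical placeholder, and in a streaming job this guard would wrap the call made from the receiver's worker threads (or from foreachPartition):

```scala
import java.util.concurrent.Semaphore

// Sketch: cap the number of in-flight calls to an external service.
// `callService` is a hypothetical placeholder for the real client.
class ThrottledClient[A, B](maxConcurrent: Int, callService: A => B) {
  private val permits = new Semaphore(maxConcurrent)

  def call(event: A): B = {
    permits.acquire()            // blocks once maxConcurrent calls are in flight
    try callService(event)
    finally permits.release()    // always free the slot, even on failure
  }
}
```

Blocking on `acquire` is what translates the rate limit into back-pressure on the caller, which is exactly where the buffer-growth concern in the original question comes from.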
On 19 June 2014 09:31, Flavio Pompermaier pomperma...@okkam.it wrote:
Hi Michael,
thanks for the tip, it's really an elegant solution.
What I'm still
Hi to all,
in my use case I'd like to receive events and call an external service as
they pass through. Is it possible to limit the number of concurrent
calls to that service (to avoid a DoS) using Spark Streaming? If so, limiting
the rate implies possible buffer growth... how can I control the
into Spark. This component can control the input rate into Spark.
On Jun 18, 2014, at 6:13 PM, Flavio Pompermaier pomperma...@okkam.it
wrote:
Hi to all,
in my use case I'd like to receive events and call an external service
as they pass through. Is it possible to limit the number
Is there a way to query fields by similarity (like Lucene, or using a
similarity metric), to be able to query something like WHERE language LIKE
'it~0.5'?
Best,
Flavio
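There is no built-in fuzzy LIKE, but a normalized edit-distance similarity could back such a query if registered as a UDF. A minimal sketch (the UDF registration in the comment is assumed Spark 1.x usage, and the function name `sim` is hypothetical):

```scala
// Sketch: normalized Levenshtein similarity for fuzzy field matching.
object Fuzzy {
  // Classic dynamic-programming edit distance.
  def levenshtein(a: String, b: String): Int = {
    val d = Array.tabulate(a.length + 1, b.length + 1) { (i, j) =>
      if (i == 0) j else if (j == 0) i else 0
    }
    for (i <- 1 to a.length; j <- 1 to b.length) {
      val cost = if (a(i - 1) == b(j - 1)) 0 else 1
      d(i)(j) = math.min(math.min(d(i - 1)(j) + 1, d(i)(j - 1) + 1),
                         d(i - 1)(j - 1) + cost)
    }
    d(a.length)(b.length)
  }

  // 1.0 = identical, 0.0 = completely different.
  def similarity(a: String, b: String): Double = {
    val maxLen = math.max(a.length, b.length)
    if (maxLen == 0) 1.0 else 1.0 - levenshtein(a, b).toDouble / maxLen
  }
}

// Assumed Spark SQL usage (hypothetical function name):
//   sqlContext.registerFunction("sim", Fuzzy.similarity _)
//   sqlContext.sql("SELECT * FROM t WHERE sim(language, 'it') > 0.5")
```

The threshold then plays the role of the `~0.5` in the Lucene-style syntax from the question.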
On Thu, May 22, 2014 at 8:56 AM, Michael Cutler mich...@tumra.com wrote:
Hi Nick,
Here is an illustrated example which
Is there any Spark plugin/add-on that facilitate the query to a JSON
content?
Best,
Flavio
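Spark SQL itself can infer a schema from JSON and query it. A sketch of that usage, assuming Spark 1.1+ with spark-sql on the classpath, an existing SparkContext `sc`, and a hypothetical `people.json` file (this needs a Spark runtime and is not runnable standalone):

```scala
// Sketch (assumed Spark 1.x API; needs a running SparkContext).
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val people = sqlContext.jsonFile("people.json")  // schema inferred from the JSON
people.registerTempTable("people")
sqlContext.sql("SELECT name FROM people WHERE age > 20").collect()
```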
On Thu, May 15, 2014 at 6:53 PM, Michael Armbrust mich...@databricks.com wrote:
Here is a link with more info:
http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html
On Wed, May
Great work! Thanks!
On May 13, 2014 3:16 AM, zhen z...@latrobe.edu.au wrote:
Hi Everyone,
I found it quite difficult to find good examples for Spark RDD API calls.
So my student and I decided to go through the entire API and write examples
for the vast majority of API calls (basically
it is debatable and it's more my personal opinion.
2014-04-17 23:28 GMT+02:00 Flavio Pompermaier pomperma...@okkam.it:
Thanks again Eugen! I don't get the point... why do you prefer to avoid Kryo
serialization for closures? Is there any problem with that?
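For context, switching Spark's data serializer to Kryo is a configuration change (sketch below; `my.project.MyRegistrator` is a hypothetical class name). Note that in Spark 1.x `spark.serializer` affects shuffled and cached data, while closures are still serialized with Java serialization regardless:

```scala
// Config sketch (Spark 1.x), assuming Kryo for RDD data serialization.
val conf = new org.apache.spark.SparkConf()
  .setAppName("kryo-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "my.project.MyRegistrator") // hypothetical
```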
On Apr 17, 2014 11:10 PM, Eugen Cepoi cepoi.eu...@gmail.com
Hi to all,
in my application I read objects that are not serializable because I cannot
modify the sources.
So I tried to do a workaround, creating a dummy class that extends the
unmodifiable one but implements Serializable.
All attributes of the parent class are Lists of objects (some of them are
Eugen
2014-04-14 18:21 GMT+02:00 Flavio Pompermaier pomperma...@okkam.it:
Hi to all,
in my application I read objects that are not serializable because I
cannot modify the sources.
So I tried to do a workaround creating a dummy class that extends the
unmodifiable one but implements
serialization
does not serialize/deserialize attributes from classes that don't implement
Serializable (in your case, the parent classes).
2014-04-14 23:17 GMT+02:00 Flavio Pompermaier pomperma...@okkam.it:
Thanks Eugen for the reply. Could you explain why I have the
problem? Why doesn't my serialization work
resources share cluster.
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Wed, Apr 9, 2014 at 12:10 AM, Flavio Pompermaier
pomperma...@okkam.it wrote:
Hi to everybody,
I'm new to Spark and I'd like to know if running
? Is there any suggestion about how to start?
On Wed, Apr 9, 2014 at 11:37 PM, Flavio Pompermaier pomperma...@okkam.it wrote:
Any help about this...?
On Apr 9, 2014 9:19 AM, Flavio Pompermaier pomperma...@okkam.it wrote:
Hi to everybody,
In my current scenario I have complex objects stored
Hi to everybody,
I'm new to Spark and I'd like to know if running Spark on top of YARN or
Mesos could affect (and how much) its performance. Is there any doc about
this?
Best,
Flavio
at 9:57 AM, Flavio Pompermaier
pomperma...@okkam.it wrote:
Hi to everybody,
these days I've looked a bit at the recent evolution of the big data
stacks and it seems that HBase is somehow fading away in favour of
Spark+HDFS. Am I correct?
Do you think that Spark and HBase should work together