Hi Srikanth, thanks so much, it worked when I set spark.sql.shuffle.partitions=10.
Will reducing shuffle partitions slow down my groupBy query on hiveContext, or
won't it slow it down? Please guide.
On Sat, Jul 11, 2015 at 7:41 AM, Srikanth srikanth...@gmail.com wrote:
Is there a join involved in
1. Spark Streaming 1.3 creates as many RDD partitions as there are Kafka
partitions in the topic. Say I have 300 partitions in the topic and 10 executors,
each with 3 cores. Does that mean only 10*3 = 30 partitions
are processed at a time, then the next 30, and so on, since executors launch tasks per RDD
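The scheduling arithmetic in that question can be checked in plain Python. This is only a sketch of the counting, using the 300-partition / 10-executor / 3-core figures from the message above and assuming one task per RDD partition, as described:

```python
# One task is launched per RDD partition; the cluster can run
# executors * cores_per_executor tasks concurrently (one "wave").
kafka_partitions = 300      # partitions in the Kafka topic
executors = 10
cores_per_executor = 3

concurrent_tasks = executors * cores_per_executor   # 30 tasks at a time
waves = -(-kafka_partitions // concurrent_tasks)    # ceiling division

print(concurrent_tasks, waves)  # 30 tasks per wave, 10 waves
```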
On 10 Jul 2015, at 23:10, algermissen1971 algermissen1...@icloud.com wrote:
Hi,
initially today, when moving my streaming application to the cluster for the first
time, I ran into the newbie errors of using a local file system for checkpointing
and the RDD partition count differences (see
What is your business case for the move?
On Fri, Jul 10, 2015 at 12:49, Ravisankar Mani rrav...@gmail.com wrote:
Hi everyone,
I have planned to move from MS SQL Server to Spark. I am using around 50,000
to 1 lakh (100,000) records.
Spark's performance is slow when compared to MS SQL Server.
What is
You can certainly query over 4 TB of data with Spark. However, you will
get an answer in minutes or hours, not in milliseconds or seconds. OLTP
databases are used for web applications, and typically return responses in
milliseconds. Analytic databases tend to operate on large data sets, and
Hello. Had the same question. What if I need to store 4-6 TB and do
queries? Can't find any clue in the documentation.
On 11.07.2015 at 03:28, Mohammed Guller moham...@glassbeam.com wrote:
Hi Ravi,
First, neither Spark nor Spark SQL is a database. Both are compute
engines, which need to be paired
Reducing the no. of partitions may have an impact on memory consumption,
especially if there is an uneven distribution of the key used in groupBy.
Depends on your dataset.
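A toy illustration of that skew point: with hash partitioning, every record sharing a hot key lands in the same partition, so that one partition dominates memory use. The dataset below is made up for the example:

```python
from collections import Counter

# Hypothetical skewed dataset: 90% of records share one hot key.
records = [("hot", i) for i in range(900)] + [("k%d" % i, i) for i in range(100)]

def partition_sizes(records, num_partitions):
    """Count how many records each hash partition would receive."""
    return Counter(hash(key) % num_partitions for key, _ in records)

# Whatever partition the hot key hashes to holds at least its 900 records.
sizes = partition_sizes(records, 2)
print(max(sizes.values()))  # at least 900
```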
On Sat, Jul 11, 2015 at 5:06 AM, Umesh Kacha umesh.ka...@gmail.com wrote:
Hi Srikanth, thanks so much, it worked when I set
Hi,
I've finally fixed this. The problem was that I wasn't providing a type for
the DStream in ssc.actorStream
/* with this inputDStream : ReceiverInputDStream[Nothing] and we get
SparkDriverExecutionException: Execution error
* Caused by: java.lang.ArrayStoreException: [Ljava.lang.Object;
Hi Roman,
Yes, Spark SQL will be a better solution than standard RDBMS databases for
querying 4-6 TB of data. You can pair Spark SQL with HDFS+Parquet to build a
powerful analytics solution.
Mohammed
From: David Mitchell [mailto:jdavidmitch...@gmail.com]
Sent: Saturday, July 11, 2015 7:10 AM
To:
Thanks a lot oubrik,
I got your point; my thinking is that sum() should already be available as a
built-in function for iterators in Python.
Anyway I tried your approach
def mysum(iter):
    count = sum = 0
    for item in iter:
        count += 1
        sum += item
    return sum
wordCountsGrouped =
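For what it's worth, the built-in sum() does accept any iterable, iterators included, so a manual loop like the one above is only needed if you also want the element count. A small check (count_and_sum is just an illustrative variant of the mysum above, avoiding shadowing the sum built-in):

```python
# sum() consumes any iterable, including a plain iterator.
it = iter([1, 2, 3, 4])
print(sum(it))  # 10

# One-pass count plus total, if both are needed:
def count_and_sum(iterable):
    count = total = 0
    for item in iterable:
        count += 1
        total += item
    return count, total

print(count_and_sum([1, 2, 3, 4]))  # (4, 10)
```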
Note that if you use multi-part upload, each part becomes one block, which
allows for multiple concurrent readers. One would typically use a fixed part
size that aligns with Spark's default HDFS block size (64 MB, I
think) to ensure the reads are aligned.
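The alignment described above is just integer arithmetic. A sketch, assuming the 64 MB block size mentioned (the actual value depends on your HDFS configuration):

```python
BLOCK_SIZE = 64 * 1024 * 1024   # 64 MB, the default mentioned above

def aligned_parts(file_size, part_size=BLOCK_SIZE):
    """Number of fixed-size upload parts (one block each) for a file."""
    return -(-file_size // part_size)   # ceiling division

# A 1 GB object uploaded in 64 MB parts becomes 16 blocks,
# i.e. up to 16 concurrent readers.
print(aligned_parts(1024 * 1024 * 1024))  # 16
```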
On Sat, Jul 11, 2015 at 11:14 AM,
Looks like reduceByKey() should work here.
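reduceByKey() merges the values for each key with the given function (combining map-side before the shuffle, unlike groupByKey). Its semantics in plain Python, for a word-count-style pair list:

```python
def reduce_by_key(pairs, func):
    """Pure-Python sketch of RDD.reduceByKey semantics."""
    out = {}
    for key, value in pairs:
        # Merge into the running value for this key, if one exists.
        out[key] = func(out[key], value) if key in out else value
    return out

pairs = [("a", 1), ("b", 1), ("a", 1), ("a", 1)]
print(reduce_by_key(pairs, lambda a, b: a + b))  # {'a': 3, 'b': 1}
```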
Cheers
k/
On Sat, Jul 11, 2015 at 11:02 AM, leonida.gianfagna
leonida.gianfa...@gmail.com wrote:
Thanks a lot oubrik,
I got your point; my thinking is that sum() should already be available as a
built-in function for iterators in Python.
Anyway I tried
Hi All,
I am facing this issue in my production environment.
My worker dies by throwing this exception,
but I see that space is available on all the partitions of my disk.
I did NOT see any abrupt increase in disk IO, which might have choked the
executor writing to the stderr file.
But still
Honestly, you are addressing this wrongly - you do not seem to have a
business case for changing - so why do you want to switch?
On Sat, Jul 11, 2015 at 3:28, Mohammed Guller moham...@glassbeam.com
wrote:
Hi Ravi,
First, neither Spark nor Spark SQL is a database. Both are compute
engines,
On Sat, Jul 11, 2015 at 14:53, Roman Sokolov ole...@gmail.com wrote:
Hello. Had the same question. What if I need to store 4-6 TB and do
queries? Can't find any clue in the documentation.
On 11.07.2015 at 03:28, Mohammed Guller moham...@glassbeam.com wrote:
Hi Ravi,
First, neither Spark nor