https://issues.apache.org/jira/browse/SPARK-4964
Can you elaborate on why you have to use SimpleConsumer in your
environment?
TD
On Wed, Feb 4, 2015 at 7:44 PM, Xuelin Cao [hidden email] wrote:
Hi,
In our environment, Kafka can only be used with the simple consumer API,
like the Storm spout does.
Also, I found suggestions that the Kafka connector for Spark should not
be used in production
(http://markmail.org/message/2lb776ta5sq6lgtw) because it is based on the
high-level consumer API.
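For reference, SPARK-4964 (linked above) introduced a receiver-less "direct" Kafka stream in Spark 1.3 that is built on the simple consumer API, with Spark tracking offsets itself. A minimal sketch of wiring it up (broker addresses and the topic name are placeholders; `sc` is an existing SparkContext):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Direct stream: no receivers; partitions map 1:1 to Kafka partitions,
// and offsets are managed by Spark rather than ZooKeeper.
val ssc = new StreamingContext(sc, Seconds(10))
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
val topics = Set("events")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)
stream.map(_._2).count().print()
ssc.start()
```

This avoids the high-level consumer entirely, which is the concern raised in the linked thread.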
In Hadoop MR, there is an option *mapred.reduce.slowstart.completed.maps*
which can be used to start the reducer stage when X% of the mappers have
completed. By doing this, the data shuffling process is able to run in
parallel with the map process.
In a large multi-tenant cluster, this option is usually tuned
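For context, the option is a fraction set in mapred-site.xml (or per job); the 0.80 below is just an example value, delaying reducer launch until 80% of maps have finished so reducer slots are not held idle:

```xml
<!-- mapred-site.xml: launch reducers only after 80% of maps complete -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>
```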
Hi,
Correct me if I'm wrong. It looks like the current version of
Spark SQL is a *tuple-at-a-time* module. Basically, each time the physical
operator produces a tuple by recursively calling child.execute().
There are papers that illustrate the benefits of a vectorized query
engine. And
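To illustrate the contrast being asked about, here is a toy sketch (the operator names are illustrative, not Spark SQL's actual physical operators): tuple-at-a-time execution pays a virtual call per row, while vectorized execution passes batches between operators and runs a tight inner loop:

```scala
// Toy data: 1000 rows.
val data = Array.tabulate(1000)(i => i.toLong)

// Tuple-at-a-time (Volcano-style): each next() walks the operator tree
// for exactly one row.
def volcanoSum(rows: Iterator[Long]): Long = {
  var sum = 0L
  while (rows.hasNext) sum += rows.next() // one call per tuple
  sum
}

// Vectorized: operators exchange batches, amortizing call overhead and
// enabling cache-friendly inner loops.
def vectorizedSum(batches: Iterator[Array[Long]]): Long = {
  var sum = 0L
  for (batch <- batches) {
    var i = 0
    while (i < batch.length) { sum += batch(i); i += 1 }
  }
  sum
}

assert(volcanoSum(data.iterator) == vectorizedSum(data.grouped(256)))
```

Both produce the same result; the difference is purely in how often control crosses operator boundaries.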
Hi,
In the Spark SQL help document, it says: "Some of these (such as indexes)
are less important due to Spark SQL’s in-memory computational model. Others
are slotted for future releases of Spark SQL."
- Block level bitmap indexes and virtual columns (used to build indexes)
For our
In our experimental cluster (1 driver, 5 workers), we tried the simplest
example: sc.parallelize(Range(0, 100), 2).count
In the event log, we found the executor takes too much time on deserialization,
about 300~500ms, while the execution time is only 1ms.
Our servers have 2.3 GHz CPUs with 24 cores.
Thanks Imran,
The problem is, *every time* I run the same task, the deserialization
time is around 300~500ms. I don't know if this is a normal case.
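One way to confirm what the event log shows is to print the per-task metrics directly with a listener; a minimal sketch (addSparkListener is a developer API, and the metrics fields below are as exposed in Spark 1.x; `sc` is an existing SparkContext):

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Log deserialization time vs. run time for every finished task.
sc.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null) {
      println(s"task ${taskEnd.taskInfo.taskId}: " +
        s"deserialize=${m.executorDeserializeTime}ms " +
        s"run=${m.executorRunTime}ms")
    }
  }
})
sc.parallelize(Range(0, 100), 2).count()
```

If the deserialization time stays at 300~500ms on every run, the cost is per-task closure/dependency deserialization rather than one-time JVM warm-up.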