Try this
https://github.com/RetailRocket/SparkMultiTool
This loader solved the problem of slow reads for a large data set made up of many small files in HDFS.
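For context, the usual way to speed up reads of many small files is to pack several files into one input split, so the job runs a few larger tasks instead of one tiny task per file (Hadoop's CombineFileInputFormat works along these lines). Below is a minimal sketch of that packing idea in plain Scala. It is only an illustration, not SparkMultiTool's actual code: `FileInfo`, `pack`, `maxGroupBytes`, and the `/data/part-N` paths are all made up for the example.

```scala
// Hypothetical illustration of the "combine small files" idea.
// Nothing here is taken from SparkMultiTool; names and paths are invented.
final case class FileInfo(path: String, bytes: Long)

object CombineSmallFiles {
  // Greedy packing: keep adding files to the current group and start a new
  // group when the next file would push it past maxGroupBytes.
  // A single file larger than maxGroupBytes ends up in a group of its own.
  def pack(files: Seq[FileInfo], maxGroupBytes: Long): Vector[Vector[FileInfo]] = {
    val start = (Vector.empty[Vector[FileInfo]], Vector.empty[FileInfo], 0L)
    val (done, current, _) = files.foldLeft(start) {
      case ((groups, cur, size), f) =>
        if (cur.nonEmpty && size + f.bytes > maxGroupBytes)
          (groups :+ cur, Vector(f), f.bytes) // close the group, open a new one
        else
          (groups, cur :+ f, size + f.bytes)  // file still fits in this group
    }
    if (current.nonEmpty) done :+ current else done
  }

  def main(args: Array[String]): Unit = {
    // 1000 made-up files of 64 KB each, packed into groups of at most 16 MB
    val files = (1 to 1000).map(i => FileInfo(s"/data/part-$i", 64L * 1024))
    val groups = pack(files, maxGroupBytes = 16L * 1024 * 1024)
    println(s"${files.size} files -> ${groups.size} groups")
  }
}
```

Each group would then be read by a single task, which avoids paying the per-file scheduling and open cost a thousand times over.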
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/why-is-spark-scala-c
>>>> operations with breeze/blas, etc. i saw some improvements, but it's
>>>> still a lot slower than my python code.
>>>>
>>>> why is that?
>>>>
>>>> how do you improve your spark + scala performance today?
>>>
>>> or is spark + scala just not the right tool for small to medium datasets?
>>>
>>> when would you use spark + scala vs. python?
>>>
>>> thanks!
>>>
>>>