Hi there,
I wanted to ask whether anyone has successfully used Jython with the
PySpark library. I wasn't sure whether C extension support is needed by
PySpark itself or is just a bonus of using CPython.
There was a claim (
Hi,
I ran the same version of a program with two different types of input
containing equivalent information.
Program 1: 10,000 files with on average 50 IDs each, one per line
Program 2: 1 file containing 10,000 lines, with on average 50 IDs per line
My program takes the input, creates key/value pairs
Hi Tom,
HDFS and Spark don't actually have a minimum block size -- so in that first
dataset, the files won't each be costing you 64 MB. However, the main reason
for the difference in performance here is probably the number of RDD partitions. In
the first case, Spark will create an RDD with 1
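To make the partition-count point concrete, here is a rough, simplified model of how Hadoop-style input splitting turns files into partitions: every file yields at least one split, and only a file larger than the split size is cut into several. This is a sketch, not Spark's actual code: the function name `estimate_partitions` is hypothetical, real input formats add details ignored here (such as a small "slop" factor on the last split), and the file sizes below assume roughly 10 bytes per ID.

```python
import math


def estimate_partitions(file_sizes_bytes, split_size_bytes=64 * 1024 * 1024):
    """Simplified model (hypothetical helper): each file gives at least one
    split, and a large file gives about ceil(size / split_size) splits."""
    total = 0
    for size in file_sizes_bytes:
        # Even an empty or tiny file still costs one split (one task).
        total += max(1, math.ceil(size / split_size_bytes))
    return total


# Assumed sizes: ~50 IDs of ~10 bytes each per small file / per line.
many_small_files = [500] * 10_000
one_big_file = [500 * 10_000]

print(estimate_partitions(many_small_files))  # 10000
print(estimate_partitions(one_big_file))      # 1
```

Under this model the first input launches thousands of tiny tasks while the second launches one, so per-task scheduling overhead dominates in the first case. In current PySpark versions the real count can be checked with `rdd.getNumPartitions()`, and `rdd.coalesce(n)` reduces it without a full shuffle.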
Hi Cody,
I wasn't aware there were different versions of the parquet format. What's
the difference between raw parquet and the Hive-written parquet files?
As for your migration question, the approaches I've often seen are
convert-on-read and convert-all-at-once. Apache Cassandra for example
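The two migration approaches can be sketched in plain Python, assuming each stored record carries a version number. Everything here is hypothetical and for illustration only: the `schema_version` field, the `name`→`full_name` rename and the added `email` column are invented example changes, not anything from Parquet, Hive or Cassandra.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def v1_to_v2(rec: Record) -> Record:
    # Hypothetical change: version 2 renamed "name" to "full_name".
    out = dict(rec)  # copy, so the stored record is not mutated
    out["full_name"] = out.pop("name")
    out["schema_version"] = 2
    return out


def v2_to_v3(rec: Record) -> Record:
    # Hypothetical change: version 3 added an optional "email" column.
    out = dict(rec)
    out.setdefault("email", None)
    out["schema_version"] = 3
    return out


UPGRADES: Dict[int, Callable[[Record], Record]] = {1: v1_to_v2, 2: v2_to_v3}
CURRENT_VERSION = 3


def upgrade(rec: Record) -> Record:
    """Apply upgrade steps one at a time until the record is current."""
    while rec["schema_version"] < CURRENT_VERSION:
        rec = UPGRADES[rec["schema_version"]](rec)
    return rec


def read_convert_on_read(stored: List[Record]) -> List[Record]:
    # Convert-on-read: old data stays as written; every read upgrades it.
    return [upgrade(r) for r in stored]


def convert_all_at_once(stored: List[Record]) -> None:
    # Convert-all-at-once: rewrite all stored records once, in place.
    stored[:] = [upgrade(r) for r in stored]
```

Convert-on-read keeps writes cheap but pays the upgrade cost on every read; convert-all-at-once pays it once, at the price of a bulk rewrite.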
PySpark doesn't attempt to support Jython at present. IMO while it might be a
bit faster, it would lose a lot of the benefits of Python, which are the very
strong data processing libraries (NumPy, SciPy, Pandas, etc). So I'm not sure
it's worth supporting unless someone demonstrates a really
Hi Cody,
Assuming you are talking about 'safe' changes to the schema (i.e. existing
column names are never reused with incompatible types), this is something
I'd love to support. Perhaps you can describe more what sorts of changes
you are making, and if simple merging of the schemas would be
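"Simple merging of the schemas" under the "safe changes" rule above could look roughly like this: take the union of all columns, and fail only when one column name appears with two different types. This is a minimal sketch, not Spark SQL's implementation; `merge_schemas` and `IncompatibleSchemaError` are hypothetical names, types are plain strings, and exact type equality stands in for whatever compatibility rules a real system would apply.

```python
from typing import Dict, List

Schema = Dict[str, str]  # column name -> type name, e.g. {"id": "int"}


class IncompatibleSchemaError(ValueError):
    """Raised when a column name is reused with a different type."""


def merge_schemas(schemas: List[Schema]) -> Schema:
    merged: Schema = {}
    for schema in schemas:
        for column, col_type in schema.items():
            existing = merged.get(column)
            if existing is None:
                # New column: add it (first-seen order is preserved).
                merged[column] = col_type
            elif existing != col_type:
                # Same name, different type: not a "safe" change.
                raise IncompatibleSchemaError(
                    f"column {column!r} is {existing} in one file and {col_type} in another"
                )
    return merged
```

Files written before a column was added simply lack it, so a reader would return nulls for that column in older data.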
Found this thread from April:
http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%3ccabjxkq6b7sfaxie4+aqtcmd8jsqbznsxsfw6v5o0wwwouob...@mail.gmail.com%3E
Wondering what the status of this is. We are thinking about implementing
these algorithms. It would be a waste if they are already
Hello,
I have submitted a pull request (Adding support of initial value for state
update. #2665); please review it and let me know.
Excited to submit my first pull request.
-Soumitra.
- Original Message -
From: Soumitra Kumar kumar.soumi...@gmail.com
To: dev@spark.apache.org
Sent: