Hi Omkar,
Take a look at http://spark.apache.org/community.html to subscribe to
the right list.
Best,
Daniel Lopes
Chief Data and Analytics Officer | OneMatch
c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
www.onematch.com.br
+1 to this request. I talked last week with a product group within IBM that
is struggling with the same issue. It's pretty common in data cleaning
applications for data in the early stages to have nested lists or sets with
inconsistent or incomplete schema information.
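To make that concrete, here is a tiny sketch (Spark 1.6-era SQLContext, made-up
records, `sc` assumed to be a SparkContext as in spark-shell) of the kind of
input I mean:

import org.apache.spark.sql.SQLContext

// the same field appears as a list, as a scalar, and not at all,
// so no single clean schema fits these records
val sqlContext = new SQLContext(sc)
val lines = sc.parallelize(Seq(
  """{"id": 1, "tags": ["a", "b"]}""",   // nested list
  """{"id": 2, "tags": "c"}""",          // same field as a scalar
  """{"id": 3}"""                        // field missing entirely
))
// inference has to reconcile the conflict; Spark falls back to StringType,
// so downstream code inherits strings that are secretly serialized lists
sqlContext.read.json(lines).printSchema()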
Fred
On Tue, Sep 13, 2016 at
Hi,
I am new to Spark and have a quick question about end-user impersonation
in the Spark executor process.
Basically, I am running SQL queries through the Spark Thrift Server with doAs set
to true to enable end-user impersonation. In my experiment, I was able to start a
session for
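For reference, by doAs set to true I mean starting the server roughly like this
(a sketch; the property can equally live in hive-site.xml on the server's
classpath):

# enable impersonation when launching the Thrift Server
./sbin/start-thriftserver.sh \
  --master yarn \
  --hiveconf hive.server2.enable.doAs=true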
Also try doing a fresh clone of the git repository. I've seen some of those
rare failure modes corrupt parts of my local copy in the past.
FWIW the main branch as of yesterday afternoon is building fine in my
environment.
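If it helps, the steps I use are roughly the following (standard Apache mirror
on GitHub, the Maven wrapper that ships in the repo):

# fresh clone and build; -DskipTests keeps the turnaround short
git clone https://github.com/apache/spark.git
cd spark
./build/mvn -DskipTests clean package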
Fred
On Tue, Sep 13, 2016 at 6:29 PM, Jakob Odersky
This is definitely useful, but in reality it might be very difficult to do.
On Mon, Aug 29, 2016 at 6:46 PM, Fang Zhang wrote:
> Dear developers,
>
> I am running some tests using Pregel API.
>
> It seems to me that more than 90% of the volume of a graph object is
>
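For anyone following along, the API in question is GraphX's Pregel operator. A
minimal sketch is the single-source shortest paths example from the GraphX
programming guide, run here over a random log-normal graph (`sc` assumed to be
a SparkContext as in spark-shell):

import org.apache.spark.graphx._
import org.apache.spark.graphx.util.GraphGenerators

// build a random graph and use edge attributes as distances
val graph: Graph[Long, Double] =
  GraphGenerators.logNormalGraph(sc, numVertices = 100).mapEdges(e => e.attr.toDouble)
val sourceId: VertexId = 0L
// every vertex starts at infinity except the source
val initialGraph = graph.mapVertices((id, _) =>
  if (id == sourceId) 0.0 else Double.PositiveInfinity)

val sssp = initialGraph.pregel(Double.PositiveInfinity)(
  (id, dist, newDist) => math.min(dist, newDist),   // vertex program
  triplet =>                                        // send messages along cheaper edges
    if (triplet.srcAttr + triplet.attr < triplet.dstAttr)
      Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
    else Iterator.empty,
  (a, b) => math.min(a, b)                          // merge incoming messages
)
println(sssp.vertices.collect.mkString("\n"))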
Hello all,
I have created a Kafka topic with 5 partitions, and I am using the createStream
receiver API as follows. But somehow only one receiver is getting the input
data; the rest of the receivers are not processing anything. Can you please help?
JavaPairDStream<String, String> messages = null;
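In Scala, the setup I am describing amounts to roughly this (a sketch with
placeholder ZooKeeper quorum, group, and topic names; one receiver per
createStream call, unioned into a single stream):

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// assuming `sc` from spark-shell; each createStream call starts one receiver
val ssc = new StreamingContext(sc, Seconds(10))
val streams = (1 to 5).map { _ =>
  KafkaUtils.createStream(ssc, "zkhost:2181", "my-group", Map("mytopic" -> 1))
}
// union the five receiver streams into a single DStream
val messages = ssc.union(streams)
messages.count().print()
ssc.start()
ssc.awaitTermination()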
Take a look at how the messages are actually distributed across the
partitions. If the message keys have a low cardinality, you might get poor
distribution (e.g. all the messages are actually only in two of the five
partitions, leading to what you see in Spark).
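One quick way to check that (a sketch, assuming a broker on localhost:9092 and
a topic named mytopic):

# prints the latest offset per partition, which shows how evenly
# the messages are spread across the five partitions
bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list localhost:9092 --topic mytopic --time -1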
If you take a look at the Kafka
Hi,
I have a production job that registers four different dataframes as
tables in pyspark 1.6.2. When we upgraded to Spark 2.0, only three of the
four dataframes are getting registered; the fourth dataframe is not.
There are no code changes whatsoever. The only change is
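For reference, the registration itself is the standard pattern; on 2.0 it looks
roughly like this (a sketch with a hypothetical table name; registerTempTable
still exists in 2.0 but is deprecated in favor of createOrReplaceTempView):

import org.apache.spark.sql.SparkSession

// in 2.0, createOrReplaceTempView supersedes registerTempTable from 1.6
val spark = SparkSession.builder().appName("register-tables").getOrCreate()
val df = spark.range(10).toDF("id")
df.createOrReplaceTempView("table_one")
spark.sql("SELECT COUNT(*) FROM table_one").show()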
Environment:
Apache Spark 1.6.2
Scala: 2.10
I am currently using the spark-csv package, courtesy of Databricks, and I
would like to have a (pre-processing?) stage when reading the CSV file that
adds a row number to each row of data being read. This will allow for better
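Something along these lines is what I have in mind (a sketch against Spark 1.6
and spark-csv, with a hypothetical input path and `sc` assumed from
spark-shell; zipWithIndex assigns consecutive 0-based numbers in input order):

import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

val sqlContext = new SQLContext(sc)
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("data.csv")

// append the index as an extra column and rebuild the DataFrame
val withRowNum = sqlContext.createDataFrame(
  df.rdd.zipWithIndex.map { case (row, idx) => Row.fromSeq(row.toSeq :+ idx) },
  StructType(df.schema.fields :+ StructField("row_number", LongType, nullable = false))
)
withRowNum.show()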