Re: Subscription

2016-09-14 Thread Daniel Lopes
Hi Omkar, Look at this link http://spark.apache.org/community.html to subscribe to the right list. Best, *Daniel Lopes* Chief Data and Analytics Officer | OneMatch c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes www.onematch.com.br

Re: Spark SQL - Applying transformation on a struct inside an array

2016-09-14 Thread Fred Reiss
+1 to this request. I talked last week with a product group within IBM that is struggling with the same issue. It's pretty common in data cleaning applications for data in the early stages to have nested lists or sets with inconsistent or incomplete schema information. Fred On Tue, Sep 13, 2016 at

Question about impersonation on Spark executor

2016-09-14 Thread Tao Li
Hi, I am new to Spark and have a quick question about end-user impersonation in the Spark executor process. Basically, I am running SQL queries through the Spark Thrift server with doAs set to true to enable end-user impersonation. In my experiment, I was able to start a session for

Re: Test fails when compiling spark with tests

2016-09-14 Thread Fred Reiss
Also try doing a fresh clone of the git repository. I've seen some of those rare failure modes corrupt parts of my local copy in the past. FWIW the main branch as of yesterday afternoon is building fine in my environment. Fred On Tue, Sep 13, 2016 at 6:29 PM, Jakob Odersky

Re: Saving less data to improve Pregel performance in GraphX?

2016-09-14 Thread Reynold Xin
This is definitely useful, but in reality it might be very difficult to do. On Mon, Aug 29, 2016 at 6:46 PM, Fang Zhang wrote: > Dear developers, > > I am running some tests using Pregel API. > > It seems to me that more than 90% of the volume of a graph object is >

Not all KafkaReceivers processing the data Why?

2016-09-14 Thread Rachana Srivastava
Hello all, I have created a Kafka topic with 5 partitions, and I am using the createStream receiver API as follows. But somehow only one receiver is getting the input data; the rest of the receivers are not processing anything. Can you please help? JavaPairDStream messages = null;

Re: Not all KafkaReceivers processing the data Why?

2016-09-14 Thread Jeremy Smith
Take a look at how the messages are actually distributed across the partitions. If the message keys have a low cardinality, you might get poor distribution (i.e. all the messages are actually only in two of the five partitions, leading to what you see in Spark). If you take a look at the Kafka
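A minimal sketch of the effect described above, in plain Python rather than Kafka or Spark code. It assumes a key-hash partitioner of the form hash(key) mod partition count; `crc32` is only a stand-in for illustration and is not the hash Kafka uses, and the key names and message count are hypothetical.

```python
from collections import Counter
from zlib import crc32

NUM_PARTITIONS = 5  # the topic in the question has 5 partitions


def assign_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in for a key-hash partitioner: hash(key) mod partition count.
    # crc32 is an assumption for illustration, not Kafka's actual hash.
    return crc32(key.encode("utf-8")) % num_partitions


def partition_counts(keys):
    # Count how many messages land in each partition.
    return Counter(assign_partition(k) for k in keys)


# Hypothetical workload: 1000 messages but only two distinct keys.
low_cardinality_keys = ["user-a", "user-b"] * 500
counts = partition_counts(low_cardinality_keys)

# With two distinct keys, at most two partitions can ever receive data,
# no matter how many messages are sent.
print(sorted(counts.items()))
```

Under these assumptions, every message with the same key maps to the same partition, so the number of busy partitions is bounded by the number of distinct keys.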

sqlContext.registerDataFrameAsTable is not working properly in pyspark 2.0

2016-09-14 Thread sririshindra
Hi, I have a production job that registers four different DataFrames as tables in PySpark 1.6.2. When we upgraded to Spark 2.0, only three of the four DataFrames are getting registered; the fourth is not. There are no code changes whatsoever. The only change is

CSV Reader with row numbers

2016-09-14 Thread Akshay Sachdeva
Environment: Apache Spark 1.6.2 Scala: 2.10 I am currently using the spark-csv package, courtesy of Databricks, and I would like to have a (pre-processing?) stage when reading the CSV file that also adds a row number to each row of data being read from the CSV file. This will allow for better