Re: Is Spark suited for this use case?

2017-10-16 Thread Jörn Franke
Hi, what is the motivation behind your question? Saving costs? You seem to be happy with the functional and non-functional requirements, so the only remaining driver could be cost or a need for future innovation. Best regards.

Re: Dependency error due to scala version mismatch in SBT and Spark 2.1

2017-10-16 Thread patel kumar
This is not the correct way to build Spark with sbt. Why? On Sun, Oct 15, 2017 at 11:54 PM, Mark Hamstra wrote: >> I am building Spark using build.sbt. > Which just gets me back to my original question: Why? This is not the correct way to build Spark with sbt.

Re: Dependency error due to scala version mismatch in SBT and Spark 2.1

2017-10-16 Thread Mark Hamstra
The canonical build of Spark is done with Maven, not sbt, and the two tools do things a bit differently. To get each of them to build Spark in much the same way as the other, both builds are driven through a customization script -- build/mvn and build/sbt respectively.
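If the underlying goal is an application that depends on Spark 2.1 rather than a build of Spark itself, the Scala version mismatch is usually fixed in the application's build.sbt. A minimal sketch, assuming Scala 2.11 (which the Spark 2.1 artifacts are built against) and a hypothetical project name:

  // build.sbt for an application using Spark 2.1, not for building Spark itself
  name := "spark-app"          // hypothetical project name
  scalaVersion := "2.11.8"     // must match the Scala binary version of the Spark artifacts
  libraryDependencies ++= Seq(
    // %% appends the Scala binary version, so it must agree with scalaVersion above
    "org.apache.spark" %% "spark-core" % "2.1.0" % "provided",
    "org.apache.spark" %% "spark-sql"  % "2.1.0" % "provided"
  )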

Apache Spark GraphX: java.lang.ArrayIndexOutOfBoundsException: -1

2017-10-16 Thread Andy Long
We have hit a bug with GraphX when calling the connectedComponents function, where it fails with java.lang.ArrayIndexOutOfBoundsException: -1. I've found this bug report: https://issues.apache.org/jira/browse/SPARK-5480 Has anyone else hit this issue, and if so, how did you work around it?
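For context, a minimal sketch of the kind of call involved; the graph construction here is invented, since the thread does not include the original code, and sc is an existing SparkContext:

  import org.apache.spark.graphx.{Edge, Graph}
  import org.apache.spark.rdd.RDD

  // Build a small graph from an edge list; GraphX vertex IDs are Longs.
  val edges: RDD[Edge[Int]] = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
  val graph = Graph.fromEdges(edges, defaultValue = 0)

  // connectedComponents labels each vertex with the lowest vertex ID in its component.
  val components = graph.connectedComponents().vertices
  components.collect().foreach { case (id, label) => println(s"$id -> $label") }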

Happy Diwali to those forum members who celebrate this great festival

2017-10-16 Thread Mich Talebzadeh
Hope you will have a great time. Regards, Dr Mich Talebzadeh LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw http://talebzadehmich.wordpress.com

The class SparkFlumeProtocol,EventBatch and SparkSinkEvent are not found

2017-10-16 Thread smith
I imported the Spark source into the Eclipse IDE, but I get the following error: the class SparkFlumeProtocol is not found. What should I do?

Re: Happy Diwali to those forum members who celebrate this great festival

2017-10-16 Thread ayan guha
Thanks Mich. And happy festive season to the most vibrant and knowledgeable user community around... I learn every day here and I cannot express my gratitude to each one of you for that. On Mon, Oct 16, 2017 at 8:37 PM, Mich Talebzadeh wrote: > Hope you will have a

task not serializable on simple operations

2017-10-16 Thread Imran Rajjad
Is there a way around having to implement a separate Java class implementing the Serializable interface, even for small arithmetic operations? Below is code from the simple decision tree example: Double testMSE = predictionAndLabel.map(new Function<Tuple2<Double, Double>, Double>() { @Override
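For comparison, the same computation written with Scala closures avoids the anonymous-inner-class trap, as long as the closure does not capture non-serializable enclosing state. A sketch, assuming predictionAndLabel is an RDD[(Double, Double)]:

  // Mean squared error over (prediction, label) pairs, using a plain closure.
  // Nothing here references an enclosing class, so no hidden non-serializable
  // `this` gets pulled into the task.
  val testMSE = predictionAndLabel
    .map { case (prediction, label) => math.pow(prediction - label, 2) }
    .mean()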

Re: Generating StructType from dataframe.printSchema

2017-10-16 Thread Silvio Fiorito
If you're confident the schema of all files is consistent, then just infer the schema from a single file and reuse it when loading the whole data set:

  val schema = spark.read.json("/path/to/single/file.json").schema
  val wholeDataSet = spark.read.schema(schema).json("/path/to/whole/datasets")
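If the inferred schema also needs to survive across jobs, one option (not from the thread, but using public Spark APIs) is to serialize it with StructType's json method and rebuild it via DataType.fromJson:

  import org.apache.spark.sql.types.{DataType, StructType}

  // Capture the inferred schema once as a JSON string...
  val schemaJson: String = schema.json

  // ...and rebuild it later without touching the data again.
  val restored = DataType.fromJson(schemaJson).asInstanceOf[StructType]
  val df = spark.read.schema(restored).json("/path/to/whole/datasets")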

Generating StructType from dataframe.printSchema

2017-10-16 Thread Jeroen Miller
Hello Spark users, Does anyone know if there is a way to generate the Scala code for a complex structure just from the output of dataframe.printSchema? I have to analyse a significant volume of data and want to explicitly set the schema(s) to avoid having to read my (compressed) JSON files
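As far as I know there is nothing built in that turns printSchema's tree output back into code, but writing the StructType by hand is straightforward. A sketch with invented field names:

  import org.apache.spark.sql.types._

  // Hand-written equivalent of what schema inference would produce.
  val schema = StructType(Seq(
    StructField("id", LongType, nullable = false),
    StructField("name", StringType, nullable = true),
    StructField("tags", ArrayType(StringType), nullable = true)
  ))

  val df = spark.read.schema(schema).json("/path/to/data")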

Re: Generating StructType from dataframe.printSchema

2017-10-16 Thread Jeroen Miller
On 16 Oct 2017, at 16:22, Silvio Fiorito wrote: > [...] then just infer the schema from a single file and reuse it when loading > the whole data set: Well, that is a possibility indeed. Thanks, Jeroen

WARN: Truncated the string representation with df.describe()

2017-10-16 Thread Md. Rezaul Karim
Hi, When I try to see the statistics of a DataFrame using the df.describe() method, I get the following WARN and, as a result, nothing is printed: 17/10/16 18:37:54 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted
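If the full warning names spark.debug.maxToStringFields (as the Spark 2.x Utils warning does), one thing to try is raising that limit when the session is created. A sketch; the app name and threshold are arbitrary:

  import org.apache.spark.sql.SparkSession

  // Raise the field limit Spark uses when stringifying query plans.
  val spark = SparkSession.builder()
    .appName("describe-demo")
    .config("spark.debug.maxToStringFields", "200") // default is 25 in Spark 2.x
    .getOrCreate()

  spark.read.json("/path/to/data").describe().show()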

Re: Spark Structured Streaming not connecting to Kafka using kerberos

2017-10-16 Thread Burak Yavuz
Hi Darshan, How are you creating your Kafka stream? Can you please share the options you provide?

  spark.readStream.format("kafka")
    .option(...) // all these please
    .load()

On Sat, Oct 14, 2017 at 1:55 AM, Darshan Pandya wrote: > Hello, > > I'm using Spark 2.1.0

Spark directory partition name

2017-10-16 Thread Mohit Anchlia
When Spark writes a partition it writes the directory in the format <key>=<value>/. Is there a way to have Spark write the directory as only the <value>?
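For reference, a sketch of the default behaviour being described, with invented column and path names:

  // partitionBy produces one directory per distinct value, named <column>=<value>.
  df.write
    .partitionBy("event_date")
    .parquet("/data/out")
  // Layout on disk:
  //   /data/out/event_date=2017-10-16/part-...
  //   /data/out/event_date=2017-10-17/part-...
  // Spark relies on the column=value naming to rediscover the partition column
  // on read, so DataFrameWriter has no option to drop the "column=" prefix.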

Re: Spark Structured Streaming not connecting to Kafka using kerberos

2017-10-16 Thread Darshan Pandya
Hi Burak, It turns out it worked fine when I submitted in cluster mode. I also tried converting my app to DStreams, and that too works only when deployed in cluster mode. Here is how I configured the stream: val lines = spark.readStream .format("kafka")
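For anyone hitting the same issue, a sketch of a kerberized Kafka source; the broker, topic, and file names are invented, and the kafka.-prefixed options are passed straight through to the underlying Kafka consumer:

  // The JAAS config must be visible to both driver and executors, e.g.
  //   --files jaas.conf
  //   --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf"
  val lines = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")    // hypothetical broker
    .option("subscribe", "my_topic")                      // hypothetical topic
    .option("kafka.security.protocol", "SASL_PLAINTEXT")
    .option("kafka.sasl.kerberos.service.name", "kafka")
    .load()
    .selectExpr("CAST(value AS STRING)")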