Reading non-UTF-8 files via Spark Streaming
Hi, I am trying to read files that are ISO-8859-6 encoded via Spark Streaming, but the default encoding for ssc.textFileStream is UTF-8, so I don't get the data properly. Is there a way to change the default encoding for textFileStream, or a way to read the file's raw bytes so that I can handle the decoding myself? Thanks so much in advance.
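
One workaround that is sometimes suggested (a minimal sketch, not a confirmed fix: the watched directory path below is a placeholder, and it assumes Hadoop's TextInputFormat, whose record reader copies the raw line bytes into the Text value without re-decoding them):

import java.nio.charset.Charset

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// fileStream yields (key, value) pairs; with TextInputFormat the value is a
// Text object holding the raw bytes of one line
val stream = ssc.fileStream[LongWritable, Text, TextInputFormat]("/path/to/watched/dir")

// Decode the backing byte array explicitly as ISO-8859-6; getBytes may return
// a buffer longer than the line, so slice it with getLength
val lines = stream.map { case (_, text) =>
  new String(text.getBytes, 0, text.getLength, Charset.forName("ISO-8859-6"))
}

From there, lines behaves like the DStream[String] that textFileStream would otherwise have produced.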
Re: Spark Streaming on Yarn Input from Flume
Have you fixed this issue?
Saving a DStream into a single file
I am doing the word count example on a Flume stream and trying to save the output as text files in HDFS, but in the save directory I get multiple subdirectories, each holding several small files. I wonder if there is a way to append to one large file instead of saving to many files, as I intend to put the output in a Hive HDFS directory so I can query the result with Hive. I hope someone has a workaround for this issue. Thanks in advance.
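
One common compromise (a sketch, assuming the word-count results live in a DStream[(String, Int)] called counts and that the warehouse path below is a placeholder): coalesce each batch's RDD to a single partition before writing, so every batch produces exactly one part file:

counts.foreachRDD { (rdd, time) =>
  // Skip empty batches so they don't create empty output directories
  if (rdd.take(1).nonEmpty) {
    rdd.map { case (word, n) => word + "\t" + n }
      .coalesce(1) // one partition => one part-00000 file per batch
      .saveAsTextFile("/user/hive/warehouse/wordcount/batch-" + time.milliseconds)
  }
}

HDFS files are write-once, so truly appending to one growing file is awkward; pointing a Hive external table at the parent directory (Hive reads all files underneath it) or running a periodic compaction job are the usual ways around it.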
Re: deploying Spark on standalone cluster
I was having a similar issue, though in my case it was with the Spark and Flume integration: I was getting a "failed to bind" error. I fixed it by shutting down the firewall on both machines (make sure "service iptables status" reports that the firewall is stopped).
Re: Store DStreams into Hive using Hive Streaming
If you have found a solution for this, could you please post it?
Store Spark data into a Hive table
I am trying to store my word count output in the Hive data warehouse. My pipeline is: Flume streaming => Spark does the word count => store the result in a Hive table for visualization later. My code is:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.flume._
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.SQLContext

object WordCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: WordCount <host> <port>")
      System.exit(1)
    }
    val Array(host, port) = args
    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch size
    val sparkConf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(sparkConf)
    val ssc = new StreamingContext(sc, batchInterval)

    // Create a Flume stream
    val stream = FlumeUtils.createStream(ssc, host, port.toInt)

    // Print out the count of events received from this server in each batch
    stream.count().map(cnt => "Received !!!: " + cnt + " flume events.").print()

    // Holds the string stream (converted event body array into string)
    val body = stream.map(e => new String(e.event.getBody.array))

    val counts = body.flatMap(line => line.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // TESTING: storing the variable counts into Hive
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    import hiveContext._
    val good = createSchemaRDD(counts)
    good.saveAsTable("meta_test")

    ssc.start()
    ssc.awaitTermination()
  }
}

This gives me the error:

value createSchemaRDD is not a member of org.apache.spark.sql.SQLContext

Is there any way to fix this, or another method to store the data in the Hive data warehouse?
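
For what it's worth, a sketch of how this is commonly resolved (assuming Spark 1.3+, where createSchemaRDD on SQLContext was removed in favour of the toDF implicits; the column names below are assumptions). Note also that counts is a DStream, not an RDD, so the conversion has to happen per batch inside foreachRDD:

import org.apache.spark.sql.SaveMode

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

counts.foreachRDD { rdd =>
  import hiveContext.implicits._
  // toDF replaces createSchemaRDD in Spark 1.3+
  val df = rdd.toDF("word", "count")
  // df.write is the Spark 1.4+ API; on 1.3 use df.saveAsTable("meta_test", SaveMode.Append)
  df.write.mode(SaveMode.Append).saveAsTable("meta_test")
}

One caveat: saveAsTable writes in Spark SQL's own default format (Parquet), which the Hive CLI may not read directly; if the table must be queryable from Hive itself, create it in Hive first and use df.insertInto("meta_test") instead.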