Reading non-UTF-8 files via Spark Streaming

2015-11-16 Thread tarek_abouzeid
Hi,

I am trying to read files that are ISO-8859-6 encoded via Spark Streaming,
but the default encoding for "ssc.textFileStream" is UTF-8, so I don't get
the data properly. Is there a way to change the default encoding for
textFileStream, or a way to read the files' raw bytes so I can handle the
decoding myself?

Thanks so much in advance  
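
A possible workaround (a rough sketch, untested; the directory path and app
name below are made up): use ssc.fileStream with Hadoop's TextInputFormat and
decode the bytes of each Text record yourself. Text keeps the bytes exactly as
read from the file, whereas textFileStream's Text.toString always assumes
UTF-8:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object Iso88596Stream {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Iso88596Stream")
    val ssc = new StreamingContext(conf, Seconds(2))

    // fileStream exposes the underlying Hadoop (key, value) records;
    // Text holds each line's bytes as read from disk, so we can decode
    // them with the file's actual charset instead of assuming UTF-8.
    val lines = ssc.fileStream[LongWritable, Text, TextInputFormat]("/data/incoming")
      .map { case (_, text) =>
        new String(text.getBytes, 0, text.getLength, "ISO-8859-6")
      }

    lines.print()
    ssc.start()
    ssc.awaitTermination()
  }
}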






Re: Spark Streaming on Yarn Input from Flume

2015-03-15 Thread tarek_abouzeid
Have you fixed this issue?






Saving a DStream into a single file

2015-03-15 Thread tarek_abouzeid
I am doing the word count example on a Flume stream and trying to save the
output as text files in HDFS, but in the save directory I get multiple
subdirectories, each containing several small files. I wonder if there is a
way to append to one large file instead of saving to many small ones, as I
intend to put the output in a Hive HDFS directory so I can query the results
using Hive.

I hope someone has a workaround for this issue. Thanks in advance.
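
One possible workaround (a sketch, untested; the HDFS URI and paths are made
up, counts stands for the DStream[(String, Int)] produced by the word count,
and it assumes dfs.support.append is enabled on the cluster): collect each
small word-count batch to the driver inside foreachRDD and append it to a
single HDFS file with the FileSystem API:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Append each batch of (word, count) pairs to one HDFS file.
counts.foreachRDD { rdd =>
  // collect() is fine for small word-count batches; for large batches
  // this driver-side append would become a bottleneck.
  val lines = rdd.map { case (word, cnt) => word + "\t" + cnt }.collect()
  if (lines.nonEmpty) {
    val fs = FileSystem.get(new URI("hdfs://namenode:8020"), new Configuration())
    val path = new Path("/user/hive/warehouse/wordcount/part-00000")
    // append requires the file to exist (and append support to be enabled)
    val out = if (fs.exists(path)) fs.append(path) else fs.create(path)
    try lines.foreach(l => out.write((l + "\n").getBytes("UTF-8")))
    finally out.close()
  }
}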






Re: deploying Spark on standalone cluster

2015-03-15 Thread tarek_abouzeid
I was having a similar issue, though in my case it was with the Spark and
Flume integration: I was getting a "failed to bind" error. I got it fixed by
shutting down the firewall on both machines (make sure "service iptables
status" reports that the firewall is stopped).







Re: Store DStreams into Hive using Hive Streaming

2015-03-02 Thread tarek_abouzeid
If you have found a solution for this, could you please post it?






Store Spark data into a Hive table

2015-03-01 Thread tarek_abouzeid
I am trying to store my word count output in the Hive data warehouse. My
pipeline is:

Flume stream => Spark word count => store the result in a Hive table for
later visualization

My code is:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.flume._
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.SQLContext

object WordCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: WordCount <host> <port>")
      System.exit(1)
    }

    val Array(host, port) = args

    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch size
    val sparkConf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(sparkConf)
    val ssc = new StreamingContext(sc, batchInterval)

    // Create a Flume stream
    val stream = FlumeUtils.createStream(ssc, host, port.toInt)

    // Print out the count of events received from this server in each batch
    stream.count().map(cnt => "Received !!!: " + cnt + " flume events.").print()

    // The string stream (each event body converted from a byte array to a string)
    val body = stream.map(e => new String(e.event.getBody.array))

    val counts = body.flatMap(line =>
        line.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // TESTING: storing the variable counts into Hive
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    import hiveContext._
    val good = createSchemaRDD(counts)   // <-- this line fails to compile
    good.saveAsTable("meta_test")

    ssc.start()
    ssc.awaitTermination()
  }
}

This gives me the error: value createSchemaRDD is not a member of
org.apache.spark.sql.SQLContext

Is there any way to fix this, or another method for storing the data into the
Hive data warehouse?
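
A possible fix (a sketch against the Spark 1.x API, untested; the WordRow
case class is made up, while the table name meta_test comes from the code
above): createSchemaRDD is an implicit conversion on SQLContext/HiveContext
that turns an RDD of case-class instances into a SchemaRDD, but counts here
is a DStream, not an RDD, so there is nothing to convert until you reach into
each batch with foreachRDD:

// Hypothetical case class so Spark SQL can infer a schema (define it at
// the top level, outside main, so its TypeTag can be resolved).
case class WordRow(word: String, count: Int)

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
import hiveContext._  // brings the createSchemaRDD implicit into scope

counts.foreachRDD { rdd =>
  // Each batch is a plain RDD[(String, Int)]; map it to case-class rows
  // and let createSchemaRDD attach the schema.
  val rows = createSchemaRDD(rdd.map { case (w, c) => WordRow(w, c) })
  // insertInto appends to an existing Hive table; saveAsTable would fail
  // on the second batch because the table already exists.
  rows.insertInto("meta_test")
}

This assumes the meta_test table already exists in Hive; create it once up
front (or call saveAsTable on the very first batch only).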



