Both rdd.collect and rdd.foreach(println) show the same behavior.
Thanks
Best Regards
On Wed, Sep 17, 2014 at 12:26 PM, vasiliy <zadonskiyd@> wrote:
It also appears in a streaming HDFS fileStream.
full code example:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.serializer.KryoSerializer

def main(args: Array[String]) {
  val conf = new SparkConf()
    .setAppName("ErrorExample")
    .setMaster("local[8]")
    .set("spark.serializer", classOf[KryoSerializer].getName)
  val sc = new SparkContext(conf)
  val rdd = sc.hadoopFile(
    "hdfs://./user.avro",
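The hadoopFile call above is truncated in the archive. A minimal sketch of how
such an Avro read typically looks with the old ("mapred") Hadoop API; the
AvroInputFormat/AvroWrapper type parameters are assumptions, since the
originals did not survive:

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
import org.apache.hadoop.io.NullWritable

// Assumed completion of the truncated call: keys are AvroWrapper-wrapped
// records, values are NullWritable, read via AvroInputFormat.
val avroRdd = sc.hadoopFile[AvroWrapper[GenericRecord], NullWritable,
  AvroInputFormat[GenericRecord]]("hdfs://./user.avro")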
Hello. I have a hadoopFile RDD and I tried to collect items to the driver
program, but it returns an array of identical records (each equal to the last
record of my file). My code is like this:

val rdd = sc.hadoopFile(
  "hdfs:///data.avro",
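This symptom matches a documented caveat of SparkContext.hadoopFile: Hadoop's
RecordReader reuses the same Writable object for every record, so collecting
the raw RDD yields many references to one object (the last record read). The
scaladoc recommends copying each record with a map before caching, sorting, or
collecting. A minimal sketch; the key/value types here stand in for whatever
the actual call produces:

// rdd comes from sc.hadoopFile(...). The underlying RecordReader
// recycles a single key/value object pair, so materialize a copy of
// each record before it leaves the partition iterator.
val copied = rdd.map { case (k, v) => (k.toString, v.toString) }
copied.collect().foreach(println)  // distinct records, not N copies of the last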
When you get a stream from StreamingContext.fileStream(), Spark will only
process files whose timestamp is later than the current timestamp, so the
existing data on HDFS should not be processed again. You may have another
problem, though: Spark will not process files that were moved into your HDFS
folder between restarts of your application.
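For reference, fileStream has an overload with a newFilesOnly flag; setting it
to false makes the stream also consider files already present in the directory
when it starts (within the stream's remember window), which can soften the
restart gap described above. A minimal sketch with a placeholder directory and
batch interval:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(
  new SparkConf().setAppName("FileStreamExample"), Seconds(30))

// newFilesOnly = false: also pick up files already in the directory at
// start, instead of only files that appear afterwards.
val lines = ssc
  .fileStream[LongWritable, Text, TextInputFormat](
    "hdfs:///incoming", (path: Path) => true, newFilesOnly = false)
  .map(_._2.toString)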