Hi Nipun,

You're right. I created a pull request fixing the documentation:
https://github.com/apache/spark/pull/5569, and filed the corresponding issue:
https://issues.apache.org/jira/browse/SPARK-6992
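For reference, here is a minimal sketch of the corrected steps, assuming the
Spark 1.3.0 Java API and reusing the people, schemaString, and results
variables from the guide's example (untested here; the authoritative fix is
in the pull request above). The schema factory methods live on
org.apache.spark.sql.types.DataTypes, and rows are built with
org.apache.spark.sql.RowFactory rather than the non-existent Row.create:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructField;
    import org.apache.spark.sql.types.StructType;

    // Generate the schema: in 1.3 the createStructField/createStructType
    // factory methods are on DataTypes, not on DataType.
    List<StructField> fields = new ArrayList<StructField>();
    for (String fieldName : schemaString.split(" ")) {
      fields.add(DataTypes.createStructField(fieldName, DataTypes.StringType, true));
    }
    StructType schema = DataTypes.createStructType(fields);

    // Convert records of the RDD (people) to Rows: RowFactory.create
    // replaces Row.create, which does not exist in the Java API.
    JavaRDD<Row> rowRDD = people.map(
      new Function<String, Row>() {
        public Row call(String record) throws Exception {
          String[] parts = record.split(",");
          return RowFactory.create(parts[0], parts[1].trim());
        }
      });

    // Collecting the results: DataFrame.map expects a Scala function in the
    // 1.3 API, so convert to a JavaRDD first via javaRDD().
    List<String> names = results.javaRDD().map(
      new Function<Row, String>() {
        public String call(Row row) {
          return "Name: " + row.getString(0);
        }
      }).collect();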
Thank you for your time,
Olivier.

On Sat, Apr 18, 2015 at 1:11 AM, Nipun Batra <batrani...@gmail.com> wrote:

> Hi Olivier,
>
> Thank you for responding.
>
> I am able to find org.apache.spark.sql.Row in spark-catalyst_2.10-1.3.0,
> but it was not visible in the API document yesterday
> (https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/package-frame.html).
> I am pretty sure.
>
> I also think this document needs to be changed:
> https://spark.apache.org/docs/latest/sql-programming-guide.html
>
>     return Row.create(fields[0], fields[1].trim());
>
> needs to be replaced with RowFactory.create.
>
> Thanks again for your response.
>
> Thanks,
> Nipun Batra
>
>
> On Fri, Apr 17, 2015 at 2:50 PM, Olivier Girardot <ssab...@gmail.com> wrote:
>
>> Hi Nipun,
>> I'm sorry, but I don't understand exactly what your problem is.
>> Regarding org.apache.spark.sql.Row: it does exist, in the Spark SQL
>> dependency.
>> Is it a compilation problem?
>> Are you trying to run a main method using the pom you've just described,
>> or are you trying to spark-submit the jar?
>> If you're trying to run a main method, the "provided" scope is not
>> designed for that and will make your program fail.
>>
>> Regards,
>>
>> Olivier.
>>
>> On Fri, Apr 17, 2015 at 9:52 PM, Nipun Batra <bni...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> In the example given in the SQL document
>>> https://spark.apache.org/docs/latest/sql-programming-guide.html,
>>> org.apache.spark.sql.Row does not exist in the Java API, or at least
>>> I was not able to find it.
>>>
>>> Build info: downloaded from the Spark website.
>>>
>>> Dependency:
>>>
>>>     <dependency>
>>>         <groupId>org.apache.spark</groupId>
>>>         <artifactId>spark-sql_2.10</artifactId>
>>>         <version>1.3.0</version>
>>>         <scope>provided</scope>
>>>     </dependency>
>>>
>>> Code in the documentation:
>>>
>>>     // Import factory methods provided by DataType.
>>>     import org.apache.spark.sql.types.DataType;
>>>     // Import StructType and StructField
>>>     import org.apache.spark.sql.types.StructType;
>>>     import org.apache.spark.sql.types.StructField;
>>>     // Import Row.
>>>     import org.apache.spark.sql.Row;
>>>
>>>     // sc is an existing JavaSparkContext.
>>>     SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
>>>
>>>     // Load a text file and convert each line to a JavaBean.
>>>     JavaRDD<String> people = sc.textFile("examples/src/main/resources/people.txt");
>>>
>>>     // The schema is encoded in a string
>>>     String schemaString = "name age";
>>>
>>>     // Generate the schema based on the string of schema
>>>     List<StructField> fields = new ArrayList<StructField>();
>>>     for (String fieldName : schemaString.split(" ")) {
>>>       fields.add(DataType.createStructField(fieldName, DataType.StringType, true));
>>>     }
>>>     StructType schema = DataType.createStructType(fields);
>>>
>>>     // Convert records of the RDD (people) to Rows.
>>>     JavaRDD<Row> rowRDD = people.map(
>>>       new Function<String, Row>() {
>>>         public Row call(String record) throws Exception {
>>>           String[] fields = record.split(",");
>>>           return Row.create(fields[0], fields[1].trim());
>>>         }
>>>       });
>>>
>>>     // Apply the schema to the RDD.
>>>     DataFrame peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema);
>>>
>>>     // Register the DataFrame as a table.
>>>     peopleDataFrame.registerTempTable("people");
>>>
>>>     // SQL can be run over RDDs that have been registered as tables.
>>>     DataFrame results = sqlContext.sql("SELECT name FROM people");
>>>
>>>     // The results of SQL queries are DataFrames and support all the
>>>     // normal RDD operations.
>>>     // The columns of a row in the result can be accessed by ordinal.
>>>     List<String> names = results.map(new Function<Row, String>() {
>>>       public String call(Row row) {
>>>         return "Name: " + row.getString(0);
>>>       }
>>>     }).collect();
>>>
>>> Thanks,
>>> Nipun