Re: BUG: 1.3.0 org.apache.spark.sql.Row Does not exist in Java API
Hi Nipun, you're right. I created a pull request fixing the documentation: https://github.com/apache/spark/pull/5569 and the corresponding issue: https://issues.apache.org/jira/browse/SPARK-6992. Thank you for your time, Olivier.

On Sat, Apr 18, 2015 at 01:11, Nipun Batra wrote:
> Hi Olivier,
>
> Thank you for responding.
>
> I am able to find org.apache.spark.sql.Row in spark-catalyst_2.10-1.3.0,
> BUT it was not visible in the API document yesterday
> (https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/package-frame.html).
> I am pretty sure.
>
> I also think this document needs to be changed:
> https://spark.apache.org/docs/latest/sql-programming-guide.html
>
>     return Row.create(fields[0], fields[1].trim());
>
> needs to be replaced with RowFactory.create.
>
> Thanks again for your response.
>
> Thanks,
> Nipun Batra
>
> On Fri, Apr 17, 2015 at 2:50 PM, Olivier Girardot wrote:
>
>> Hi Nipun,
>> I'm sorry, but I don't understand exactly what your problem is.
>> org.apache.spark.sql.Row does exist in the Spark SQL dependency.
>> Is it a compilation problem?
>> Are you trying to run a main method using the pom you've just described,
>> or are you trying to spark-submit the jar?
>> If you're trying to run a main method, the "provided" scope is not designed
>> for that and will make your program fail.
>>
>> Regards,
>>
>> Olivier.
>>
>> On Fri, Apr 17, 2015 at 21:52, Nipun Batra wrote:
>>
>>> Hi
>>>
>>> [build info, dependency, and code example snipped; see the original message below]
>>>
>>> Thanks
>>> Nipun
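For reference, Row is packaged in the catalyst artifact in 1.3.0 (as found above), and spark-sql_2.10 pulls it in transitively. A hypothetical pom fragment declaring it directly, for anyone who wants the jar explicitly on the compile classpath, might look like:

```xml
<!-- Row lives in spark-catalyst_2.10-1.3.0; spark-sql already depends on it
     transitively, so declaring it explicitly is normally unnecessary -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-catalyst_2.10</artifactId>
  <version>1.3.0</version>
</dependency>
```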
Re: BUG: 1.3.0 org.apache.spark.sql.Row Does not exist in Java API
Hi Nipun,
I'm sorry, but I don't understand exactly what your problem is.
org.apache.spark.sql.Row does exist in the Spark SQL dependency.
Is it a compilation problem?
Are you trying to run a main method using the pom you've just described, or are you trying to spark-submit the jar?
If you're trying to run a main method, the "provided" scope is not designed for that and will make your program fail.

Regards,

Olivier.

On Fri, Apr 17, 2015 at 21:52, Nipun Batra wrote:
> Hi
>
> The example given in the SQL document
> https://spark.apache.org/docs/latest/sql-programming-guide.html
>
> org.apache.spark.sql.Row does not exist in the Java API, or at least I was
> not able to find it.
>
> [build info, dependency, and code example snipped; see the original message below]
>
> Thanks
> Nipun
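To make the scope point above concrete: Maven's "provided" scope keeps the dependency off the runtime classpath (it is expected to be supplied by the deployment environment, here spark-submit), so a main() launched directly from the IDE or jar fails with NoClassDefFoundError. A sketch of the two options, assuming the pom described in the original message:

```xml
<!-- Option 1: keep 'provided' and deploy the jar with spark-submit,
     which puts the Spark classes on the classpath at runtime -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.3.0</version>
  <scope>provided</scope>
</dependency>

<!-- Option 2: drop the scope (defaults to 'compile') to run a main() locally -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.3.0</version>
</dependency>
```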
BUG: 1.3.0 org.apache.spark.sql.Row Does not exist in Java API
Hi

The example given in the SQL document
https://spark.apache.org/docs/latest/sql-programming-guide.html

org.apache.spark.sql.Row does not exist in the Java API, or at least I was not able to find it.

Build Info - downloaded from the Spark website

Dependency:

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.10</artifactId>
      <version>1.3.0</version>
      <scope>provided</scope>
    </dependency>

Code in documentation:

    // Import factory methods provided by DataType.
    import org.apache.spark.sql.types.DataType;
    // Import StructType and StructField
    import org.apache.spark.sql.types.StructType;
    import org.apache.spark.sql.types.StructField;
    // Import Row.
    import org.apache.spark.sql.Row;

    // sc is an existing JavaSparkContext.
    SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);

    // Load a text file and convert each line to a JavaBean.
    JavaRDD<String> people = sc.textFile("examples/src/main/resources/people.txt");

    // The schema is encoded in a string
    String schemaString = "name age";

    // Generate the schema based on the string of schema
    List<StructField> fields = new ArrayList<StructField>();
    for (String fieldName : schemaString.split(" ")) {
      fields.add(DataType.createStructField(fieldName, DataType.StringType, true));
    }
    StructType schema = DataType.createStructType(fields);

    // Convert records of the RDD (people) to Rows.
    JavaRDD<Row> rowRDD = people.map(
      new Function<String, Row>() {
        public Row call(String record) throws Exception {
          String[] fields = record.split(",");
          return Row.create(fields[0], fields[1].trim());
        }
      });

    // Apply the schema to the RDD.
    DataFrame peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema);

    // Register the DataFrame as a table.
    peopleDataFrame.registerTempTable("people");

    // SQL can be run over RDDs that have been registered as tables.
    DataFrame results = sqlContext.sql("SELECT name FROM people");

    // The results of SQL queries are DataFrames and support all the
    // normal RDD operations.
    // The columns of a row in the result can be accessed by ordinal.
    List<String> names = results.map(new Function<Row, String>() {
      public String call(Row row) {
        return "Name: " + row.getString(0);
      }
    }).collect();

Thanks
Nipun
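The schema- and record-parsing steps of the guide's example can be sketched with only the JDK, so they run without a Spark cluster. This is a minimal sketch: the class name ParseSketch and its method names are hypothetical, and only the split/trim logic comes from the example above; the Spark-specific step on each parsed record would use RowFactory.create in the 1.3.0 Java API (Row.create is the call the thread reports as missing).

```java
import java.util.Arrays;
import java.util.List;

public class ParseSketch {

    // "name age" -> [name, age], the field names the example turns into StructFields
    static List<String> parseSchema(String schemaString) {
        return Arrays.asList(schemaString.split(" "));
    }

    // "Michael, 29" -> [Michael, 29], mirroring the example's split(",") plus trim
    static String[] parseRecord(String record) {
        String[] fields = record.split(",");
        return new String[] { fields[0], fields[1].trim() };
    }

    public static void main(String[] args) {
        System.out.println(parseSchema("name age"));                      // [name, age]
        System.out.println(Arrays.toString(parseRecord("Michael, 29")));  // [Michael, 29]
    }
}
```

In the full example, each String[] produced this way would be passed to RowFactory.create(...) and the schema list to DataType.createStructType(...).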