Re: BUG: 1.3.0 org.apache.spark.sql.Row Does not exist in Java API

2015-04-18 Thread Olivier Girardot
Hi Nipun,
you're right, I created the pull request fixing the documentation:
https://github.com/apache/spark/pull/5569
and the corresponding issue:
https://issues.apache.org/jira/browse/SPARK-6992
Thank you for your time,

Olivier.

On Sat, Apr 18, 2015 at 01:11, Nipun Batra wrote:

> Hi Oliver
>
> Thank you for responding.
>
> I am able to find org.apache.spark.sql.Row in spark-catalyst_2.10-1.3.0,
> but I am pretty sure it was not visible yesterday in the API documentation (
> https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/package-frame.html).
>
> I also think this document needs to be changed:
> https://spark.apache.org/docs/latest/sql-programming-guide.html
>
> The line
>
> return Row.create(fields[0], fields[1].trim());
>
> needs to be replaced with a call to RowFactory.create.
>
> Thanks again for your response.
>
> Thanks
> Nipun Batra
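
For anyone hitting the same compile error: the rename is mechanical. Below is a minimal, self-contained sketch of the pattern; the Row and RowFactory types here are simplified stand-ins written for this illustration, NOT Spark's actual classes (in spark-sql 1.3.0 the real factory is org.apache.spark.sql.RowFactory.create(Object...), while Row itself no longer exposes a static create from Java).

```java
// Simplified stand-ins illustrating the API change discussed above.
// NOT Spark's real classes: in Spark 1.3 the Java-visible factory is
// org.apache.spark.sql.RowFactory.create(Object...).
interface Row {
    Object get(int i);
    int size();
}

final class RowFactory {
    // Varargs factory, mirroring the shape of RowFactory.create(Object...)
    static Row create(final Object... values) {
        return new Row() {
            public Object get(int i) { return values[i]; }
            public int size() { return values.length; }
        };
    }
}

public class RowFactoryDemo {
    // The guide's mapper body, rewritten to call the factory
    // instead of the removed Row.create
    static Row parse(String record) {
        String[] fields = record.split(",");
        return RowFactory.create(fields[0], fields[1].trim());
    }

    public static void main(String[] args) {
        Row row = parse("Michael, 29");
        System.out.println(row.get(0) + " / " + row.get(1));  // Michael / 29
    }
}
```

In user code the only change needed is replacing `Row.create(...)` with `RowFactory.create(...)` and importing org.apache.spark.sql.RowFactory.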


Re: BUG: 1.3.0 org.apache.spark.sql.Row Does not exist in Java API

2015-04-17 Thread Olivier Girardot
Hi Nipun,
I'm sorry, but I don't understand exactly what your problem is.
Regarding org.apache.spark.sql.Row: it does exist in the Spark SQL
dependency.
Is it a compilation problem?
Are you trying to run a main method using the pom you've just described,
or are you trying to spark-submit the jar?
If you're trying to run a main method, the provided scope is not designed
for that and will make your program fail.
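
If the intent is to run a main method directly (from an IDE or with plain java), the usual fix is to drop the provided scope so that spark-sql ends up on the runtime classpath; a sketch of the adjusted dependency, using the coordinates quoted earlier in this thread:

```xml
<!-- "provided" keeps the jar off the runtime classpath (correct for
     spark-submit, where the cluster supplies Spark), so a locally run
     main method fails with NoClassDefFoundError. For local runs, let
     the scope default to "compile": -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.3.0</version>
</dependency>
```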

Regards,

Olivier.



BUG: 1.3.0 org.apache.spark.sql.Row Does not exist in Java API

2015-04-17 Thread Nipun Batra
Hi

The example given in the SQL document
https://spark.apache.org/docs/latest/sql-programming-guide.html

uses org.apache.spark.sql.Row, which does not exist in the Java API, or at
least I was not able to find it.

Build info: downloaded from the Spark website

Dependency

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.3.0</version>
  <scope>provided</scope>
</dependency>

Code in the documentation

// Import factory methods provided by DataType.
import org.apache.spark.sql.types.DataType;
// Import StructType and StructField
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.types.StructField;
// Import Row.
import org.apache.spark.sql.Row;

// sc is an existing JavaSparkContext.
SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);

// Load a text file and convert each line to a JavaBean.
JavaRDD<String> people = sc.textFile("examples/src/main/resources/people.txt");

// The schema is encoded in a string
String schemaString = "name age";

// Generate the schema based on the string of schema
List<StructField> fields = new ArrayList<StructField>();
for (String fieldName: schemaString.split(" ")) {
  fields.add(DataType.createStructField(fieldName, DataType.StringType, true));
}
StructType schema = DataType.createStructType(fields);

// Convert records of the RDD (people) to Rows.
JavaRDD<Row> rowRDD = people.map(
  new Function<String, Row>() {
    public Row call(String record) throws Exception {
      String[] fields = record.split(",");
      return Row.create(fields[0], fields[1].trim());
    }
  });

// Apply the schema to the RDD.
DataFrame peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema);

// Register the DataFrame as a table.
peopleDataFrame.registerTempTable("people");

// SQL can be run over RDDs that have been registered as tables.
DataFrame results = sqlContext.sql("SELECT name FROM people");

// The results of SQL queries are DataFrames and support all the
// normal RDD operations.
// The columns of a row in the result can be accessed by ordinal.
List<String> names = results.map(new Function<Row, String>() {
  public String call(Row row) {
    return "Name: " + row.getString(0);
  }
}).collect();


Thanks
Nipun