[jira] [Comment Edited] (SPARK-22019) JavaBean int type property

2017-09-15 Thread Jen-Ming Chung (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-22019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167734#comment-16167734 ]

Jen-Ming Chung edited comment on SPARK-22019 at 9/15/17 11:29 AM:
--

An alternative is to provide an explicit schema instead of relying on inference, 
so you don't need to change your POJO class in the above test case.

{code}
StructType schema = new StructType()
    .add("id", DataTypes.IntegerType)
    .add("str", DataTypes.StringType);
Dataset<SampleData> df = spark.read().schema(schema).json(stringdataset)
    .as(org.apache.spark.sql.Encoders.bean(SampleData.class));
{code}




> JavaBean int type property 
> ---
>
> Key: SPARK-22019
> URL: https://issues.apache.org/jira/browse/SPARK-22019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: taiho choi
>
> When the type of SampleData's id is int, the following code generates errors.
> When it is long, it works fine.
>  
> {code:java}
> @Test
> public void testDataSet2() {
> ArrayList<String> arr = new ArrayList<>();
> arr.add("{\"str\": \"everyone\", \"id\": 1}");
> arr.add("{\"str\": \"Hello\", \"id\": 1}");
> //1.read array and change to string dataset.
> JavaRDD<String> data = sc.parallelize(arr);
> Dataset<String> stringdataset = sqc.createDataset(data.rdd(), 
> Encoders.STRING());
> stringdataset.show(); //PASS
> //2. convert string dataset to sampledata dataset
> Dataset<SampleData> df = 
> sqc.read().json(stringdataset).as(Encoders.bean(SampleData.class));
> df.show();//PASS
> df.printSchema();//PASS
> Dataset<SampleDataFlat> fad = df.flatMap(SampleDataFlat::flatMap, 
> Encoders.bean(SampleDataFlat.class));
> fad.show(); //ERROR
> fad.printSchema();
> }
> public static class SampleData implements Serializable {
> public String getStr() {
> return str;
> }
> public void setStr(String str) {
> this.str = str;
> }
> public int getId() {
> return id;
> }
> public void setId(int id) {
> this.id = id;
> }
> String str;
> int id;
> }
> public static class SampleDataFlat {
> String str;
> public String getStr() {
> return str;
> }
> public void setStr(String str) {
> this.str = str;
> }
> public SampleDataFlat(String str, long id) {
> this.str = str;
> }
> public static Iterator<SampleDataFlat> flatMap(SampleData data) {
> ArrayList<SampleDataFlat> arr = new ArrayList<>();
> arr.add(new SampleDataFlat(data.getStr(), data.getId()));
> arr.add(new SampleDataFlat(data.getStr(), data.getId()+1));
> arr.add(new SampleDataFlat(data.getStr(), data.getId()+2));
> return arr.iterator();
> }
> }
> {code}
> ==Error message==
> Caused by: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 38, Column 16: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 38, Column 16: No applicable constructor/method found for actual parameters 
> "long"; candidates are: "public void SparkUnitTest$SampleData.setId(int)"
> /* 024 */   public java.lang.Object apply(java.lang.Object _i) {
> /* 025 */ InternalRow i = (InternalRow) _i;
> /* 026 */
> /* 027 */ final SparkUnitTest$SampleData value1 = false ? null : new 
> SparkUnitTest$SampleData();
> /* 028 */ this.javaBean = value1;
> /* 029 */ if (!false) {
> /* 030 */
> /* 031 */
> /* 032 */   boolean isNull3 = i.isNullAt(0);
> /* 033 */   long value3 = isNull3 ? -1L : (i.getLong(0));
> /* 034 */
> /* 035 */   if (isNull3) {
> /* 036 */ throw new NullPointerException(((java.lang.String) 
> references[0]));
> /* 037 */   }
> /* 038 */   javaBean.setId(value3);
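
For readers puzzling over the error above: Spark infers JSON integer fields as long, and the generated setter call fails because Java never implicitly narrows {{long}} to {{int}}. A minimal plain-Java sketch of that root cause (the class and variable names here are illustrative, not Spark's actual codegen):

```java
public class NarrowingDemo {
    // A bean shaped like SampleData: the setter takes int.
    static class Bean {
        private int id;
        public void setId(int id) { this.id = id; }
        public int getId() { return id; }
    }

    public static void main(String[] args) {
        Bean bean = new Bean();
        long value = 1L;          // JSON numbers are inferred as long by Spark
        // bean.setId(value);     // would not compile: possible lossy conversion from long to int
        bean.setId((int) value);  // an explicit cast is required to narrow
        System.out.println(bean.getId());
    }
}
```

Spark's generated code does not insert such a cast on your behalf, which is why the bean property type has to match the schema type exactly.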



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-22019) JavaBean int type property

2017-09-15 Thread Jen-Ming Chung (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-22019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167725#comment-16167725 ]

Jen-Ming Chung edited comment on SPARK-22019 at 9/15/17 11:18 AM:
--

Hi [~client.test],

The schema inferred by {{sqc.read().json(stringdataset)}} is as follows:
{code}
root
 |-- id: long (nullable = true)
 |-- str: string (nullable = true)
{code}

However, in the POJO class {{SampleData.class}} the member {{id}} is declared as 
{{int}} instead of {{long}}, which causes the subsequent exception in your 
test case. If you change the type of {{id}} in {{SampleData.class}} to {{long}} 
and run the test case again, you can expect the following results:
{code}
++
| str|
++
|everyone|
|everyone|
|everyone|
|   Hello|
|   Hello|
|   Hello|
++

root
 |-- str: string (nullable = true)
{code}


As you can see, {{id}} is missing from the schema; we need to add {{id}} and the 
corresponding getter and setter to {{SampleDataFlat}}:
{code}
class SampleDataFlat {
...
long id;
public long getId() {
return id;
}

public void setId(long id) {
this.id = id;
}

public SampleDataFlat(String str, long id) {
this.str = str;
this.id = id;
}
...
}
{code}

Then you will get the following results:
{code}
+---++
| id| str|
+---++
|  1|everyone|
|  2|everyone|
|  3|everyone|
|  1|   Hello|
|  2|   Hello|
|  3|   Hello|
+---++

root
 |-- id: long (nullable = true)
 |-- str: string (nullable = true)
{code}
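
The corrected classes behave this way even outside Spark; a minimal plain-Java sketch of the flatMap logic above (the {{main}} driver is illustrative, not from the thread):

```java
import java.util.ArrayList;
import java.util.Iterator;

public class FlatMapDemo {
    static class SampleData {
        private String str;
        private long id;
        public String getStr() { return str; }
        public void setStr(String str) { this.str = str; }
        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
    }

    static class SampleDataFlat {
        String str;
        long id;
        SampleDataFlat(String str, long id) { this.str = str; this.id = id; }
    }

    // Mirrors the flatMap in the test case: each input row yields three rows
    // with ids id, id+1, id+2.
    static Iterator<SampleDataFlat> flatMap(SampleData data) {
        ArrayList<SampleDataFlat> out = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            out.add(new SampleDataFlat(data.getStr(), data.getId() + i));
        }
        return out.iterator();
    }

    public static void main(String[] args) {
        SampleData d = new SampleData();
        d.setStr("everyone");
        d.setId(1L);
        for (Iterator<SampleDataFlat> it = flatMap(d); it.hasNext(); ) {
            SampleDataFlat f = it.next();
            System.out.println(f.id + " " + f.str);
        }
    }
}
```

With {{id}} carried through as {{long}} end to end, the bean encoder has nothing to narrow, which matches the results shown above.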


