[GitHub] spark pull request #20134: [SPARK-22613][SQL] Make UNCACHE TABLE behaviour c...

2018-10-18 Thread vinodkc
Github user vinodkc closed the pull request at:

https://github.com/apache/spark/pull/20134


---




[GitHub] spark pull request #20947: [SPARK-23705][SQL]Handle non-distinct columns in ...

2018-10-18 Thread vinodkc
Github user vinodkc closed the pull request at:

https://github.com/apache/spark/pull/20947


---




[GitHub] spark pull request #22171: [SPARK-25177][SQL] When dataframe decimal type co...

2018-10-17 Thread vinodkc
Github user vinodkc closed the pull request at:

https://github.com/apache/spark/pull/22171


---




[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...

2018-09-06 Thread vinodkc
Github user vinodkc commented on the issue:

https://github.com/apache/spark/pull/22171
  
@viirya , the current issue occurs only in the case of 0 values; nonzero values with higher scale are still saved in non-scientific notation.


---




[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...

2018-09-06 Thread vinodkc
Github user vinodkc commented on the issue:

https://github.com/apache/spark/pull/22171
  
@gatorsmile @HyukjinKwon @viirya , I rechecked the customer scenario. 
The dataframe is saved as a CSV file, and then Netezza loads the CSV data 
into a Netezza table. In the CSV output, 0 values with scale higher than 6 are 
stored in scientific notation, and due to this 
[limitation](http://www-01.ibm.com/support/docview.wss?crawler=1=swg21570795)
 of Netezza, it fails to load the data. If the 0 values in the CSV are in 
non-scientific notation, Netezza loads the data. The sketch below reproduces this.
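
A minimal reproduction sketch, assuming a hypothetical output path; it shows a high-scale decimal column whose zero row is written to CSV in scientific notation before this fix:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.DecimalType

val spark = SparkSession.builder().appName("decimal-csv-repro").getOrCreate()
import spark.implicits._

// decimal(18, 8): scale > 6, so the zero row is rendered as "0E-8"
// (scientific notation), which Netezza's loader then rejects.
val df = Seq("0", "1.23456789").toDF("raw")
  .select($"raw".cast(DecimalType(18, 8)).as("d"))

df.write.option("header", "true").csv("/tmp/decimal_out") // hypothetical path
```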


---




[GitHub] spark issue #22307: [SPARK-25301][SQL] When a view uses a UDF from a non de...

2018-09-06 Thread vinodkc
Github user vinodkc commented on the issue:

https://github.com/apache/spark/pull/22307
  
@HyukjinKwon , I'll close this PR


---




[GitHub] spark pull request #22307: [SPARK-25301][SQL] When a view uses a UDF from a...

2018-09-06 Thread vinodkc
Github user vinodkc closed the pull request at:

https://github.com/apache/spark/pull/22307


---




[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...

2018-09-04 Thread vinodkc
Github user vinodkc commented on the issue:

https://github.com/apache/spark/pull/22171
  
retest this please


---




[GitHub] spark issue #22307: [SPARK-25301][SQL] When a view uses a UDF from a non de...

2018-09-02 Thread vinodkc
Github user vinodkc commented on the issue:

https://github.com/apache/spark/pull/22307
  
@HyukjinKwon , even with this 
```create function d100.udf100 as 
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper';``` we can reproduce 
this issue.
I've updated the PR description.


---




[GitHub] spark pull request #22307: [SPARK-25301][SQL] When a view uses a UDF from a...

2018-08-31 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/22307

[SPARK-25301][SQL] When a view uses a UDF from a non-default database, 
Spark analyser throws AnalysisException

## What changes were proposed in this pull request?
When a Hive view uses a UDF from a non-default database, the Spark analyser 
throws an AnalysisException

Steps to simulate this issue
-
In Hive

1) CREATE DATABASE d100;
2) ADD JAR /usr/udf/masking.jar // masking.jar has a custom udf class 
'com.uzx.udf.Masking'
3) create function d100.udf100 as "com.uzx.udf.Masking"; // Note: udf100 is 
created in d100
4) create view d100.v100 as select d100.udf100(name) from default.emp; 
// Note: table default.emp has two columns 'name' and 'address'
5) select * from d100.v100; // query on view d100.v100 gives the correct result

In Spark
-
1) spark.sql("select * from d100.v100").show
throws 
```
org.apache.spark.sql.AnalysisException: Undefined function: 'd100.udf100'. 
This function is neither a registered temporary function nor a permanent 
function registered in the database 'default'
```

This is because, while parsing the SQL text of the view 
``` 'select `d100.udf100`(`emp`.`name`) from `default`.`emp`' ```, the Spark 
parser fails to split the database name from the UDF name, and hence the Spark 
function registry tries to load the UDF 'd100.udf100' from the 'default' database.

To solve this issue, before creating the 'FunctionIdentifier', first resolve the 
actual database name, and then create the FunctionIdentifier from that database 
name and the function name, roughly as sketched below.
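
A simplified sketch of the idea; the helper name and split logic below are illustrative, not the PR's actual code:

```scala
import org.apache.spark.sql.catalyst.FunctionIdentifier

// Split a possibly database-qualified function name such as "d100.udf100"
// before building the FunctionIdentifier, instead of treating the whole
// string as a function name in the current ('default') database.
def toFunctionIdentifier(name: String, currentDb: String): FunctionIdentifier =
  name.split('.') match {
    case Array(db, fn) => FunctionIdentifier(fn, Some(db))        // qualified
    case Array(fn)     => FunctionIdentifier(fn, Some(currentDb)) // unqualified
  }
```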

## How was this patch tested?
Added 1 unit test 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark br_fix_view_with_udf_issue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22307.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22307


commit 60cc1c9c66dade490dc0501622f8ac6b554b7ff4
Author: Vinod KC 
Date:   2018-08-31T16:57:00Z

fix issue with non default udf in hive view




---




[GitHub] spark pull request #22171: [SPARK-25177][SQL] When dataframe decimal type co...

2018-08-23 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/22171#discussion_r212251852
  
--- Diff: sql/core/src/test/resources/sql-tests/results/literals.sql.out ---
@@ -197,7 +197,7 @@ select .e3
 -- !query 20
 select 1E309, -1E309
 -- !query 20 schema
-struct<1E+309:decimal(1,-309),-1E+309:decimal(1,-309)>
+struct<10:decimal(1,-309),-10:decimal(1,-309)>
--- End diff --

@viirya This schema is auto-generated. 
The actual issue is only with the 0 value when the scale is higher than 6. If we 
need to reduce the scope of impact, can we add this condition?
```
override def toString: String = if (decimalVal == 0 && _scale > 6) {
  toBigDecimal.bigDecimal.toPlainString()
} else {
  toBigDecimal.toString()
}
```


---




[GitHub] spark pull request #22171: [SPARK-25177][SQL] When dataframe decimal type co...

2018-08-23 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/22171#discussion_r212243480
  
--- Diff: sql/core/src/test/resources/sql-tests/results/literals.sql.out ---
@@ -197,7 +197,7 @@ select .e3
 -- !query 20
 select 1E309, -1E309
 -- !query 20 schema
-struct<1E+309:decimal(1,-309),-1E+309:decimal(1,-309)>
+struct<10:decimal(1,-309),-10:decimal(1,-309)>
--- End diff --

Result in PostgreSQL:
```
CREATE TABLE TestdecBig (a DECIMAL(10,7), b DECIMAL(10,6), c DECIMAL(10,8),
d DECIMAL(310,309));
INSERT INTO TestdecBig VALUES (1,1,1,1);
INSERT INTO TestdecBig VALUES (0,0,0,0);

select * from TestdecBig;
     a     |    b     |     c      |      d
-----------+----------+------------+--------------
 1.0000000 | 1.000000 | 1.00000000 | 1.000...000
 0.0000000 | 0.000000 | 0.00000000 | 0.000...000
(2 rows)
```
Note: the d column (DECIMAL(310,309)) is abbreviated above; PostgreSQL prints 
all 309 decimal places in plain, non-scientific notation.


---




[GitHub] spark pull request #22171: [SPARK-25177][SQL] When dataframe decimal type co...

2018-08-23 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/22171#discussion_r212243149
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/higher-order-functions.sql.out ---
@@ -201,6 +201,7 @@ struct<>
 -- !query 20 output
 
 
+
--- End diff --

The golden file generator automatically added this new line.


---




[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...

2018-08-22 Thread vinodkc
Github user vinodkc commented on the issue:

https://github.com/apache/spark/pull/22171
  
@viirya, This issue is not only related to Dataset.show but also to dataset 
write operations. External databases like Netezza fail to load the result due 
to scientific notation on "0" values.


---




[GitHub] spark pull request #22171: [SPARK-25177][SQL] When dataframe decimal type co...

2018-08-21 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/22171

[SPARK-25177][SQL] When a dataframe decimal type column has scale higher 
than 6, 0 values are shown in scientific notation

## What changes were proposed in this pull request?
If the scale of a decimal type is > 6, a 0 value will be shown in scientific 
notation; hence, when the dataframe output is saved to an external database, the 
load fails due to the scientific notation on "0" values.
In java.math.BigDecimal, if the scale is > 6, 0 is shown in scientific 
notation, as the sketch below demonstrates.
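
A minimal demonstration of the underlying java.math.BigDecimal behaviour (plain Scala, no Spark needed):

```scala
import java.math.BigDecimal

// Zero with scale > 6 has an adjusted exponent below -6, so toString
// switches to scientific notation; toPlainString stays plain.
val zero = BigDecimal.ZERO.setScale(8)
println(zero.toString)      // 0E-8        (scientific)
println(zero.toPlainString) // 0.00000000  (plain)

// A nonzero value with the same scale keeps plain notation.
println(new BigDecimal("1.23456789").toString) // 1.23456789
```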

In PostgreSQL, a 0 decimal value is shown in non-scientific notation 
(a plain string); this PR makes the Spark SQL result consistent with PostgreSQL.
## How was this patch tested?
Added 2 unit tests 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark br_fix_precision_zero

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22171.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22171


commit 1ebeae518f44439af7ceff2ce5fb80caf44f1d45
Author: Vinod KC 
Date:   2018-08-21T15:10:47Z

Fix precision issue with zero when decimal type scale > 6




---




[GitHub] spark issue #22130: [SPARK-25137][Spark Shell] NumberFormatException` when s...

2018-08-17 Thread vinodkc
Github user vinodkc commented on the issue:

https://github.com/apache/spark/pull/22130
  
@dongjoon-hyun , Thanks for taking a look at this PR. I've added the Mac OS 
version in the PR description.
IMO, an update of ncurses is causing this issue.
Reference: https://github.com/jline/jline2/issues/281


---




[GitHub] spark pull request #22130: [SPARK-25137][Spark Shell] NumberFormatException`...

2018-08-16 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/22130

[SPARK-25137][Spark Shell] NumberFormatException` when starting spark-shell 
from Mac terminal

## What changes were proposed in this pull request?

When starting spark-shell from the Mac terminal, the following exception is thrown:
[ERROR] Failed to construct terminal; falling back to unsupported
java.lang.NumberFormatException: For input string: "0x100"
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.valueOf(Integer.java:766)
at jline.internal.InfoCmp.parseInfoCmp(InfoCmp.java:59)
at jline.UnixTerminal.parseInfoCmp(UnixTerminal.java:242)
at jline.UnixTerminal.<init>(UnixTerminal.java:65)
at jline.UnixTerminal.<init>(UnixTerminal.java:50)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at jline.TerminalFactory.getFlavor(TerminalFactory.java:211)

This issue is due to a jline defect: 
https://github.com/jline/jline2/issues/281, which is fixed in JLine 2.14.4. 
Bumping the JLine version in Spark to 2.14.4 or later fixes the issue; the 
parse failure is sketched below.
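
An illustrative sketch of the failing parse only, not JLine's actual patched code; newer ncurses emits hex capability values like "0x100" in terminfo, which Integer.valueOf rejects but Integer.decode understands:

```scala
import scala.util.Try

println(Try(Integer.valueOf("0x100"))) // Failure(java.lang.NumberFormatException: ...)
println(Try(Integer.decode("0x100")))  // Success(256)
```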

## How was this patch tested?
No new UT/automation test added; after upgrading to the latest JLine version 
2.14.6, manually tested spark-shell features.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark br_UpgradeJLineVersion

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22130.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22130


commit d00929f28b2523869252d67fefc04297aadc5af6
Author: Vinod KC 
Date:   2018-08-17T04:10:18Z

Upgrade JLine to 2.14.6




---




[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-04-12 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/20611#discussion_r180993462
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -370,26 +339,35 @@ case class LoadDataCommand(
 throw new AnalysisException(
   s"LOAD DATA: URI scheme is required for non-local input 
paths: '$path'")
   }
-
   // Follow Hive's behavior:
   // If LOCAL is not specified, and the path is relative,
   // then the path is interpreted relative to "/user/"
   val uriPath = uri.getPath()
   val absolutePath = if (uriPath != null && 
uriPath.startsWith("/")) {
 uriPath
   } else {
-s"/user/${System.getProperty("user.name")}/$uriPath"
+s"/user/${ System.getProperty("user.name") }/$uriPath"
   }
   new URI(scheme, authority, absolutePath, uri.getQuery(), 
uri.getFragment())
 }
-val hadoopConf = sparkSession.sessionState.newHadoopConf()
-val srcPath = new Path(hdfsUri)
-val fs = srcPath.getFileSystem(hadoopConf)
-if (!fs.exists(srcPath)) {
-  throw new AnalysisException(s"LOAD DATA input path does not 
exist: $path")
-}
-hdfsUri
   }
+}
+val srcPath = new Path(loadPath)
+val fs = 
srcPath.getFileSystem(sparkSession.sessionState.newHadoopConf())
+// This handling is because while reoslving the invalid urls starting 
with file:///
+// system throws IllegalArgumentException from globStatus api,so 
inorder to handle
+// such scenarios this code is added in try catch block and after 
catching the
+// run time exception a generic error will be displayed to the user.
+try {
+  if (null == fs.globStatus(srcPath) || 
fs.globStatus(srcPath).isEmpty) {
+throw new AnalysisException(s"LOAD DATA input path does not exist: 
$path")
+  }
+}
+catch {
+  case e: Exception =>
--- End diff --

Avoid catching a generic exception; catch IllegalArgumentException instead, e.g. as sketched below.
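
A sketch of the suggested narrowing (illustrative shape only, not the PR's final code):

```scala
// globStatus throws IllegalArgumentException for invalid URIs (e.g. malformed
// file:/// paths); catch only that instead of a blanket Exception, so the
// AnalysisException below still propagates unchanged.
try {
  val matches = fs.globStatus(srcPath)
  if (matches == null || matches.isEmpty) {
    throw new AnalysisException(s"LOAD DATA input path does not exist: $path")
  }
} catch {
  case e: IllegalArgumentException =>
    throw new AnalysisException(s"LOAD DATA input path is invalid: $path")
}
```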


---




[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-04-12 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/20611#discussion_r180993068
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -370,26 +339,35 @@ case class LoadDataCommand(
 throw new AnalysisException(
   s"LOAD DATA: URI scheme is required for non-local input 
paths: '$path'")
   }
-
   // Follow Hive's behavior:
   // If LOCAL is not specified, and the path is relative,
   // then the path is interpreted relative to "/user/"
   val uriPath = uri.getPath()
   val absolutePath = if (uriPath != null && 
uriPath.startsWith("/")) {
 uriPath
   } else {
-s"/user/${System.getProperty("user.name")}/$uriPath"
+s"/user/${ System.getProperty("user.name") }/$uriPath"
--- End diff --

nit: Please remove space


---




[GitHub] spark pull request #20947: [SPARK-23705][SQL]Handle non-distinct columns in ...

2018-03-30 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/20947

[SPARK-23705][SQL]Handle non-distinct columns in DataSet.groupBy

## What changes were proposed in this pull request?

If the input columns to Dataset.groupBy contain non-unique columns, remove 
the duplicate columns (illustrated below).
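
A small illustration of the duplicate-grouping-column case this PR targets (assumed example data and an active SparkSession named `spark`):

```scala
import spark.implicits._

val df = Seq((1, "a"), (1, "b"), (2, "c")).toDF("id", "v")

// The same column passed twice; with this change the duplicate grouping
// column is dropped, so this behaves like df.groupBy($"id").count().
df.groupBy($"id", $"id").count().show()
```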

## How was this patch tested?
Added unit test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark br_FIX_SPARK-23705

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20947.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20947


commit eb93f0590f47227a16055f7eea6bd1e906dec3c9
Author: vinodkc <vinod.kc.in@...>
Date:   2018-03-30T15:53:49Z

Handle non-distinct columns in groupBy




---




[GitHub] spark pull request #20917: [SPARK-23705][SQL]Handle non-distinct columns in ...

2018-03-30 Thread vinodkc
Github user vinodkc closed the pull request at:

https://github.com/apache/spark/pull/20917


---




[GitHub] spark pull request #20917: [SPARK-23705][SQL]Handle non-distinct columns in ...

2018-03-28 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/20917

[SPARK-23705][SQL]Handle non-distinct columns in DataSet.groupBy

## What changes were proposed in this pull request?
If the input columns to Dataset.groupBy contain non-unique columns, remove 
the duplicate columns.

## How was this patch tested?
Added unit test



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark br_FIX_SPARK-23705

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20917.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20917






---




[GitHub] spark pull request #20134: [SPARK-22613][SQL] Make UNCACHE TABLE behaviour c...

2018-01-02 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/20134

[SPARK-22613][SQL] Make UNCACHE TABLE behaviour consistent with CACHE TABLE

## What changes were proposed in this pull request?
Added a LAZY option for UNCACHE TABLE, e.g. UNCACHE LAZY TABLE tableName.
This uncaches the table lazily instead of blocking until all blocks are 
deleted; usage is sketched below.
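
A usage sketch of the proposed syntax; the LAZY keyword on UNCACHE is what this PR adds, while CACHE LAZY TABLE already exists:

```scala
spark.sql("CACHE LAZY TABLE t1")   // existing: materialize the cache lazily
spark.sql("UNCACHE LAZY TABLE t1") // proposed: release cached blocks lazily, without blocking
```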
## How was this patch tested?
Added test cases

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark br_fix_SPARK-22613

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20134.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20134


commit ecc9352b2888bc25fa28273abf72b86bb8688350
Author: vinodkc <vinod.kc.in@...>
Date:   2018-01-02T12:49:24Z

Support UNCACHE LAZY TABLE




---




[GitHub] spark issue #19809: [SPARK-17920][SQL] [FOLLOWUP] Backport PR 19779 to branc...

2017-11-24 Thread vinodkc
Github user vinodkc commented on the issue:

https://github.com/apache/spark/pull/19809
  
Thank you.


---




[GitHub] spark pull request #19809: [SPARK-17920][SQL] [FOLLOWUP] Backport PR 19779 t...

2017-11-24 Thread vinodkc
Github user vinodkc closed the pull request at:

https://github.com/apache/spark/pull/19809


---




[GitHub] spark pull request #19809: [SPARK-17920][SQL] [FOLLOWUP] Backport PR 19779 t...

2017-11-24 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19809#discussion_r152922206
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -700,12 +700,7 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
 
 test(s"$version: SPARK-17920: Insert into/overwrite avro table") {
--- End diff --

In master, this test case uses an avro file and a decimal schema from 
[https://github.com/apache/spark/pull/19003], but PR 19003 is not backported 
to branch-2.2. So I had to change the test case in branch-2.2 to avoid using 
the schema and data added by PR 19003.
If we are fine with backporting PR 19003 to branch-2.2, the same test case from 
master can be used in branch-2.2 too. Please give your suggestions.



---




[GitHub] spark issue #19809: [SPARK-17920][SQL] [FOLLOWUP] Backport PR 19779 to branc...

2017-11-23 Thread vinodkc
Github user vinodkc commented on the issue:

https://github.com/apache/spark/pull/19809
  
ping @cloud-fan 


---




[GitHub] spark pull request #19809: [SPARK-17920][SQL] [FOLLOWUP] Backport PR 19779 t...

2017-11-23 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/19809

[SPARK-17920][SQL] [FOLLOWUP] Backport PR 19779 to branch-2.2

## What changes were proposed in this pull request?

A followup of https://github.com/apache/spark/pull/19795, to simplify the file creation.

## How was this patch tested?

Only a test case is updated


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark 
br_FollowupSPARK-17920_branch-2.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19809.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19809


commit 9cd03d38500f04d8d1ebf8771e79b1ba82d1f79b
Author: vinodkc <vinod.kc...@gmail.com>
Date:   2017-11-24T05:59:30Z

simplify the schema file creation in test




---




[GitHub] spark issue #19795: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Backport PR...

2017-11-22 Thread vinodkc
Github user vinodkc commented on the issue:

https://github.com/apache/spark/pull/19795
  
Thank you


---




[GitHub] spark pull request #19795: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Back...

2017-11-22 Thread vinodkc
Github user vinodkc closed the pull request at:

https://github.com/apache/spark/pull/19795


---




[GitHub] spark pull request #19795: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Back...

2017-11-22 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/19795

[SPARK-17920][SPARK-19580][SPARK-19878][SQL] Backport PR 19779 to 
branch-2.2 - Support writing to Hive table which uses Avro schema url 
'avro.schema.url'

## What changes were proposed in this pull request?

> Backport https://github.com/apache/spark/pull/19779 to branch-2.2

SPARK-19580 Support for avro.schema.url while writing to hive table
SPARK-19878 Add hive configuration when initialize hive serde in 
InsertIntoHiveTable.scala
SPARK-17920 HiveWriterContainer passes null configuration to 
serde.initialize, causing NullPointerException in AvroSerde when using 
avro.schema.url

Support writing to Hive table which uses Avro schema url 'avro.schema.url'
For ex:
create external table avro_in (a string) stored as avro location 
'/avro-in/' tblproperties ('avro.schema.url'='/avro-schema/avro.avsc');

create external table avro_out (a string) stored as avro location 
'/avro-out/' tblproperties ('avro.schema.url'='/avro-schema/avro.avsc');

insert overwrite table avro_out select * from avro_in; // fails with 
java.lang.NullPointerException

WARN AvroSerDe: Encountered exception determining schema. Returning signal 
schema to indicate problem
java.lang.NullPointerException
at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:182)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:174)
## Changes proposed in this fix
Currently a 'null' value is passed to the serializer, which causes an NPE 
during the insert operation; instead, pass the Hadoop configuration object.
## How was this patch tested?
Added new test case in VersionsSuite

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark br_Fix_SPARK-17920_branch-2.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19795.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19795


commit 63e40e866e8ad3307b91ea430c29938a0050e6f7
Author: vinodkc <vinod.kc...@gmail.com>
Date:   2017-11-22T12:47:47Z

pass hadoop Configuration to serializer




---




[GitHub] spark issue #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support wri...

2017-11-21 Thread vinodkc
Github user vinodkc commented on the issue:

https://github.com/apache/spark/pull/19779
  
@gatorsmile , @cloud-fan and @dongjoon-hyun 
Thanks for the review comments and guidance
Sure, I'll submit a separate PR for backporting it to 2.2


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152474528
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite avro table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaUrl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+val schemaFile = new File(schemaPath, "avroDecimal.avsc")
+val writer = new PrintWriter(schemaFile)
+writer.write(avroSchema)
+writer.close()
+
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE EXTERNAL TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaUrl')
+   """.stripMargin
+  )
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaUrl')
+   """.stripMargin
+  )
+  versionSpark.sql(
+s"""INSERT OVERWRITE TABLE $destTableName SELECT * FROM 
$srcTableName""".stripMargin)
+  val result = versionSpark.table(srcTableName).collect()
+  assert(versionSpark.table(destTableName).collect() === result)
+  versionSpark.sql(
+s"""INSERT INTO TABLE $destTableName SELECT * FROM 
$srcTableName""".stripMargin)
--- End diff --

Updated


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152473900
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite avro table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaUrl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+val schemaFile = new File(schemaPath, "avroDecimal.avsc")
+val writer = new PrintWriter(schemaFile)
+writer.write(avroSchema)
+writer.close()
+
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE EXTERNAL TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaUrl')
+   """.stripMargin
+  )
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaUrl')
+   """.stripMargin
+  )
+  versionSpark.sql(
+s"""INSERT OVERWRITE TABLE $destTableName SELECT * FROM 
$srcTableName""".stripMargin)
--- End diff --

Sure, I'll update it


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152473845
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite avro table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaUrl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+val schemaFile = new File(schemaPath, "avroDecimal.avsc")
+val writer = new PrintWriter(schemaFile)
+writer.write(avroSchema)
+writer.close()
+
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE EXTERNAL TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaUrl')
+   """.stripMargin
+  )
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaUrl')
+   """.stripMargin
+  )
+  versionSpark.sql(
+s"""INSERT OVERWRITE TABLE $destTableName SELECT * FROM 
$srcTableName""".stripMargin)
--- End diff --

@gatorsmile , I tried to remove 'stripMargin', but got 
org.apache.spark.sql.catalyst.parser.ParseException: extraneous input '|' 
expecting {'(', 'SELECT', 'FROM', 'ADD',..} (see the note below)
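
A short note on why, as a sketch; the '|' margins are Scala triple-quoted-string artifacts, not SQL:

```scala
// Triple-quoted strings keep leading whitespace and the '|' margin markers;
// stripMargin removes everything up to and including each leading '|'.
val withMargin =
  """
    |SELECT 1
  """
val clean = withMargin.stripMargin
// Passing `withMargin` (without stripMargin) to spark.sql leaves literal '|'
// characters in the statement, producing the "extraneous input '|'"
// ParseException quoted above.
```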


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152464029
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite external avro 
table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaurl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+new java.io.PrintWriter(schemaurl) { write(avroSchema); close() }
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  val destLocation = s"""$path${File.separator}destTableLocation"""
+  new File(destLocation).mkdir()
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$destLocation'
--- End diff --

Thanks, I've updated the test case to test only managed tables and avoided 
creating a temp directory.


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152446382
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite external avro 
table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaurl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+new java.io.PrintWriter(schemaurl) { write(avroSchema); close() }
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  val destLocation = s"""$path${File.separator}destTableLocation"""
+  new File(destLocation).mkdir()
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$destLocation'
--- End diff --

@cloud-fan , This bug affects both external and managed tables.
I've added a new test case for managed tables too. However, to avoid code 
duplication, should I include both tests in the same test method? Please 
suggest.


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152349156
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,75 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: SPARK-17920: Insert into/overwrite external avro 
table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaurl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+new java.io.PrintWriter(schemaurl) { write(avroSchema); close() }
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  val destLocation = s"""$path${File.separator}destTableLocation"""
+  new File(destLocation).mkdir()
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$destLocation'
--- End diff --

Will change to 'CREATE EXTERNAL TABLE'


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-21 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152286208
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -800,7 +800,7 @@ class VersionsSuite extends SparkFunSuite with Logging {
   }
 }
 
-test(s"$version: read avro file containing decimal") {
+test(s"$version: SPARK-17920: read avro file containing decimal") {
--- End diff --

@cloud-fan , Yes


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-20 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r152067290
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -841,6 +841,76 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
   }
 }
 
+test(s"$version: Insert into/overwrite external avro table") {
+  withTempDir { dir =>
+val path = dir.getAbsolutePath
+val schemaPath = s"""$path${File.separator}avroschemadir"""
+
+new File(schemaPath).mkdir()
+val avroSchema =
+  """{
+|  "name": "test_record",
+|  "type": "record",
+|  "fields": [ {
+|"name": "f0",
+|"type": [
+|  "null",
+|  {
+|"precision": 38,
+|"scale": 2,
+|"type": "bytes",
+|"logicalType": "decimal"
+|  }
+|]
+|  } ]
+|}
+  """.stripMargin
+val schemaurl = s"""$schemaPath${File.separator}avroDecimal.avsc"""
+new java.io.PrintWriter(schemaurl) { write(avroSchema); close() }
+val url = 
Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
+val srcLocation = new File(url.getFile)
+val destTableName = "tab1"
+val srcTableName = "tab2"
+
+withTable(srcTableName, destTableName) {
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $srcTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$srcLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  val destLocation = s"""$path${File.separator}destTableLocation"""
+  new File(destLocation).mkdir()
+
+  versionSpark.sql(
+s"""
+   |CREATE TABLE $destTableName
+   |ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+   |WITH SERDEPROPERTIES ('respectSparkSchema' = 'true')
+   |STORED AS
+   |  INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+   |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+   |LOCATION '$destLocation'
+   |TBLPROPERTIES ('avro.schema.url' = '$schemaurl')
+   """.stripMargin
+  )
+  versionSpark.sql(
+s"""insert overwrite table $destTableName select * from 
$srcTableName""".stripMargin)
--- End diff --

Thank you for your review comments; I'll fix them all.


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-19 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19779#discussion_r151902033
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala
 ---
@@ -89,6 +90,8 @@ class HiveFileFormat(fileSinkConf: FileSinkDesc)
 val fileSinkConfSer = fileSinkConf
 new OutputWriterFactory {
   private val jobConf = new SerializableJobConf(new JobConf(conf))
+  private val broadcastHadoopConf = 
sparkSession.sparkContext.broadcast(
--- End diff --

Thanks for the comment; I'll change the code to use jobConf.


---




[GitHub] spark pull request #19779: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Supp...

2017-11-18 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/19779

[SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support writing to Hive table 
which uses Avro schema url 'avro.schema.url'

## What changes were proposed in this pull request?
Support writing to Hive table which uses Avro schema url 'avro.schema.url'
For ex: 
create external table avro_in (a string) stored as avro location 
'/avro-in/' tblproperties ('avro.schema.url'='/avro-schema/avro.avsc');

create external table avro_out (a string) stored as avro location 
'/avro-out/' tblproperties ('avro.schema.url'='/avro-schema/avro.avsc');

 insert overwrite table avro_out select * from avro_in;  // fails with 
java.lang.NullPointerException

 WARN AvroSerDe: Encountered exception determining schema. Returning signal 
schema to indicate problem
java.lang.NullPointerException
at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:182)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:174)

## Changes proposed in this fix
Currently a 'null' value is passed to the serializer, which causes an NPE 
during the insert operation; instead, pass the Hadoop configuration object, 
as sketched below.
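
A simplified sketch of the change; the surrounding names come from the Hive writer path (HiveFileFormat) and this is not the exact diff:

```scala
// Context (simplified): private val jobConf = new SerializableJobConf(new JobConf(conf))
//
// Before: the serde was initialized with a null Configuration, so AvroSerDe
// could not resolve 'avro.schema.url' and threw NullPointerException:
//   serializer.initialize(null, tableDesc.getProperties)
//
// After: pass the live Hadoop configuration instead:
serializer.initialize(jobConf.value, tableDesc.getProperties)
```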
## How was this patch tested?
Added new test case in VersionsSuite

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark br_Fix_SPARK-17920

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19779.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19779


commit 034b2466d073c008b71eae072ee98353df56cbf2
Author: vinodkc <vinod.kc...@gmail.com>
Date:   2017-11-18T07:52:59Z

pass hadoopConfiguration to Serializer




---




[GitHub] spark issue #19008: [SPARK-21756][SQL]Add JSON option to allow unquoted cont...

2017-08-23 Thread vinodkc
Github user vinodkc commented on the issue:

https://github.com/apache/spark/pull/19008
  
@rxin , 
Sure, I'll update it


---



[GitHub] spark pull request #19008: [SPARK-21756][SQL]Add JSON option to allow unquot...

2017-08-21 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19008#discussion_r134227248
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonParsingOptionsSuite.scala
 ---
@@ -72,6 +72,21 @@ class JsonParsingOptionsSuite extends QueryTest with 
SharedSQLContext {
 assert(df.first().getString(0) == "Reynold Xin")
   }
 
+  test("allowUnquotedControlChars off") {
+val str = """{"name" : " + "a\tb"}"""
--- End diff --

Updated the test case. Thanks for the review comment.


---



[GitHub] spark pull request #19008: [SPARK-21756][SQL]Add JSON option to allow unquot...

2017-08-21 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/19008#discussion_r134227247
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonParsingOptionsSuite.scala
 ---
@@ -72,6 +72,21 @@ class JsonParsingOptionsSuite extends QueryTest with 
SharedSQLContext {
 assert(df.first().getString(0) == "Reynold Xin")
   }
 
+  test("allowUnquotedControlChars off") {
+val str = """{"name" : " + "a\tb"}"""
--- End diff --

Updated the test case. Thanks for the review comment.


---



[GitHub] spark pull request #19008: [SPARK-21756][SQL]Add JSON option to allow unquot...

2017-08-20 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/19008

[SPARK-21756][SQL]Add JSON option to allow unquoted control characters

## What changes were proposed in this pull request?

This patch adds an allowUnquotedControlChars option to the JSON data source to 
allow JSON strings to contain unquoted control characters (ASCII characters 
with value less than 32, including tab and line feed characters); see the 
usage sketch below.
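
A usage sketch (assumed input path; the option name is the one this PR adds):

```scala
// Read JSON whose string values embed raw control characters (e.g. tabs),
// assuming an active SparkSession named `spark`.
val df = spark.read
  .option("allowUnquotedControlChars", "true")
  .json("/path/to/data.json") // hypothetical path

df.show()
```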

## How was this patch tested?
Add new test cases



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark br_fix_SPARK-21756

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19008.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19008


commit 6f009579687e11f34b26bb2f21883377b88f5b35
Author: vinodkc <vinod.kc...@gmail.com>
Date:   2017-08-20T17:39:27Z

Add JSON option to allow unquoted control characters




---



[GitHub] spark pull request #19007: [SPARK-21783][SQL]Turn on ORC filter push-down by...

2017-08-20 Thread vinodkc
Github user vinodkc closed the pull request at:

https://github.com/apache/spark/pull/19007


---



[GitHub] spark issue #19007: [SPARK-21783][SQL]Turn on ORC filter push-down by defaul...

2017-08-20 Thread vinodkc
Github user vinodkc commented on the issue:

https://github.com/apache/spark/pull/19007
  
Ok, I'm closing my PR.
Nowadays, Spark JIRA is not showing PR status; that is why I missed your PR.


---



[GitHub] spark pull request #19007: [SPARK-21783][SQL]Turn on ORC filter push-down by...

2017-08-20 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/19007

[SPARK-21783][SQL]Turn on ORC filter push-down by default

## What changes were proposed in this pull request?

Turned on the ORC filter push-down option by default; 
spark.sql.orc.filterPushdown has been off by default from the beginning 
(see the snippet below).
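
For reference, the opt-in users previously had to set by hand (the config key is real; the snippet is illustrative):

```scala
// Before this change, ORC predicate push-down was opt-in:
spark.conf.set("spark.sql.orc.filterPushdown", "true")
// With this PR the default becomes true, so the line above is unnecessary.
```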

## How was this patch tested?
Updated existing test cases



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark br_Fix_SPARK-21783

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19007.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19007


commit 8756a541f2bdb3c37211d17452f55a930955416d
Author: vinodkc <vinod.kc...@gmail.com>
Date:   2017-08-20T15:10:13Z

Turn on ORC filter push-down by default




---



[GitHub] spark issue #18880: [SPARK-21665][Core]Need to close resources after use

2017-08-09 Thread vinodkc
Github user vinodkc commented on the issue:

https://github.com/apache/spark/pull/18880
  
retest this





[GitHub] spark pull request #18880: [SPARK-21665][Core]Need to close resources after ...

2017-08-08 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/18880

[SPARK-21665][Core]Need to close resources after use

## What changes were proposed in this pull request?
Resources in Core (SparkSubmitArguments.scala), Spark launcher (AbstractCommandBuilder.java), and resource-managers/YARN (Client.scala) are now released after use. 

## How was this patch tested?
No new test cases added; existing unit tests pass.
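
As a minimal standalone sketch of the close-after-use pattern applied here (the file path is hypothetical):

    import java.io.{BufferedReader, FileReader}

    def firstLine(path: String): String = {
      val reader = new BufferedReader(new FileReader(path))
      try {
        reader.readLine()
      } finally {
        reader.close() // released even if readLine throws
      }
    }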


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark br_fixresouceleak

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18880.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18880


commit ca6ca5b3a30eb82fdf83188746255cd12c118646
Author: vinodkc <vinod.kc...@gmail.com>
Date:   2017-08-08T03:45:19Z

Release resources







[GitHub] spark pull request #18852: [SPARK-21588][SQL] SQLContext.getConf(key, null) ...

2017-08-05 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/18852#discussion_r131519921
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -808,6 +808,12 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
   Row("1"))
   }
 
+  test("SPARK-21588 SQLContext.getConf(key, null) should return null") {
--- End diff --

Thanks for the review comment.
Moved the test to SQLConfSuite





[GitHub] spark pull request #18852: [SPARK-21588][SQL] SQLContext.getConf(key, null) ...

2017-08-05 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/18852

[SPARK-21588][SQL] SQLContext.getConf(key, null) should return null

## What changes were proposed in this pull request?

SQLContext.getConf(key, null), for a key that is not defined in the conf and doesn't have a default value defined, throws an NPE. It happens only when the conf entry has a value converter.

Added a null check on defaultValue inside SQLConf.getConfString to avoid calling entry.valueConverter(defaultValue). 

## How was this patch tested?
Added unit test
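
A simplified stand-in for the fix (not the actual SQLConf code): the value converter is invoked only when a non-null default is supplied, so getConfString(key, null) returns null instead of throwing an NPE.

    object MiniConf {
      private val settings = scala.collection.mutable.Map[String, String]()
      private def valueConverter(v: String): String = v.trim // stand-in converter

      def getConfString(key: String, defaultValue: String): String =
        settings.get(key) match {
          case Some(v) => valueConverter(v)
          case None if defaultValue != null => valueConverter(defaultValue)
          case None => defaultValue // null passes through untouched
        }
    }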

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark br_Fix_SPARK-21588

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18852.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18852


commit 53b73ed21a028ea1915df8a9a03d6d9368e53c8a
Author: vinodkc <vinod.kc...@gmail.com>
Date:   2017-08-05T06:39:04Z

SQLContext.getConf(key, null) should return null







[GitHub] spark pull request #18771: [SPARK-21478][SQL] Avoid unpersisting related Dat...

2017-07-30 Thread vinodkc
Github user vinodkc closed the pull request at:

https://github.com/apache/spark/pull/18771





[GitHub] spark pull request #18771: [SPARK-21478][SQL] Avoid unpersisting related Dat...

2017-07-30 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/18771#discussion_r130234632
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -114,13 +114,13 @@ class CacheManager extends Logging {
   }
 
   /**
-   * Un-cache all the cache entries that refer to the given plan.
+   * Un-cache the cache entry that refers to the given plan.
*/
   def uncacheQuery(spark: SparkSession, plan: LogicalPlan, blocking: 
Boolean): Unit = writeLock {
 val it = cachedData.iterator()
 while (it.hasNext) {
   val cd = it.next()
-  if (cd.plan.find(_.sameResult(plan)).isDefined) {
+  if (plan.sameResult(cd.plan)) {
--- End diff --

@gatorsmile Thanks for the update.
I'll close this PR





[GitHub] spark pull request #18771: [SPARK-21478][SQL] Avoid unpersisting related Dat...

2017-07-29 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/18771

[SPARK-21478][SQL]  Avoid unpersisting related Datasets

## What changes were proposed in this pull request?
While unpersisting a dataset, only unpersist and remove that dataset's plan from CacheManager's cachedData.
## How was this patch tested?
 Added unit tests
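
To illustrate the intended behaviour with hypothetical DataFrames (assuming a SparkSession `spark` is in scope):

    val df1 = spark.range(100).toDF("id").cache()
    val df2 = df1.filter("id > 50").cache()

    // Previously, unpersisting df2 could also drop df1's cache entry, because
    // any cached plan containing a sub-plan with the same result was removed;
    // with this change only the exact matching entry is uncached.
    df2.unpersist()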




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark br_SPARK-21478

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18771.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18771


commit e187aeb67a13493c6f5a9e540779a677d3502b04
Author: vinodkc <vinod.kc...@gmail.com>
Date:   2017-07-29T15:10:24Z

Fixed unpersisting related DFs

commit 857b3dd5804355331d3bef4eb8136e6604232758
Author: vinodkc <vinod.kc...@gmail.com>
Date:   2017-07-29T18:01:01Z

Updated test cases and condition for unpersist







[GitHub] spark pull request: [SPARK-10631][Documentation, MLlib, PySpark]Ad...

2015-09-19 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/8834

[SPARK-10631][Documentation, MLlib, PySpark]Added documentation for few APIs

There are some missing API docs in pyspark.mllib.linalg.Vector (including 
DenseVector and SparseVector). We should add them based on their Scala 
counterparts.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark fix_SPARK-10631

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8834.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8834


commit 3ee70548760c28bfc96e17a42450ac6f356de923
Author: vinodkc <vinod.kc...@gmail.com>
Date:   2015-09-19T06:00:26Z

Added documentation for few APIs







[GitHub] spark pull request: [SPARK-10516][ MLlib]Added values property in ...

2015-09-16 Thread vinodkc
Github user vinodkc commented on the pull request:

https://github.com/apache/spark/pull/8682#issuecomment-140645936
  
Sure, I'll work on SPARK-10631.
Thanks





[GitHub] spark pull request: [SPARK-10575][Spark Core]Wrapped RDD.takeSampl...

2015-09-13 Thread vinodkc
Github user vinodkc commented on the pull request:

https://github.com/apache/spark/pull/8730#issuecomment-139956635
  
Can we retest? It is failing in an unrelated module.





[GitHub] spark pull request: [SPARK-10575][Spark Core]Wrapped RDD.takeSampl...

2015-09-13 Thread vinodkc
Github user vinodkc commented on the pull request:

https://github.com/apache/spark/pull/8730#issuecomment-139957047
  
jenkins retest this please





[GitHub] spark pull request: [SPARK-10575][Spark Core]Wrapped RDD.takeSampl...

2015-09-12 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/8730

[SPARK-10575][Spark Core]Wrapped RDD.takeSample with Scope



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark fix_takesample_return

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8730.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8730


commit cf2c4665ed695a72818ca5c674f4c791013d4f2e
Author: vinodkc <vinod.kc...@gmail.com>
Date:   2015-09-12T07:56:19Z

wrapped in scope







[GitHub] spark pull request: [SPARK-10016][ML]Stored broadcast var in a pri...

2015-09-12 Thread vinodkc
Github user vinodkc closed the pull request at:

https://github.com/apache/spark/pull/8233





[GitHub] spark pull request: [SPARK-10516][ MLlib]Added values property in ...

2015-09-09 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/8682

[SPARK-10516][ MLlib]Added values property in DenseVector



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark fix_SPARK-10516

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8682.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8682


commit b8e1e6e6dbf213bb16db29d0cae632f3c4e60e92
Author: Vinod K C <vinod...@huawei.com>
Date:   2015-09-10T07:26:21Z

Added values property







[GitHub] spark pull request: [SPARK-10199][SPARK-10200][SPARK-10201][SPARK-...

2015-09-08 Thread vinodkc
Github user vinodkc closed the pull request at:

https://github.com/apache/spark/pull/8507





[GitHub] spark pull request: [SPARK-10199][SPARK-10200][SPARK-10201][SPARK-...

2015-09-08 Thread vinodkc
Github user vinodkc commented on the pull request:

https://github.com/apache/spark/pull/8507#issuecomment-138789894
  
Closing this PR 





[GitHub] spark pull request: [SPARK-10468][ MLlib ]Verify schema before Dat...

2015-09-06 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/8636

[SPARK-10468][ MLlib ]Verify schema before Dataframe select API call

Loader.checkSchema was called to verify the schema after dataframe.select(...).
Schema verification should be done before dataframe.select(...).
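
A sketch of the reordering inside the loader (simplified; `Data`, `sqlContext`, and `dataPath` come from the surrounding loader code, and Loader.checkSchema is MLlib-internal):

    val dataFrame = sqlContext.read.parquet(dataPath)
    // Fail fast on a malformed file, before any column is looked up.
    Loader.checkSchema[Data](dataFrame.schema)
    val dataArray = dataFrame.select("weight", "mu", "sigma").collect()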

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark 
fix_GaussianMixtureModel_load_verification

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8636.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8636


commit cc99ea99756b72c39041cb95c11628a78eaa1457
Author: Vinod K C <vinod...@huawei.com>
Date:   2015-09-07T07:02:23Z

Changed schema verification order







[GitHub] spark pull request: [SPARK-10414][MLlib]Fix hashcode issues in MLL...

2015-09-06 Thread vinodkc
Github user vinodkc commented on the pull request:

https://github.com/apache/spark/pull/8565#issuecomment-138167561
  
Closing this PR, as there is already an existing JIRA & PR on the same issue.





[GitHub] spark pull request: [SPARK-10414][MLlib]Fix hashcode issues in MLL...

2015-09-06 Thread vinodkc
Github user vinodkc closed the pull request at:

https://github.com/apache/spark/pull/8565





[GitHub] spark pull request: [SPARK-10199][SPARK-10200][SPARK-10201][SPARK-...

2015-09-04 Thread vinodkc
Github user vinodkc commented on the pull request:

https://github.com/apache/spark/pull/8507#issuecomment-137686626
  
@feynmanliang , I've removed case classes used for schema inference.





[GitHub] spark pull request: [SPARK-10199][SPARK-10200][SPARK-10201][SPARK-...

2015-09-04 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/8507#discussion_r38733907
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala ---
@@ -125,16 +126,19 @@ object KMeansModel extends Loader[KMeansModel] {
 
 def save(sc: SparkContext, model: KMeansModel, path: String): Unit = {
   val sqlContext = new SQLContext(sc)
-  import sqlContext.implicits._
   val metadata = compact(render(
 ("class" -> thisClassName) ~ ("version" -> thisFormatVersion) ~ 
("k" -> model.k)))
   sc.parallelize(Seq(metadata), 
1).saveAsTextFile(Loader.metadataPath(path))
-  val dataRDD = sc.parallelize(model.clusterCenters.zipWithIndex).map 
{ case (point, id) =>
-Cluster(id, point)
--- End diff --

Removed case classes except NodeData, SplitData, and PredictData; these classes simplify data extraction.





[GitHub] spark pull request: [SPARK-10414][MLlib]Fix hashcode issues in MLL...

2015-09-02 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/8565#discussion_r38507995
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -278,7 +278,8 @@ class DenseMatrix @Since("1.3.0") (
   }
 
   override def hashCode: Int = {
-com.google.common.base.Objects.hashCode(numRows : Integer, numCols: 
Integer, toArray)
+val state = Seq(numRows, numCols, Arrays.hashCode(values), 
isTransposed.hashCode)
--- End diff --

I tried with 'toBreeze.hashCode', but the hashCode of BM[Double] is not consistent.






[GitHub] spark pull request: [SPARK-10414][MLlib]Fix hashcode issues in MLL...

2015-09-02 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/8565

[SPARK-10414][MLlib]Fix hashcode issues in MLLib

Added/updated hashCode methods in 3 classes.
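
A standalone sketch of the hashCode pattern used (the class is a stand-in for illustration, not the real DenseMatrix):

    import java.util.Arrays

    class MiniMatrix(val numRows: Int, val numCols: Int,
                     val values: Array[Double], val isTransposed: Boolean) {
      override def hashCode: Int = {
        // Hash the array contents (not the array reference) so that equal
        // matrices produce equal hash codes.
        val state = Seq(numRows, numCols, Arrays.hashCode(values), isTransposed.hashCode)
        state.foldLeft(0)((a, b) => 31 * a + b.hashCode)
      }
    }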

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark fix_ML_hashcode_issue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8565.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8565


commit a4d256b37b0e4c8acdc9d9bca43588cd90cf1bcd
Author: Vinod K C <vinod...@huawei.com>
Date:   2015-09-02T07:01:32Z

Added hashcode method

commit 7878c42d568d4c08a64717e66176b0ecf7ee27ab
Author: Vinod K C <vinod...@huawei.com>
Date:   2015-09-02T07:09:31Z

Removed blank lines

commit 28737427ff16c2e4d84540361968c9edef3a79dc
Author: Vinod K C <vinod...@huawei.com>
Date:   2015-09-02T09:22:58Z

Updated testcase







[GitHub] spark pull request: [SPARK-10199][SPARK-10200][SPARK-10201][SPARK-...

2015-09-02 Thread vinodkc
Github user vinodkc commented on the pull request:

https://github.com/apache/spark/pull/8507#issuecomment-137312813
  
@feynmanliang , fixed your review comments. Could you please verify it?





[GitHub] spark pull request: [SPARK-10430][SPARK-10430]Added hashCode metho...

2015-09-02 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/8581

[SPARK-10430][SPARK-10430]Added hashCode methods in AccumulableInfo and 
RDDOperationScope



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark fix_RDDOperationScope_Hashcode

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8581.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8581


commit 25c0f7b471356b829d3611972741b4a40982cc2f
Author: Vinod K C <vinod...@huawei.com>
Date:   2015-09-03T07:03:40Z

Added hashCode methods







[GitHub] spark pull request: [SPARK-10414][MLlib]Fix hashcode issues in MLL...

2015-09-02 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/8565#discussion_r38609166
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -278,7 +278,8 @@ class DenseMatrix @Since("1.3.0") (
   }
 
   override def hashCode: Int = {
-com.google.common.base.Objects.hashCode(numRows : Integer, numCols: 
Integer, toArray)
+val state = Seq(numRows, numCols, Arrays.hashCode(values), 
isTransposed.hashCode)
--- End diff --

Got confirmation from the Breeze mailing list that it is an issue, and they suggested I raise an issue in Breeze => 
https://github.com/scalanlp/breeze/issues/440
Can you please suggest whether we should wait for the Breeze fix or change the equals method of CSCMatrix & DenseMatrix to match my hashCode implementation?







[GitHub] spark pull request: [SPARK-10414][MLlib]Fix hashcode issues in MLL...

2015-09-02 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/8565#discussion_r38522537
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -278,7 +278,8 @@ class DenseMatrix @Since("1.3.0") (
   }
 
   override def hashCode: Int = {
-com.google.common.base.Objects.hashCode(numRows : Integer, numCols: 
Integer, toArray)
+val state = Seq(numRows, numCols, Arrays.hashCode(values), 
isTransposed.hashCode)
--- End diff --

I've asked on the Breeze mailing list and am waiting for their reply.

Also, I checked Breeze's CSCMatrix '==' implementation; it seems '==' uses the input data we passed from SparseMatrix, and only hashCode is not implemented. I think this guarantees that my hashCode implementation will be consistent with Breeze's '=='.
All MLlib equals/hashCode test cases are passing now.







[GitHub] spark pull request: [SPARK-10414][MLlib]Fix hashcode issues in MLL...

2015-09-02 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/8565#discussion_r38514784
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -278,7 +278,8 @@ class DenseMatrix @Since("1.3.0") (
   }
 
   override def hashCode: Int = {
-com.google.common.base.Objects.hashCode(numRows : Integer, numCols: 
Integer, toArray)
+val state = Seq(numRows, numCols, Arrays.hashCode(values), 
isTransposed.hashCode)
--- End diff --

To verify that, I did this test:

    val bm1 = new BSM[Double](values, numRows, numCols, colPtrs, rowIndices)
    val bm2 = new BSM[Double](values, numRows, numCols, colPtrs, rowIndices)

    if (bm1 == bm2) {
      println("[==] passed")
      if (bm1.hashCode == bm2.hashCode) {
        println("[hashCode] passed")
      } else {
        println("[hashCode] failed")
      }
    } else {
      println("[==] failed")
    }

Got this output:

    [==] passed
    [hashCode] failed

On further investigation, I noticed this third-party class has implemented only the 'equals' method, not 'hashCode'.
In that case, shall I change the 'toBreeze == m.toBreeze' comparison in equals?





[GitHub] spark pull request: [SPARK-10199][MLlib]Avoid using reflections fo...

2015-09-01 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/8507#discussion_r38410341
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala
 ---
@@ -159,19 +159,25 @@ object GaussianMixtureModel extends 
Loader[GaussianMixtureModel] {
 
   // Create Parquet data.
   val dataArray = Array.tabulate(weights.length) { i =>
-Data(weights(i), gaussians(i).mu, gaussians(i).sigma)
+Row(weights(i), gaussians(i).mu, gaussians(i).sigma)
   }
-  sc.parallelize(dataArray, 
1).toDF().write.parquet(Loader.dataPath(path))
+  val dataRDD: RDD[Row] = sc.parallelize(dataArray, 1)
+
+  sqlContext.createDataFrame(dataRDD, 
schema).write.parquet(Loader.dataPath(path))
 }
+private val schema = StructType(
+  StructField("weight", DoubleType, nullable = false)::
+  StructField("mu", new VectorUDT, nullable = false)::
+  StructField("sigma", new MatrixUDT, nullable = false)::Nil)
 
 def load(sc: SparkContext, path: String): GaussianMixtureModel = {
   val dataPath = Loader.dataPath(path)
   val sqlContext = new SQLContext(sc)
   val dataFrame = sqlContext.read.parquet(dataPath)
-  val dataArray = dataFrame.select("weight", "mu", "sigma").collect()
--- End diff --

Loader.checkSchema was called to verify the schema after 
dataframe.select(...). 
Schema verification should be done before dataframe.select(...); that's why I changed the order.





[GitHub] spark pull request: [SPARK-10199][MLlib]Avoid using reflections fo...

2015-08-28 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/8507

[SPARK-10199][MLlib]Avoid using reflections for parquet model save



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark fix_SPARK-10200

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8507.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8507


commit 233ebc8571d6a5b3284783a7b3a69f054f45389d
Author: Vinod K C vinod...@huawei.com
Date:   2015-08-28T13:28:23Z

Avoid using reflections for parquet model save

commit c6b2091b4f019407d2f35ec41038ea5fc925d1eb
Author: Vinod K C vinod...@huawei.com
Date:   2015-08-28T13:45:18Z

Removed blank lines

commit 6cc24458908c1858e1a8cb6628e6bffee12513ad
Author: Vinod K C vinod...@huawei.com
Date:   2015-08-28T14:04:14Z

Removed blank lines and orderer imports

commit 0dc71324e8e4d2ff70f7cc2183e61c10f4e4f6d8
Author: Vinod K C vinod...@huawei.com
Date:   2015-08-28T14:33:58Z

Reordered imports







[GitHub] spark pull request: [SPARK-10016][ML]Stored broadcast var in a pri...

2015-08-24 Thread vinodkc
Github user vinodkc commented on the pull request:

https://github.com/apache/spark/pull/8233#issuecomment-134471497
  
Sure, I'll update the code





[GitHub] spark pull request: [SPARK-10016][ML]Stored broadcast var in a pri...

2015-08-16 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/8233

[SPARK-10016][ML]Stored broadcast var in a private var



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark fix_SPARK-10016

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8233.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8233


commit 377f2a1b66d374971ece6c7efd5e92bea8cdf0e1
Author: vinodkc vinod.kc...@gmail.com
Date:   2015-08-16T18:09:55Z

Stored broadcast var in a private var







[GitHub] spark pull request: [SPARK-8919][Documentation, MLlib]Added @since...

2015-07-28 Thread vinodkc
Github user vinodkc commented on the pull request:

https://github.com/apache/spark/pull/7325#issuecomment-125820807
  
@mengxr, I've added a message on the JIRA page from my JIRA account.
Thanks





[GitHub] spark pull request: [SPARK-8761][Deploy]Added synchronization in r...

2015-07-14 Thread vinodkc
Github user vinodkc closed the pull request at:

https://github.com/apache/spark/pull/7364





[GitHub] spark pull request: [SPARK-8761][Deploy]Added synchronization in r...

2015-07-14 Thread vinodkc
Github user vinodkc commented on the pull request:

https://github.com/apache/spark/pull/7364#issuecomment-121303647
  
Closing this incomplete PR. 





[GitHub] spark pull request: [SPARK-8636][SQL] Fix equalNullSafe comparison

2015-07-13 Thread vinodkc
Github user vinodkc closed the pull request at:

https://github.com/apache/spark/pull/7040





[GitHub] spark pull request: [SPARK-8761][Deploy]Added synchronization in r...

2015-07-13 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/7364#discussion_r34534208
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala 
---
@@ -727,40 +727,44 @@ private[master] class Master(
 
   def removeApplication(app: ApplicationInfo, state: 
ApplicationState.Value) {
 if (apps.contains(app)) {
-      logInfo("Removing app " + app.id)
-      apps -= app
-      idToApp -= app.id
-      endpointToApp -= app.driver
-      addressToApp -= app.driver.address
-      if (completedApps.size >= RETAINED_APPLICATIONS) {
-        val toRemove = math.max(RETAINED_APPLICATIONS / 10, 1)
-        completedApps.take(toRemove).foreach( a => {
-          appIdToUI.remove(a.id).foreach { ui => webUi.detachSparkUI(ui) }
-          applicationMetricsSystem.removeSource(a.appSource)
-        })
-        completedApps.trimStart(toRemove)
-      }
-      completedApps += app // Remember it in our history
-      waitingApps -= app
+      synchronized {
--- End diff --

I've added a duplicate condition to avoid synchronization in case `app` doesn't exist in `apps`. If that is not required, I'll remove the double checking.

Yes, `removeApplication` can be called concurrently with the same `app` instance from the message loop in Master.scala and also from handleAppKillRequest in MasterPage.scala.
This fix is not for handling concurrent modification of 'app'.
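
A minimal standalone sketch of the double-checked pattern described here (`apps` is a toy set, not Master's real state):

    import scala.collection.mutable

    object MiniMaster {
      private val apps = mutable.Set[String]()

      def removeApplication(app: String): Unit = {
        if (apps.contains(app)) {      // cheap pre-check outside the lock
          apps.synchronized {
            if (apps.contains(app)) {  // re-check once the lock is held
              apps -= app
            }
          }
        }
      }
    }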
 








[GitHub] spark pull request: [SPARK-8991][ML]Update SharedParamsCodeGen's G...

2015-07-13 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/7367

[SPARK-8991][ML]Update SharedParamsCodeGen's Generated Documentation

Removed private[ml] from the generated documentation.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark fix_sharedparmascodegen

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7367.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7367


commit 7e19025450c55ccbcef45dd8d460bc8dfdeef1c2
Author: Vinod K C vinod...@huawei.com
Date:   2015-07-13T10:27:33Z

Removed private[ml]

commit 4fa3c8f880f761d4d4113ac747d6c2688c8dfd3b
Author: Vinod K C vinod...@huawei.com
Date:   2015-07-13T10:45:59Z

Adding auto generated code







[GitHub] spark pull request: [SPARK-8761][Deploy]Added synchronization in r...

2015-07-12 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/7364

[SPARK-8761][Deploy]Added synchronization in removeApplication

Master.removeApplication is not thread-safe, but it's called both in the message loop of Master and in MasterPage.handleAppKillRequest, which runs in threads of the Web server.

In order to avoid parallel calls from the message loop of Master and the Web server, added synchronization and a condition to reconfirm the existence of the ApplicationInfo object 'app' in the Set 'apps'.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark fix_sync_removeApplication

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7364.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7364


commit 7b04d8f1527868770f6bcab96537320596190933
Author: Vinod K C vinod...@huawei.com
Date:   2015-07-13T07:20:42Z

Added synchronization in removeApplication







[GitHub] spark pull request: [SPARK-8919][Documentation, MLlib]Added @since...

2015-07-09 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/7325

[SPARK-8919][Documentation, MLlib]Added @since tags to mllib.recommendation
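
A hypothetical illustration of the convention (the member and body here are made up): each public member's scaladoc records the release that introduced it.

    object RecommendationExample {
      /**
       * Predicts the rating of one user for one product.
       * @since 0.8.0
       */
      def predict(user: Int, product: Int): Double = 0.0 // stub body
    }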



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark add_since_mllib.recommendation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7325.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7325


commit c41335072cf370a971e0f79eeb85ebfaa8e85ca7
Author: vinodkc vinod.kc...@gmail.com
Date:   2015-07-09T17:08:28Z

Added @since







[GitHub] spark pull request: [SPARK-8636][SQL] Fix equalNullSafe comparison

2015-07-09 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/7040#discussion_r34328058
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionals.scala
 ---
@@ -296,9 +295,7 @@ case class CaseKeyWhen(key: Expression, branches: 
Seq[Expression]) extends CaseW
   }
 
   private def equalNullSafe(l: Any, r: Any) = {
--- End diff --

@marmbrus 
Shall I rename the function private def equalNullSafe(l: Any, r: Any) to private def equal(l: Any, r: Any)?





[GitHub] spark pull request: [SPARK-8787][SQL]Changed parameter order of @d...

2015-07-02 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/7183

[SPARK-8787][SQL]Changed  parameter order of @deprecated in package object 
sql



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark fix_deprecated_param_order

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7183.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7183


commit 700911c2efac05cc095edf2f95f4137e236d2fe2
Author: Vinod K C vinod...@huawei.com
Date:   2015-07-02T10:35:39Z

Changed order of parameters







[GitHub] spark pull request: [SPARK-8787][SQL]Changed parameter order of @d...

2015-07-02 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/7183#discussion_r33763973
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/package.scala ---
@@ -46,6 +46,6 @@ package object sql {
* Type alias for [[DataFrame]]. Kept here for backward source 
compatibility for Scala.
* @deprecated As of 1.3.0, replaced by `DataFrame`.
*/
-  @deprecated("1.3.0", "use DataFrame")
+  @deprecated("use DataFrame instead", "1.3.0")
--- End diff --

Updated the message to use DataFrame





[GitHub] spark pull request: [SPARK-8787][SQL]Changed parameter order of @d...

2015-07-02 Thread vinodkc
Github user vinodkc commented on the pull request:

https://github.com/apache/spark/pull/7183#issuecomment-117957848
  
Added description





[GitHub] spark pull request: [SPARK-8628][SQL]Race condition in AbstractSpa...

2015-06-30 Thread vinodkc
Github user vinodkc commented on a diff in the pull request:

https://github.com/apache/spark/pull/7015#discussion_r33558094
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/AbstractSparkSQLParser.scala
 ---
@@ -30,12 +30,14 @@ private[sql] abstract class AbstractSparkSQLParser
 
   def parse(input: String): LogicalPlan = {
     // Initialize the Keywords.
-    lexical.initialize(reservedWords)
+    initLexical
     phrase(start)(new lexical.Scanner(input)) match {
       case Success(plan, _) => plan
       case failureOrError => sys.error(failureOrError.toString)
     }
   }
+  /* One-time initialization of lexical. This avoids reinitializing
+     lexical on every parse call. */
+  protected lazy val initLexical: Unit = lexical.initialize(reservedWords)
--- End diff --

'reservedWords' is generated using reflection. So during non-lazy 'lexical' initialization in the constructor, the method call to collect reservedWords will fail, as the object construction is still not over.
Could you suggest any other approach?
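
A minimal sketch of the lazy-val trick in the diff: the right-hand side of a `lazy val` runs exactly once, and Scala synchronizes its initialization, so concurrent parse() calls cannot re-run it.

    class MiniParser {
      private def initialize(): Unit = println("initialized once")
      protected lazy val initLexical: Unit = initialize()

      def parse(input: String): String = {
        initLexical // forces the one-time, thread-safe initialization
        input.toUpperCase
      }
    }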





[GitHub] spark pull request: [SPARK-8628][SQL]Race condition in AbstractSpa...

2015-06-26 Thread vinodkc
Github user vinodkc commented on the pull request:

https://github.com/apache/spark/pull/7015#issuecomment-115572040
  
Python test failure in an unrelated module, mllib:

File 
/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/classification.py,
 line 106, in __main__.LogisticRegressionModel
Failed example:
lrm = LogisticRegressionWithSGD.train(sc.parallelize(data), 
iterations=10)
Expected nothing
Got:

Exception happened during processing of request from ('127.0.0.1', 
53982)





[GitHub] spark pull request: [SPARK-8636][SQL] Fix equalNullSafe comparison

2015-06-26 Thread vinodkc
Github user vinodkc commented on the pull request:

https://github.com/apache/spark/pull/7040#issuecomment-115654691
  
Ok, I'll update that test suite.

@smola, in predicates.scala, the eval method of the case class EqualNullSafe has a similar if (l == null && r == null) check; shall I modify that condition too?
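
For reference, a standalone sketch of null-safe equality semantics (not the actual Catalyst code): two nulls compare equal, a single null compares unequal.

    def equalNullSafe(l: Any, r: Any): Boolean = {
      if (l == null && r == null) {
        true
      } else if (l == null || r == null) {
        false
      } else {
        l == r
      }
    }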





[GitHub] spark pull request: [SPARK-8636][SQL] Fix equalNullSafe comparison

2015-06-26 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/7040

[SPARK-8636][SQL] Fix equalNullSafe comparison



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark fix_CaseKeyWhen_equalNullSafe

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7040.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7040


commit f2d0b53ebd0930f30f05fda2b00c9c0e835b3b79
Author: Vinod K C vinod...@huawei.com
Date:   2015-06-26T13:48:08Z

 Fix equalNullSafe comparison







[GitHub] spark pull request: [SPARK-8628][SQL]Race condition in AbstractSpa...

2015-06-25 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/7015

[SPARK-8628][SQL]Race condition in AbstractSparkSQLParser.parse

Added synchronization in 'initialize' method

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark 
handle_lexical_initialize_schronization

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7015.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7015


commit e9fc49a3dab3086953b4b8d5bb5a679bbeab0038
Author: Vinod K C vinod...@huawei.com
Date:   2015-06-25T14:14:55Z

handle  synchronization in SqlLexical.initialize

commit ef4f60f20bfcbe3c9611b20a3456b671d8239a35
Author: Vinod K C vinod...@huawei.com
Date:   2015-06-25T14:27:21Z

Reverted import order







[GitHub] spark pull request: [SPARK-8628][SQL]Race condition in AbstractSpa...

2015-06-25 Thread vinodkc
Github user vinodkc commented on the pull request:

https://github.com/apache/spark/pull/7015#issuecomment-115492255
  
@smola, I've updated the code. Can you please check whether that fix solves the issue?





[GitHub] spark pull request: [SPARK-7470] [SQL] Spark shell SQLContext cras...

2015-05-09 Thread vinodkc
Github user vinodkc commented on the pull request:

https://github.com/apache/spark/pull/5997#issuecomment-100585781
  
@marmbrus , PR https://github.com/apache/spark/pull/6013 handled it




