[jira] [Updated] (SPARK-21519) Add an option to the JDBC data source to initialize the environment of the remote database session

2017-08-11 Thread Luca Canali (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Canali updated SPARK-21519:

Description: 
This proposes an option for the JDBC data source, tentatively called 
"sessionInitStatement", to implement the session-initialization functionality 
available, for example, in the Sqoop connector for Oracle (see 
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_oraoop_oracle_session_initialization_statements).
After each database session to the remote DB is opened, and before data is 
read, this option executes a custom SQL statement (or a PL/SQL block in the 
case of Oracle).
Example of usage with the Oracle JDBC driver:

{code}
// Start spark-shell with the Oracle JDBC driver on the classpath
bin/spark-shell --jars ojdbc6.jar

// PL/SQL block to run on the remote session before any data is read
val preambleSQL = """
begin
  execute immediate 'alter session set tracefile_identifier=sparkora';
  execute immediate 'alter session set "_serial_direct_read"=true';
  execute immediate 'alter session set time_zone=''+02:00''';
end;
"""

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@ORACLEDBSERVER:1521/service_name")
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .option("dbtable", "(select 1, sysdate, systimestamp, current_timestamp, localtimestamp from dual)")
  .option("user", "MYUSER")
  .option("password", "MYPASSWORD")
  .option("fetchsize", 1000)
  .option("sessionInitStatement", preambleSQL)
  .load()

df.show(5, false)
{code}
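
Although this example targets Oracle, the option value is just a string handed to 
the remote session, so a single plain SQL statement works as well. A hypothetical 
(untested) equivalent for a PostgreSQL target could look like the following; the 
URL, driver class, table expression and statement are illustrative placeholders only:

{code}
// Hypothetical PostgreSQL variant (untested): tag the session in pg_stat_activity
// before the read starts. All connection details below are placeholders.
val dfPg = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://PGSERVER:5432/mydb")
  .option("driver", "org.postgresql.Driver")
  .option("dbtable", "(select now(), current_setting('application_name')) as t")
  .option("user", "MYUSER")
  .option("password", "MYPASSWORD")
  .option("sessionInitStatement", "set application_name = 'spark_jdbc_session'")
  .load()

dfPg.show(5, false)
{code}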

*Comments:* This proposal has been developed and tested for connecting the 
Spark JDBC data source to Oracle databases; however, since the mechanism is 
quite generic, I believe it can be useful for other target databases as well.
The code executed via the "sessionInitStatement" option is simply the 
user-provided string passed to the execute method of the JDBC connection, so it 
can use the full language/syntax of the target database. When using 
sessionInitStatement against Oracle, for example, the user-provided command can 
be a single SQL statement or a PL/SQL block grouping multiple commands and 
logic.
Note that the proposed code allows arbitrary SQL to be injected into the target 
database. This is not a security concern as such, since it requires password 
authentication, but be aware of the possibilities that executing user-provided 
SQL (and PL/SQL) opens up.
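
To make the mechanism concrete, the sketch below shows in plain JDBC (Scala) what 
"feeding the string through the execute method" amounts to: the init statement runs 
on the freshly opened connection before the query that reads the data. This is only 
an illustration of the idea, not the actual Spark code path; it reuses the 
preambleSQL value and the placeholder connection details from the example above.

{code}
import java.sql.DriverManager

// Minimal sketch of the session-init mechanism (not the Spark implementation)
Class.forName("oracle.jdbc.driver.OracleDriver")
val conn = DriverManager.getConnection(
  "jdbc:oracle:thin:@ORACLEDBSERVER:1521/service_name", "MYUSER", "MYPASSWORD")
try {
  // 1) run the user-provided session initialization on the new session
  val init = conn.createStatement()
  try init.execute(preambleSQL) finally init.close()

  // 2) only then run the query that actually reads the data
  val stmt = conn.createStatement()
  val rs = stmt.executeQuery("select systimestamp from dual")
  while (rs.next()) println(rs.getTimestamp(1))
} finally {
  conn.close()
}
{code}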

> Add an option to the JDBC data source to initialize the environment of the 
> remote database session
> --
>
> Key: SPARK-21519
> URL: https://issues.apache.org/jira/browse/SPARK-21519
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.1.0, 2.1.1, 2.2.0
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Minor
> Fix For: 2.3.0
>