[jira] [Updated] (SPARK-21177) df.saveAsTable slows down linearly, with number of appends

2017-07-24 Thread Prashant Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-21177:

Description: 
In short, please use the following shell transcript for the reproducer. 

{code:java}

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.

scala> def printTimeTaken(str: String, f: () => Unit) {
     |   val start = System.nanoTime()
     |   f()
     |   val end = System.nanoTime()
     |   val timetaken = end - start
     |   import scala.concurrent.duration._
     |   println(s"Time taken for $str is ${timetaken.nanos.toMillis}\n")
     | }
printTimeTaken: (str: String, f: () => Unit)Unit

scala> for(i <- 1 to 10) {printTimeTaken("time to append to hive:", () => { Seq(1, 2).toDF().write.mode("append").saveAsTable("t1"); })}
Time taken for time to append to hive: is 284

Time taken for time to append to hive: is 211

...
...

Time taken for time to append to hive: is 2615

...
Time taken for time to append to hive: is 3055
...
Time taken for time to append to hive: is 22425


{code}
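
For comparing the per-append timings side by side, the helper from the transcript could also return the elapsed milliseconds instead of printing them. The following is only a minimal sketch and not part of the original report: the names `AppendTiming`, `timeMillis` and `timeRuns` are hypothetical, and a short `Thread.sleep` stands in for the `saveAsTable` call so that the snippet runs without a Spark session.

```scala
import scala.concurrent.duration._

object AppendTiming {
  // Hypothetical helper: runs `body` once and returns the elapsed
  // wall-clock time in milliseconds.
  def timeMillis(body: => Unit): Long = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start).nanos.toMillis
  }

  // Runs `body` `runs` times and returns one timing per run, in order.
  def timeRuns(runs: Int)(body: => Unit): Seq[Long] =
    (1 to runs).map(_ => timeMillis(body))

  def main(args: Array[String]): Unit = {
    // Stand-in workload (assumption). In spark-shell the body would be
    // the append from the transcript:
    //   Seq(1, 2).toDF().write.mode("append").saveAsTable("t1")
    val timings = timeRuns(10) { Thread.sleep(5) }
    timings.zipWithIndex.foreach { case (ms, i) =>
      println(s"append ${i + 1}: $ms ms")
    }
  }
}
```

With the real append as the body, the returned sequence makes it easy to see whether later runs take longer than earlier ones.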

Why does it matter?

In a streaming job, it is not possible to append to Hive using this DataFrame 
operation, because the time per append keeps growing.

  was:
In short, please use the following shell transcript for the reproducer. 

{code:java}

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.

scala> def printTimeTaken(str: String, f: () => Unit) {
     |   val start = System.nanoTime()
     |   f()
     |   val end = System.nanoTime()
     |   val timetaken = end - start
     |   import scala.concurrent.duration._
     |   println(s"Time taken for $str is ${timetaken.nanos.toMillis}\n")
     | }
printTimeTaken: (str: String, f: () => Unit)Unit

scala> for(i <- 1 to 10) {printTimeTaken("time to append to hive:", () => { Seq(1, 2).toDF().write.mode("append").saveAsTable("t1"); })}
Time taken for time to append to hive: is 284

Time taken for time to append to hive: is 211

...
...

Time taken for time to append to hive: is 2615

Time taken for time to append to hive: is 3055

Time taken for time to append to hive: is 22425


{code}

Why does it matter?

In a streaming job, it is not possible to append to Hive using this DataFrame 
operation, because the time per append keeps growing.


> df.saveAsTable slows down linearly, with number of appends
> --
>
> Key: SPARK-21177
> URL: https://issues.apache.org/jira/browse/SPARK-21177
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Prashant Sharma



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21177) df.saveAsTable slows down linearly, with number of appends

2017-07-24 Thread Prashant Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-21177:

Description: 
In short, please use the following shell transcript for the reproducer. 

{code:java}

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.

scala> def printTimeTaken(str: String, f: () => Unit) {
     |   val start = System.nanoTime()
     |   f()
     |   val end = System.nanoTime()
     |   val timetaken = end - start
     |   import scala.concurrent.duration._
     |   println(s"Time taken for $str is ${timetaken.nanos.toMillis}\n")
     | }
printTimeTaken: (str: String, f: () => Unit)Unit

scala> for(i <- 1 to 10) {printTimeTaken("time to append to hive:", () => { Seq(1, 2).toDF().write.mode("append").saveAsTable("t1"); })}
Time taken for time to append to hive: is 284

Time taken for time to append to hive: is 211

...
...

Time taken for time to append to hive: is 2615

Time taken for time to append to hive: is 3055

Time taken for time to append to hive: is 22425


{code}

Why does it matter?

In a streaming job, it is not possible to append to Hive using this DataFrame 
operation, because the time per append keeps growing.

  was:

In short, please use the following shell transcript for the reproducer. 

{code:java}

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.

scala> def printTimeTaken(str: String, f: () => Unit) {
     |   val start = System.nanoTime()
     |   f()
     |   val end = System.nanoTime()
     |   val timetaken = end - start
     |   import scala.concurrent.duration._
     |   println(s"Time taken for $str is ${timetaken.nanos.toMillis}\n")
     | }
printTimeTaken: (str: String, f: () => Unit)Unit

scala> for(i <- 1 to 1) {printTimeTaken("time to append to hive:", () => { Seq(1, 2).toDF().write.mode("append").saveAsTable("t1"); })}
Time taken for time to append to hive: is 284

Time taken for time to append to hive: is 211

...
...

Time taken for time to append to hive: is 2615

Time taken for time to append to hive: is 3055

Time taken for time to append to hive: is 22425


{code}

Why does it matter?

In a streaming job, it is not possible to append to Hive using this DataFrame 
operation, because the time per append keeps growing.


> df.saveAsTable slows down linearly, with number of appends
> --
>
> Key: SPARK-21177
> URL: https://issues.apache.org/jira/browse/SPARK-21177
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Prashant Sharma






[jira] [Updated] (SPARK-21177) df.saveAsTable slows down linearly, with number of appends

2017-06-22 Thread Prashant Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-21177:

Summary: df.saveAsTable slows down linearly, with number of appends  (was: 
df.SaveAsTable slows down linearly, with number of appends.)

> df.saveAsTable slows down linearly, with number of appends
> --
>
> Key: SPARK-21177
> URL: https://issues.apache.org/jira/browse/SPARK-21177
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Prashant Sharma
>
> In short, please use the following shell transcript for the reproducer. 
> {code:java}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
>       /_/
>
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> def printTimeTaken(str: String, f: () => Unit) {
>      |   val start = System.nanoTime()
>      |   f()
>      |   val end = System.nanoTime()
>      |   val timetaken = end - start
>      |   import scala.concurrent.duration._
>      |   println(s"Time taken for $str is ${timetaken.nanos.toMillis}\n")
>      | }
> printTimeTaken: (str: String, f: () => Unit)Unit
> scala> for(i <- 1 to 1) {printTimeTaken("time to append to hive:", () => { Seq(1, 2).toDF().write.mode("append").saveAsTable("t1"); })}
> Time taken for time to append to hive: is 284
> Time taken for time to append to hive: is 211
> ...
> ...
> Time taken for time to append to hive: is 2615
> Time taken for time to append to hive: is 3055
> Time taken for time to append to hive: is 22425
> 
> {code}
> Why does it matter?
> In a streaming job, it is not possible to append to Hive using this DataFrame 
> operation, because the time per append keeps growing.






[jira] [Updated] (SPARK-21177) df.SaveAsTable slows down linearly, with number of appends.

2017-06-22 Thread Prashant Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-21177:

Summary: df.SaveAsTable slows down linearly, with number of appends.  (was: 
Append to hive slows down linearly, with number of appends.)

> df.SaveAsTable slows down linearly, with number of appends.
> ---
>
> Key: SPARK-21177
> URL: https://issues.apache.org/jira/browse/SPARK-21177
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Prashant Sharma
>
> In short, please use the following shell transcript for the reproducer. 
> {code:java}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
>       /_/
>
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> def printTimeTaken(str: String, f: () => Unit) {
>      |   val start = System.nanoTime()
>      |   f()
>      |   val end = System.nanoTime()
>      |   val timetaken = end - start
>      |   import scala.concurrent.duration._
>      |   println(s"Time taken for $str is ${timetaken.nanos.toMillis}\n")
>      | }
> printTimeTaken: (str: String, f: () => Unit)Unit
> scala> for(i <- 1 to 1) {printTimeTaken("time to append to hive:", () => { Seq(1, 2).toDF().write.mode("append").saveAsTable("t1"); })}
> Time taken for time to append to hive: is 284
> Time taken for time to append to hive: is 211
> ...
> ...
> Time taken for time to append to hive: is 2615
> Time taken for time to append to hive: is 3055
> Time taken for time to append to hive: is 22425
> 
> {code}
> Why does it matter?
> In a streaming job, it is not possible to append to Hive using this DataFrame 
> operation, because the time per append keeps growing.


