[jira] [Commented] (SPARK-21177) df.saveAsTable slows down linearly, with number of appends

2018-02-25 Thread Arman Yazdani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375995#comment-16375995
 ] 

Arman Yazdani commented on SPARK-21177:
---

I configured Spark with Hive. In my case, when I save a partitioned dataset 
to Hive, Spark waits about 10 minutes for the Hive metastore, and the 
metastore process uses 100% of one CPU thread. I changed the metastore's log 
level to debug, and the metastore stalls after logging the getMTable function 
in the objectStore file. During these 10 minutes, Spark has no job to run and 
just waits for the Hive metastore. The wait grows as the number of partitions 
grows.

> df.saveAsTable slows down linearly, with number of appends
> --
>
> Key: SPARK-21177
> URL: https://issues.apache.org/jira/browse/SPARK-21177
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Prashant Sharma
> Priority: Major
>
> In short, please use the following shell transcript for the reproducer. 
> {code:java}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
>       /_/
>  
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> def printTimeTaken(str: String, f: () => Unit) {
>      |   val start = System.nanoTime()
>      |   f()
>      |   val end = System.nanoTime()
>      |   val timetaken = end - start
>      |   import scala.concurrent.duration._
>      |   println(s"Time taken for $str is ${timetaken.nanos.toMillis}\n")
>      | }
> printTimeTaken: (str: String, f: () => Unit)Unit
> scala> for(i <- 1 to 10) {printTimeTaken("time to append to hive:", () => { Seq(1, 2).toDF().write.mode("append").saveAsTable("t1"); })}
> Time taken for time to append to hive: is 284
> Time taken for time to append to hive: is 211
> ...
> ...
> Time taken for time to append to hive: is 2615
> ...
> Time taken for time to append to hive: is 3055
> ...
> Time taken for time to append to hive: is 22425
> 
> {code}
> Why does it matter?
> In a streaming job, this slowdown makes it impossible to append to Hive 
> using this DataFrame operation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21177) df.saveAsTable slows down linearly, with number of appends

2017-07-24 Thread Liang-Chi Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098344#comment-16098344
 ] 

Liang-Chi Hsieh commented on SPARK-21177:
-

[~hyukjin.kwon] I ran spark-shell with your code snippets, but I still can't 
reproduce the issue. I didn't create a Hive table.




[jira] [Commented] (SPARK-21177) df.saveAsTable slows down linearly, with number of appends

2017-07-24 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098305#comment-16098305
 ] 

Hyukjin Kwon commented on SPARK-21177:
--

[~viirya], I assume you did it the same way I did?




[jira] [Commented] (SPARK-21177) df.saveAsTable slows down linearly, with number of appends

2017-07-24 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098262#comment-16098262
 ] 

Hyukjin Kwon commented on SPARK-21177:
--

Oh, no. I created a Hive table and then inserted into it from Spark. I thought 
you intended to do so. The code path should be different in this case. I think 
I am reproducing this now. I will reopen this soon, after summarising what I 
did.




[jira] [Commented] (SPARK-21177) df.saveAsTable slows down linearly, with number of appends

2017-07-24 Thread Prashant Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098255#comment-16098255
 ] 

Prashant Sharma commented on SPARK-21177:
-

Yes. In fact, Hive should not be set up. Are you on some special hardware, 
like NVMe?




[jira] [Commented] (SPARK-21177) df.saveAsTable slows down linearly, with number of appends

2017-07-24 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098227#comment-16098227
 ] 

Hyukjin Kwon commented on SPARK-21177:
--

Wait ... is the code itself a self-contained reproducer, without any Hive step?




[jira] [Commented] (SPARK-21177) df.saveAsTable slows down linearly, with number of appends

2017-07-24 Thread Prashant Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098149#comment-16098149
 ] 

Prashant Sharma commented on SPARK-21177:
-

Well, I have tried it in three different environments: an Ubuntu laptop, a 
MacBook Pro, and a CentOS machine.
Git SHA: a848d552ef6b5d0d3bb3b2da903478437a8b10aa.

What else can I help you with?




[jira] [Commented] (SPARK-21177) df.saveAsTable slows down linearly, with number of appends

2017-07-24 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098148#comment-16098148
 ] 

Hyukjin Kwon commented on SPARK-21177:
--

Meanwhile, let me give it another try.




[jira] [Commented] (SPARK-21177) df.saveAsTable slows down linearly, with number of appends

2017-07-24 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098143#comment-16098143
 ] 

Hyukjin Kwon commented on SPARK-21177:
--

Yes, I tried, but I could not reproduce it. Could you describe your environment?

> df.saveAsTable slows down linearly, with number of appends
> --
>
> Key: SPARK-21177
> URL: https://issues.apache.org/jira/browse/SPARK-21177
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Prashant Sharma
>
> In short, please use the following shell transcript for the reproducer. 
> {code:java}
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
>   /_/
>  
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> def printTimeTaken(str: String, f: () => Unit) {
> val start = System.nanoTime()
> f()
> val end = System.nanoTime()
> val timetaken = end - start
> import scala.concurrent.duration._
> println(s"Time taken for $str is ${timetaken.nanos.toMillis}\n")
>   }
>  |  |  |  |  |  |  | printTimeTaken: (str: 
> String, f: () => Unit)Unit
> scala> 
> for(i <- 1 to 10) {printTimeTaken("time to append to hive:", () => { 
> Seq(1, 2).toDF().write.mode("append").saveAsTable("t1"); })}
> Time taken for time to append to hive: is 284
> Time taken for time to append to hive: is 211
> ...
> ...
> Time taken for time to append to hive: is 2615
> Time taken for time to append to hive: is 3055
> Time taken for time to append to hive: is 22425
> 
> {code}
> Why does it matter ?
> In a streaming job it is not possible to append to hive using this dataframe 
> operation.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21177) df.saveAsTable slows down linearly, with number of appends

2017-07-24 Thread Prashant Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098139#comment-16098139
 ] 

Prashant Sharma commented on SPARK-21177:
-

It is pretty easy to reproduce; you just need to run it long enough. Increasing 
the number of iterations in the above reproducer might help.

For example:

{code:java}

for (i <- 1 to 10) { printTimeTaken("time to append to hive:", () => { Seq(1, 2).toDF().write.mode("append").saveAsTable("t1") }) }

{code}
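
For a longer run, it may be easier to inspect the timings if the helper 
returns them instead of printing them. The following is a minimal sketch in 
plain Scala. The names timeMillis and timeEach are hypothetical and are not 
part of Spark or of the original reproducer. The Spark call appears only in a 
comment because it needs a running spark-shell.

```scala
import scala.concurrent.duration._

// Hypothetical helper: runs `body` once and returns the elapsed
// wall-clock time in milliseconds.
def timeMillis(body: => Unit): Long = {
  val start = System.nanoTime()
  body
  (System.nanoTime() - start).nanos.toMillis
}

// Hypothetical helper: runs `body` `iterations` times, in order, and
// returns one timing per run so that growth can be inspected afterwards.
def timeEach(iterations: Int)(body: => Unit): Vector[Long] =
  Vector.fill(iterations)(timeMillis(body))

// In spark-shell, the append from the reproducer would be passed as the body:
//   val timings = timeEach(10) {
//     Seq(1, 2).toDF().write.mode("append").saveAsTable("t1")
//   }
//   timings.zipWithIndex.foreach { case (ms, i) => println(s"append ${i + 1}: $ms ms") }
```

Keeping the timings in a Vector makes it straightforward to compare early and 
late appends after the run has finished.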





[jira] [Commented] (SPARK-21177) df.saveAsTable slows down linearly, with number of appends

2017-07-19 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16093099#comment-16093099
 ] 

Hyukjin Kwon commented on SPARK-21177:
--

Could you provide some steps to reproduce? I want to follow and reproduce this.




[jira] [Commented] (SPARK-21177) df.saveAsTable slows down linearly, with number of appends

2017-07-19 Thread Prashant Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16093080#comment-16093080
 ] 

Prashant Sharma commented on SPARK-21177:
-

I can reproduce it on another system with the latest master code. The slowdown 
builds up over time; e.g., it takes about 1 hour for an append to slow down to 
1000 ms or more.
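
Whether a run is really slowing down, rather than just being noisy, can be 
checked by fitting a straight line to the per-append timings. The sketch below 
is an illustration in plain Scala and not something taken from the reporter's 
setup: slopePerAppend is a hypothetical helper that returns the ordinary 
least-squares slope, i.e. the average number of milliseconds that each 
additional append adds.

```scala
// Hypothetical helper: least-squares slope of the timings against the
// append index (0, 1, 2, ...), in milliseconds per additional append.
def slopePerAppend(timingsMs: Seq[Long]): Double = {
  require(timingsMs.size >= 2, "need at least two timings to fit a line")
  val n = timingsMs.size
  val xs = (0 until n).map(_.toDouble)
  val ys = timingsMs.map(_.toDouble)
  val meanX = xs.sum / n
  val meanY = ys.sum / n
  // Covariance of index and timing, and variance of the index (both unscaled).
  val cov = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
  val varX = xs.map(x => (x - meanX) * (x - meanX)).sum
  cov / varX
}
```

A slope close to zero suggests flat append times. A clearly positive slope 
over a long run would be consistent with the linear slowdown described in the 
issue title.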




[jira] [Commented] (SPARK-21177) df.saveAsTable slows down linearly, with number of appends

2017-07-19 Thread Liang-Chi Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092643#comment-16092643
 ] 

Liang-Chi Hsieh commented on SPARK-21177:
-

I can't reproduce the reported issue with the code. Can you verify whether it 
is an issue in your environment or in Spark?
