[jira] [Updated] (SPARK-12763) Spark gets stuck executing SSB query

2019-05-20 Thread Hyukjin Kwon (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-12763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-12763:
-
Labels: bulk-closed  (was: )

> Spark gets stuck executing SSB query
> 
>
> Key: SPARK-12763
> URL: https://issues.apache.org/jira/browse/SPARK-12763
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.0
> Environment: Standalone cluster
> Reporter: Vadim Tkachenko
> Priority: Major
> Labels: bulk-closed
> Attachments: Spark shell - Details for Stage 5 (Attempt 0).pdf
>
>
> I am trying to emulate the SSB (Star Schema Benchmark) load. The data was generated with
> https://github.com/Percona-Lab/ssb-dbgen
> at scale factor 1000 and converted to Parquet format.
> I run the following script:
> // Load each SSB table from Parquet and mark it for caching
> val pLineOrder = sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/lineorder").cache()
> val pDate = sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/date").cache()
> val pPart = sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/part").cache()
> val pSupplier = sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/supplier").cache()
> val pCustomer = sqlContext.read.parquet("/mnt/i3600/spark/ssb-1000/customer").cache()
> // Register the DataFrames as temporary tables so they can be queried with SQL
> pLineOrder.registerTempTable("lineorder")
> pDate.registerTempTable("date")
> pPart.registerTempTable("part")
> pSupplier.registerTempTable("supplier")
> pCustomer.registerTempTable("customer")
> Then I define the query:
> val sql41 = sqlContext.sql("""
>   select D_YEAR, C_NATION, sum(LO_REVENUE - LO_SUPPLYCOST) as profit
>   from date, customer, supplier, part, lineorder
>   where LO_CUSTKEY = C_CUSTKEY
>     and LO_SUPPKEY = S_SUPPKEY
>     and LO_PARTKEY = P_PARTKEY
>     and LO_ORDERDATE = D_DATEKEY
>     and C_REGION = 'AMERICA'
>     and S_REGION = 'AMERICA'
>     and (P_MFGR = 'MFGR#1' or P_MFGR = 'MFGR#2')
>   group by D_YEAR, C_NATION
>   order by D_YEAR, C_NATION
> """)
> and run it:
> sql41.show()
> The query gets stuck: at some point there is no further progress and the server is fully
> idle, but the job stays at the same stage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12763) Spark gets stuck executing SSB query

2016-01-12 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-12763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12763:
--
Component/s: SQL







[jira] [Updated] (SPARK-12763) Spark gets stuck executing SSB query

2016-01-11 Thread Vadim Tkachenko (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-12763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vadim Tkachenko updated SPARK-12763:

Attachment: Spark shell - Details for Stage 5 (Attempt 0).pdf

Details on the stalled stage



