[jira] [Updated] (SPARK-23495) Creating a json file using a dataframe Generates an issue

2018-03-06 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-23495:
--
Target Version/s:   (was: 2.1.0)
   Fix Version/s: (was: 2.1.0)

> Creating a json file using a dataframe Generates an issue
> ---------------------------------------------------------
>
>                 Key: SPARK-23495
>                 URL: https://issues.apache.org/jira/browse/SPARK-23495
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>            Reporter: AIT OUFKIR
>            Priority: Major
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The issue occurs when trying to create a JSON file from a DataFrame (see the
> code below).
>
> from pyspark.sql import SQLContext
> # sqlContext is assumed to be an existing SQLContext, as predefined in the pyspark shell
> a = ["a1","a2"]
> b = ["b1","b2","b3"]
> c = ["c1","c2","c3","c4"]
> d = {'d1':1, 'd2':2}
> e = {'e1':1, 'e2':2, 'e3':3}
> f = ['f1','f2','f3']
> g = ['g1','g2','g3','g4']
>
> metadata_dump = dict(asi=a, basi=b, casi=c, dasi=d, fasi=f, gasi=g, easi=e)
> md = sqlContext.createDataFrame([metadata_dump]).collect()
> metadata = sqlContext.createDataFrame(md, ['asi', 'basi', 'casi', 'dasi', 'fasi', 'gasi', 'easi'])
>
> metadata_path = "/folder/fileNameErr"
> metadata.write.mode('overwrite').json(metadata_path)
>
> The file contains the following record, in which the values under fasi, gasi,
> and easi are misplaced:
>
> {"asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":{"d1":1,"d2":2},"fasi":{"e1":1,"e2":2,"e3":3},"gasi":["f1","f2","f3"],"easi":["g1","g2","g3","g4"]}
>
> When the dictionary e is moved so that easi follows dasi in both the dict()
> call and the column list:
>
> metadata_dump = dict(asi=a, basi=b, casi=c, dasi=d, easi=e, fasi=f, gasi=g)
> md = sqlContext.createDataFrame([metadata_dump]).collect()
> metadata = sqlContext.createDataFrame(md, ['asi', 'basi', 'casi', 'dasi', 'easi', 'fasi', 'gasi'])
> metadata_path = "/folder/fileNameCorr"
> metadata.write.mode('overwrite').json(metadata_path)
>
> the record is written correctly:
>
> {"asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":{"d1":1,"d2":2},"easi":{"e1":1,"e2":2,"e3":3},"fasi":["f1","f2","f3"],"gasi":["g1","g2","g3","g4"]}
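
A possible explanation for the two outputs, which this thread does not confirm: if PySpark 2.1 orders the fields inferred from a dict by sorted key, and then applies the list of column names passed to createDataFrame by position, the reported mismatch follows. The sketch below reproduces that mapping in plain Python without Spark. Both behaviours are assumptions, and infer_fields and rename_by_position are hypothetical helpers, not PySpark functions.

```python
# Plain-Python sketch of the ASSUMED behaviour; no Spark is involved.
a = ["a1", "a2"]
b = ["b1", "b2", "b3"]
c = ["c1", "c2", "c3", "c4"]
d = {'d1': 1, 'd2': 2}
e = {'e1': 1, 'e2': 2, 'e3': 3}
f = ['f1', 'f2', 'f3']
g = ['g1', 'g2', 'g3', 'g4']

metadata_dump = dict(asi=a, basi=b, casi=c, dasi=d, fasi=f, gasi=g, easi=e)


def infer_fields(record):
    """Hypothetical: return (name, value) pairs ordered by key (assumed inference order)."""
    return sorted(record.items(), key=lambda item: item[0])


def rename_by_position(fields, names):
    """Hypothetical: give the i-th field the i-th name, ignoring its original name."""
    return {name: value for name, (_, value) in zip(names, fields)}


fields = infer_fields(metadata_dump)

# Column list from the failing case: 'easi' is last.
shuffled = rename_by_position(
    fields, ['asi', 'basi', 'casi', 'dasi', 'fasi', 'gasi', 'easi'])

# Column list from the working case: names are in alphabetical order.
ordered = rename_by_position(
    fields, ['asi', 'basi', 'casi', 'dasi', 'easi', 'fasi', 'gasi'])

print(shuffled['fasi'])          # the dictionary e ends up under 'fasi'
print(ordered == metadata_dump)  # names and values line up again
```

If that assumption holds, the second case works only because its column list happens to be in alphabetical order.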



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23495) Creating a json file using a dataframe Generates an issue

2018-02-23 Thread AIT OUFKIR (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AIT OUFKIR updated SPARK-23495:
---
Description: 
The issue occurs when trying to create a JSON file from a DataFrame (see the code below).

from pyspark.sql import SQLContext
# sqlContext is assumed to be an existing SQLContext, as predefined in the pyspark shell
a = ["a1","a2"]
b = ["b1","b2","b3"]
c = ["c1","c2","c3","c4"]
d = {'d1':1, 'd2':2}
e = {'e1':1, 'e2':2, 'e3':3}
f = ['f1','f2','f3']
g = ['g1','g2','g3','g4']

metadata_dump = dict(asi=a, basi=b, casi=c, dasi=d, fasi=f, gasi=g, easi=e)
md = sqlContext.createDataFrame([metadata_dump]).collect()
metadata = sqlContext.createDataFrame(md, ['asi', 'basi', 'casi', 'dasi', 'fasi', 'gasi', 'easi'])

metadata_path = "/folder/fileNameErr"
metadata.write.mode('overwrite').json(metadata_path)

The file contains the following record, in which the values under fasi, gasi, and easi are misplaced:

{"asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":{"d1":1,"d2":2},"fasi":{"e1":1,"e2":2,"e3":3},"gasi":["f1","f2","f3"],"easi":["g1","g2","g3","g4"]}

When the dictionary e is moved so that easi follows dasi in both the dict() call and the column list:

metadata_dump = dict(asi=a, basi=b, casi=c, dasi=d, easi=e, fasi=f, gasi=g)
md = sqlContext.createDataFrame([metadata_dump]).collect()
metadata = sqlContext.createDataFrame(md, ['asi', 'basi', 'casi', 'dasi', 'easi', 'fasi', 'gasi'])
metadata_path = "/folder/fileNameCorr"
metadata.write.mode('overwrite').json(metadata_path)

the record is written correctly:

{"asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":{"d1":1,"d2":2},"easi":{"e1":1,"e2":2,"e3":3},"fasi":["f1","f2","f3"],"gasi":["g1","g2","g3","g4"]}

  was:
The issue occurs when trying to create a JSON file from a DataFrame (see the code below).

from pyspark.sql import SQLContext
# sqlContext is assumed to be an existing SQLContext, as predefined in the pyspark shell
a = ["a1","a2"]
b = ["b1","b2","b3"]
c = ["c1","c2","c3","c4"]
d = {'d1':1, 'd2':2}
e = {'e1':1, 'e2':2, 'e3':3}
f = ['f1','f2','f3']
g = ['g1','g2','g3','g4']

metadata_dump = dict(asi=a, basi=b, casi=c, dasi=d, fasi=f, gasi=g, easi=e)
md = sqlContext.createDataFrame([metadata_dump]).collect()
metadata = sqlContext.createDataFrame(md, ['asi', 'basi', 'casi', 'dasi', 'fasi', 'gasi', 'easi'])

metadata_path = "/folder/fileNameErr"
metadata.write.mode('overwrite').json(metadata_path)

The file contains the following record, in which the values under fasi, gasi, and easi are misplaced:

{"asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":{"d1":1,"d2":2},"fasi":{"e1":1,"e2":2,"e3":3},"gasi":["f1","f2","f3"],"easi":["g1","g2","g3","g4"]}

When the dictionary e is moved so that easi follows dasi in both the dict() call and the column list:

metadata_dump = dict(asi=a, basi=b, casi=c, dasi=d, easi=e, fasi=f, gasi=g)
md = sqlContext.createDataFrame([metadata_dump]).collect()
metadata = sqlContext.createDataFrame(md, ['asi', 'basi', 'casi', 'dasi', 'easi', 'fasi', 'gasi'])
metadata_path = "/folder/fileNameCorr"
metadata.write.mode('overwrite').json(metadata_path)

the record is written correctly:

{"asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":{"d1":1,"d2":2},"easi":{"e1":1,"e2":2,"e3":3},"fasi":["f1","f2","f3"],"gasi":["g1","g2","g3","g4"]}

> Creating a json file using a dataframe Generates an issue
> ---------------------------------------------------------
>
>                 Key: SPARK-23495
>                 URL: https://issues.apache.org/jira/browse/SPARK-23495
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>            Reporter: AIT OUFKIR
>            Priority: Major
>             Fix For: 2.1.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h

[jira] [Updated] (SPARK-23495) Creating a json file using a dataframe Generates an issue

2018-02-23 Thread AIT OUFKIR (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AIT OUFKIR updated SPARK-23495:
---
             Flags: Important
Remaining Estimate: 4h
 Original Estimate: 4h

This issue can cause major inconsistencies in data.
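
To make the inconsistency concrete, the sketch below compares the record reported for the failing case with the values that were originally assigned to each key, and lists the keys whose values differ. It uses only the standard json module. The helper mismatched_keys is hypothetical and is offered as one way to check a written file, not as part of Spark.

```python
import json

# Values as assigned in the issue's metadata_dump.
expected = {
    "asi": ["a1", "a2"],
    "basi": ["b1", "b2", "b3"],
    "casi": ["c1", "c2", "c3", "c4"],
    "dasi": {"d1": 1, "d2": 2},
    "easi": {"e1": 1, "e2": 2, "e3": 3},
    "fasi": ["f1", "f2", "f3"],
    "gasi": ["g1", "g2", "g3", "g4"],
}

# Record reported in the issue for the failing case (/folder/fileNameErr).
written_line = (
    '{"asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],'
    '"dasi":{"d1":1,"d2":2},"fasi":{"e1":1,"e2":2,"e3":3},'
    '"gasi":["f1","f2","f3"],"easi":["g1","g2","g3","g4"]}'
)


def mismatched_keys(expected_record, actual_record):
    """Hypothetical check: return the sorted keys whose values differ."""
    keys = expected_record.keys() | actual_record.keys()
    return sorted(k for k in keys if expected_record.get(k) != actual_record.get(k))


actual = json.loads(written_line)
bad = mismatched_keys(expected, actual)
print(bad)  # ['easi', 'fasi', 'gasi']
```

Three of the seven keys carry another key's value, and easi changes from a map to a list, so a downstream reader would see both wrong data and a wrong type.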

> Creating a json file using a dataframe Generates an issue
> ---------------------------------------------------------
>
>                 Key: SPARK-23495
>                 URL: https://issues.apache.org/jira/browse/SPARK-23495
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>            Reporter: AIT OUFKIR
>            Priority: Major
>             Fix For: 2.1.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h






[jira] [Updated] (SPARK-23495) Creating a json file using a dataframe Generates an issue

2018-02-23 Thread AIT OUFKIR (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AIT OUFKIR updated SPARK-23495:
---
Description: 
The issue occurs when trying to create a JSON file from a DataFrame (see the code below).

from pyspark.sql import SQLContext
# sqlContext is assumed to be an existing SQLContext, as predefined in the pyspark shell
a = ["a1","a2"]
b = ["b1","b2","b3"]
c = ["c1","c2","c3","c4"]
d = {'d1':1, 'd2':2}
e = {'e1':1, 'e2':2, 'e3':3}
f = ['f1','f2','f3']
g = ['g1','g2','g3','g4']

metadata_dump = dict(asi=a, basi=b, casi=c, dasi=d, fasi=f, gasi=g, easi=e)
md = sqlContext.createDataFrame([metadata_dump]).collect()
metadata = sqlContext.createDataFrame(md, ['asi', 'basi', 'casi', 'dasi', 'fasi', 'gasi', 'easi'])

metadata_path = "/folder/fileNameErr"
metadata.write.mode('overwrite').json(metadata_path)

The file contains the following record, in which the values under fasi, gasi, and easi are misplaced:

{"asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":{"d1":1,"d2":2},"fasi":{"e1":1,"e2":2,"e3":3},"gasi":["f1","f2","f3"],"easi":["g1","g2","g3","g4"]}

When the dictionary e is moved so that easi follows dasi in both the dict() call and the column list:

metadata_dump = dict(asi=a, basi=b, casi=c, dasi=d, easi=e, fasi=f, gasi=g)
md = sqlContext.createDataFrame([metadata_dump]).collect()
metadata = sqlContext.createDataFrame(md, ['asi', 'basi', 'casi', 'dasi', 'easi', 'fasi', 'gasi'])
metadata_path = "/folder/fileNameCorr"
metadata.write.mode('overwrite').json(metadata_path)

the record is written correctly:

{"asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":{"d1":1,"d2":2},"easi":{"e1":1,"e2":2,"e3":3},"fasi":["f1","f2","f3"],"gasi":["g1","g2","g3","g4"]}

  was:
The issue occurs when trying to create a JSON file from a DataFrame (see the code below).

catis = ["CAT1","CAT2"]
constis = ["CONST1","CONST2","CONST3"]
datis = ["DAT1","DATE2","DATE3"]
dictis = {'A':1, 'B':2}
dummis = ['dum1','dumm2','dumm3']
fifis = {'fifi1':1, 'fifi2':2, 'fifi3':3}
khikhis = ['khikhi1','khikhi12','khikhi3','khikhi4']

metadata_dump = dict(cati=catis, consti=constis, dati=datis, dicti=dictis, khikhi=khikhis, dummi=dummis, fifi=fifis)
md = sqlContext.createDataFrame([metadata_dump]).collect()
metadata = sqlContext.createDataFrame(md, ['cati', 'consti', 'dati', 'dicti', 'khikhi', 'dummi', 'fifi'])

metadata_path = "/mypath"
metadata.write.mode('overwrite').json(metadata_path)

This gives the following result, which is wrong (the values under khikhi, dummi, and fifi are misplaced):

{"cati":["CAT1","CAT2"],
"consti":["CONST1","CONST2","CONST3"],
"dati":["DAT1","DATE2","DATE3"],
"dicti":{"A":1,"B":2},
"khikhi":["dum1","dumm2","dumm3"],
"dummi":{"fifi2":2,"fifi3":3,"fifi1":1},
"fifi":["khikhi1","khikhi12","khikhi3","khikhi4"]}

When I move the fifis entry so that it is no longer the last argument in the metadata_dump dict, I get the correct result:

{"cati":["CAT1","CAT2"],
"consti":["CONST1","CONST2","CONST3"],
"dati":["DAT1","DATE2","DATE3"],
"dicti":{"A":1,"B":2},
"dummi":["dum1","dumm2","dumm3"],
"fifi":{"fifi2":2,"fifi3":3,"fifi1":1},
"khikhi":["khikhi1","khikhi12","khikhi3","khikhi4"]}
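
One way to guard against this kind of mismatch, sketched under the unconfirmed assumption that PySpark 2.1 orders dict-derived fields by sorted key and applies a list of column names by position: check the name list against the sorted keys before passing it on. The helper check_positional_names is hypothetical and is not a PySpark function.

```python
catis = ["CAT1", "CAT2"]
constis = ["CONST1", "CONST2", "CONST3"]
datis = ["DAT1", "DATE2", "DATE3"]
dictis = {'A': 1, 'B': 2}
dummis = ['dum1', 'dumm2', 'dumm3']
fifis = {'fifi1': 1, 'fifi2': 2, 'fifi3': 3}
khikhis = ['khikhi1', 'khikhi12', 'khikhi3', 'khikhi4']

metadata_dump = dict(cati=catis, consti=constis, dati=datis, dicti=dictis,
                     khikhi=khikhis, dummi=dummis, fifi=fifis)


def check_positional_names(record, names):
    """Hypothetical guard: raise ValueError unless names are in sorted-key order."""
    expected = sorted(record)
    if list(names) != expected:
        raise ValueError(
            "column names %r do not match sorted keys %r" % (list(names), expected))
    return list(names)


# Column list from the failing case in the description.
reported_names = ['cati', 'consti', 'dati', 'dicti', 'khikhi', 'dummi', 'fifi']

try:
    check_positional_names(metadata_dump, reported_names)
    rejected = False
except ValueError:
    rejected = True

# Deriving the names from the dict itself passes the check.
safe_names = check_positional_names(metadata_dump, sorted(metadata_dump))
print(rejected, safe_names)
```

Under the same assumption, omitting the column list entirely would also avoid the renaming step, since the inferred names would already match the values.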

 


> Creating a json file using a dataframe Generates an issue
> ---------------------------------------------------------
>
>                 Key: SPARK-23495
>                 URL: https://issues.apache.org/jira/browse/SPARK-23495
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>            Reporter: AIT OUFKIR
>            Priority: Major
>             Fix For: 2.1.0

[jira] [Updated] (SPARK-23495) Creating a json file using a dataframe Generates an issue

2018-02-23 Thread AIT OUFKIR (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AIT OUFKIR updated SPARK-23495:
---
Summary: Creating a json file using a dataframe Generates an issue  (was: Creating a json file using a dataframe creates an issue)

> Creating a json file using a dataframe Generates an issue
> ---------------------------------------------------------
>
>                 Key: SPARK-23495
>                 URL: https://issues.apache.org/jira/browse/SPARK-23495
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>            Reporter: AIT OUFKIR
>            Priority: Major
>             Fix For: 2.1.0


