[jira] [Updated] (SPARK-23495) Creating a json file using a dataframe Generates an issue
[ https://issues.apache.org/jira/browse/SPARK-23495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-23495:
------------------------------
    Target Version/s:   (was: 2.1.0)
       Fix Version/s:   (was: 2.1.0)

> Creating a json file using a dataframe Generates an issue
> ---------------------------------------------------------
>
>                 Key: SPARK-23495
>                 URL: https://issues.apache.org/jira/browse/SPARK-23495
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>            Reporter: AIT OUFKIR
>            Priority: Major
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Issue happen when trying to create json file using a dataframe (see code
> below)
> from pyspark.sql import SQLContext
> a = ["a1","a2"]
> b = ["b1","b2","b3"]
> c = ["c1","c2","c3", "c4"]
> d = \{'d1':1, 'd2':2}
> e = \{'e1':1, 'e2':2, 'e3':3}
> f = ['f1','f2','f3']
> g = ['g1','g2','g3','g4']
> metadata_dump = dict(asi=a, basi=b, casi = c, dasi=d, fasi=f,
> gasi=g{color:#ff}, easi=e{color})
> md = sqlContext.createDataFrame([metadata_dump]).collect()
> metadata = sqlContext.createDataFrame(md,['asi', 'basi',
> 'casi','dasi','fasi', 'gasi', 'easi'])
> metadata_path = "/folder/fileNameErr"
> metadata.write.mode('overwrite').json(metadata_path)
> {"{color:#14892c}asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":\{"d1":1,"d2":2{color}},"fasi":\{"e1":1,"e2":2,"e3":3},"gasi":["f1","f2","f3"],"easi":["g1","g2","g3","g4{color}"]}
>
> when switching the dictionary e
>
> metadata_dump = dict(asi=a, basi=b, casi = c, dasi=d{color:#ff}*,
> easi=e*{color}, fasi=f, gasi=g)
> md = sqlContext.createDataFrame([metadata_dump]).collect()
> metadata = sqlContext.createDataFrame(md,['asi', 'basi', 'casi','dasi',
> {color:#ff}*'easi',*{color}'fasi', 'gasi'])
> metadata_path = "/folder/fileNameCorr"
> metadata.write.mode('overwrite').json(metadata_path)
> {color:#14892c}{"asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":\\{"d1":1,"d2":2},"easi":\{"e1":1,"e2":2,"e3":3},"fasi":["f1","f2","f3"],"gasi":["g1","g2","g3","g4"]}{color}
>
>
>
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
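The mislabeled output in the report is consistent with column names being applied by position to a row whose fields are no longer in the order the dict was written. The sketch below reproduces that pattern in plain Python, without Spark. It rests on one assumption that the thread itself does not confirm: that fields inferred from a Python dict come out ordered alphabetically by key. The names inferred_names, row_values, requested_names and relabeled are illustrative only.

```python
# Plain-Python sketch of positional column renaming; no Spark required.
a = ["a1", "a2"]
b = ["b1", "b2", "b3"]
c = ["c1", "c2", "c3", "c4"]
d = {"d1": 1, "d2": 2}
e = {"e1": 1, "e2": 2, "e3": 3}
f = ["f1", "f2", "f3"]
g = ["g1", "g2", "g3", "g4"]

# Same dict as in the report, with 'easi' written last.
metadata_dump = dict(asi=a, basi=b, casi=c, dasi=d, fasi=f, gasi=g, easi=e)

# ASSUMPTION: fields inferred from a dict are ordered by sorted key name.
inferred_names = sorted(metadata_dump)
row_values = [metadata_dump[name] for name in inferred_names]

# The second createDataFrame call in the report supplies names in this order.
# Pairing them with the row values by position mimics a positional rename.
requested_names = ["asi", "basi", "casi", "dasi", "fasi", "gasi", "easi"]
relabeled = dict(zip(requested_names, row_values))

for name in requested_names:
    print(name, relabeled[name])
```

Under that assumption, 'fasi' ends up holding the dict e, 'gasi' the list f, and 'easi' the list g, which is the same shift shown in the reported JSON. When the requested names are already in alphabetical order, as in the second snippet of the report, the positional pairing happens to line up.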
[jira] [Updated] (SPARK-23495) Creating a json file using a dataframe Generates an issue
[ https://issues.apache.org/jira/browse/SPARK-23495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

AIT OUFKIR updated SPARK-23495:
-------------------------------
    Description:
Issue happen when trying to create json file using a dataframe (see code below)

from pyspark.sql import SQLContext
a = ["a1","a2"]
b = ["b1","b2","b3"]
c = ["c1","c2","c3", "c4"]
d = \{'d1':1, 'd2':2}
e = \{'e1':1, 'e2':2, 'e3':3}
f = ['f1','f2','f3']
g = ['g1','g2','g3','g4']
metadata_dump = dict(asi=a, basi=b, casi = c, dasi=d, fasi=f, gasi=g{color:#ff}, easi=e{color})
md = sqlContext.createDataFrame([metadata_dump]).collect()
metadata = sqlContext.createDataFrame(md,['asi', 'basi', 'casi','dasi','fasi', 'gasi', 'easi'])
metadata_path = "/folder/fileNameErr"
metadata.write.mode('overwrite').json(metadata_path)
{"{color:#14892c}asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":\{"d1":1,"d2":2{color}},"fasi":\{"e1":1,"e2":2,"e3":3},"gasi":["f1","f2","f3"],"easi":["g1","g2","g3","g4{color}"]}

when switching the dictionary e

metadata_dump = dict(asi=a, basi=b, casi = c, dasi=d{color:#ff}*, easi=e*{color}, fasi=f, gasi=g)
md = sqlContext.createDataFrame([metadata_dump]).collect()
metadata = sqlContext.createDataFrame(md,['asi', 'basi', 'casi','dasi', {color:#ff}*'easi',*{color}'fasi', 'gasi'])
metadata_path = "/folder/fileNameCorr"
metadata.write.mode('overwrite').json(metadata_path)
{color:#14892c}{"asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":\\{"d1":1,"d2":2},"easi":\{"e1":1,"e2":2,"e3":3},"fasi":["f1","f2","f3"],"gasi":["g1","g2","g3","g4"]}{color}

  was:
Issue happen when trying to create json file using a dataframe (see code below)

from pyspark.sql import SQLContext
a = ["a1","a2"]
b = ["b1","b2","b3"]
c = ["c1","c2","c3", "c4"]
d = \{'d1':1, 'd2':2}
e = \{'e1':1, 'e2':2, 'e3':3}
f = ['f1','f2','f3']
g = ['g1','g2','g3','g4']
metadata_dump = dict(asi=a, basi=b, casi = c, dasi=d, fasi=f, gasi=g*{color:#FF}, easi=e{color}*)
md = sqlContext.createDataFrame([metadata_dump]).collect()
metadata = sqlContext.createDataFrame(md,['asi', 'basi', 'casi','dasi','fasi', 'gasi', 'easi'])
metadata_path = "/folder/fileNameErr"
metadata.write.mode('overwrite').json(metadata_path)
{"{color:#14892c}asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":{"d1":1,"d2":2{color}},{color:#FF}"fasi":\{"e1":1,"e2":2,"e3":3},"gasi":["f1","f2","f3"],"easi":["g1","g2","g3","g4{color}"]}

when switching the dictionary e

metadata_dump = dict(asi=a, basi=b, casi = c, dasi=d{color:#FF}*, easi=e*{color}, fasi=f, gasi=g)
md = sqlContext.createDataFrame([metadata_dump]).collect()
metadata = sqlContext.createDataFrame(md,['asi', 'basi', 'casi','dasi', {color:#FF}*'easi',*{color}'fasi', 'gasi'])
metadata_path = "/folder/fileNameCorr"
metadata.write.mode('overwrite').json(metadata_path)
{color:#14892c}{"asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":\{"d1":1,"d2":2},"easi":\{"e1":1,"e2":2,"e3":3},"fasi":["f1","f2","f3"],"gasi":["g1","g2","g3","g4"]}{color}


> Creating a json file using a dataframe Generates an issue
> ---------------------------------------------------------
>
>                 Key: SPARK-23495
>                 URL: https://issues.apache.org/jira/browse/SPARK-23495
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>            Reporter: AIT OUFKIR
>            Priority: Major
>             Fix For: 2.1.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Issue happen when trying to create json file using a dataframe (see code
> below)
> from pyspark.sql import SQLContext
> a = ["a1","a2"]
> b = ["b1","b2","b3"]
> c = ["c1","c2","c3", "c4"]
> d = \{'d1':1, 'd2':2}
> e = \{'e1':1, 'e2':2, 'e3':3}
> f = ['f1','f2','f3']
> g = ['g1','g2','g3','g4']
> metadata_dump = dict(asi=a, basi=b, casi = c, dasi=d, fasi=f,
> gasi=g{color:#ff}, easi=e{color})
> md = sqlContext.createDataFrame([metadata_dump]).collect()
> metadata = sqlContext.createDataFrame(md,['asi', 'basi',
> 'casi','dasi','fasi', 'gasi', 'easi'])
> metadata_path = "/folder/fileNameErr"
> metadata.write.mode('overwrite').json(metadata_path)
> {"{color:#14892c}asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":\{"d1":1,"d2":2{color}},"fasi":\{"e1":1,"e2":2,"e3":3},"gasi":["f1","f2","f3"],"easi":["g1","g2","g3","g4{color}"]}
>
> when switching the dictionary e
>
> metadata_dump = dict(asi=a, basi=b, casi = c, dasi=d{color:#ff}*,
> easi=e*{color}, fasi=f, gasi=g)
> md = sqlContext.createDataFrame([metadata_dump]).collect()
> metadata = sqlContext.createDataFrame(md,['asi', 'basi', 'casi','dasi',
> {color:#ff}*'easi',*{color}'fasi', 'gasi'])
> metadata_path = "/folder/fileNameCorr"
>
[jira] [Updated] (SPARK-23495) Creating a json file using a dataframe Generates an issue
[ https://issues.apache.org/jira/browse/SPARK-23495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

AIT OUFKIR updated SPARK-23495:
-------------------------------
                 Flags: Important
    Remaining Estimate: 4h
     Original Estimate: 4h

This issue can create Major inconsistencies in data

> Creating a json file using a dataframe Generates an issue
> ---------------------------------------------------------
>
>                 Key: SPARK-23495
>                 URL: https://issues.apache.org/jira/browse/SPARK-23495
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>            Reporter: AIT OUFKIR
>            Priority: Major
>             Fix For: 2.1.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Issue happen when trying to create json file using a dataframe (see code
> below)
> from pyspark.sql import SQLContext
> a = ["a1","a2"]
> b = ["b1","b2","b3"]
> c = ["c1","c2","c3", "c4"]
> d = \{'d1':1, 'd2':2}
> e = \{'e1':1, 'e2':2, 'e3':3}
> f = ['f1','f2','f3']
> g = ['g1','g2','g3','g4']
> metadata_dump = dict(asi=a, basi=b, casi = c, dasi=d, fasi=f,
> gasi=g*{color:#FF}, easi=e{color}*)
> md = sqlContext.createDataFrame([metadata_dump]).collect()
> metadata = sqlContext.createDataFrame(md,['asi', 'basi',
> 'casi','dasi','fasi', 'gasi', 'easi'])
> metadata_path = "/folder/fileNameErr"
> metadata.write.mode('overwrite').json(metadata_path)
> {"{color:#14892c}asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":{"d1":1,"d2":2{color}},{color:#FF}"fasi":\{"e1":1,"e2":2,"e3":3},"gasi":["f1","f2","f3"],"easi":["g1","g2","g3","g4{color}"]}
>
> when switching the dictionary e
>
> metadata_dump = dict(asi=a, basi=b, casi = c, dasi=d{color:#FF}*,
> easi=e*{color}, fasi=f, gasi=g)
> md = sqlContext.createDataFrame([metadata_dump]).collect()
> metadata = sqlContext.createDataFrame(md,['asi', 'basi', 'casi','dasi',
> {color:#FF}*'easi',*{color}'fasi', 'gasi'])
> metadata_path = "/folder/fileNameCorr"
> metadata.write.mode('overwrite').json(metadata_path)
> {color:#14892c}{"asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":\{"d1":1,"d2":2},"easi":\{"e1":1,"e2":2,"e3":3},"fasi":["f1","f2","f3"],"gasi":["g1","g2","g3","g4"]}{color}
[jira] [Updated] (SPARK-23495) Creating a json file using a dataframe Generates an issue
[ https://issues.apache.org/jira/browse/SPARK-23495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

AIT OUFKIR updated SPARK-23495:
-------------------------------
    Description:
Issue happen when trying to create json file using a dataframe (see code below)

from pyspark.sql import SQLContext
a = ["a1","a2"]
b = ["b1","b2","b3"]
c = ["c1","c2","c3", "c4"]
d = \{'d1':1, 'd2':2}
e = \{'e1':1, 'e2':2, 'e3':3}
f = ['f1','f2','f3']
g = ['g1','g2','g3','g4']
metadata_dump = dict(asi=a, basi=b, casi = c, dasi=d, fasi=f, gasi=g*{color:#FF}, easi=e{color}*)
md = sqlContext.createDataFrame([metadata_dump]).collect()
metadata = sqlContext.createDataFrame(md,['asi', 'basi', 'casi','dasi','fasi', 'gasi', 'easi'])
metadata_path = "/folder/fileNameErr"
metadata.write.mode('overwrite').json(metadata_path)
{"{color:#14892c}asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":{"d1":1,"d2":2{color}},{color:#FF}"fasi":\{"e1":1,"e2":2,"e3":3},"gasi":["f1","f2","f3"],"easi":["g1","g2","g3","g4{color}"]}

when switching the dictionary e

metadata_dump = dict(asi=a, basi=b, casi = c, dasi=d{color:#FF}*, easi=e*{color}, fasi=f, gasi=g)
md = sqlContext.createDataFrame([metadata_dump]).collect()
metadata = sqlContext.createDataFrame(md,['asi', 'basi', 'casi','dasi', {color:#FF}*'easi',*{color}'fasi', 'gasi'])
metadata_path = "/folder/fileNameCorr"
metadata.write.mode('overwrite').json(metadata_path)
{color:#14892c}{"asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":\{"d1":1,"d2":2},"easi":\{"e1":1,"e2":2,"e3":3},"fasi":["f1","f2","f3"],"gasi":["g1","g2","g3","g4"]}{color}

  was:
Issue happen when trying to create json file using a dataframe (see code below)

catis = ["CAT1","CAT2"]
constis = ["CONST1","CONST2","CONST3"]
datis = ["DAT1","DATE2","DATE3"]
dictis = \{'A':1, 'B':2}
dummis = ['dum1','dumm2','dumm3']
fifis = \{'fifi1':1, 'fifi2':2, 'fifi3':3}
khikhis = ['khikhi1','khikhi12','khikhi3','khikhi4']
metadata_dump = dict(cati=catis, consti=constis, dati=datis, dicti=dictis, khikhi=khikhis, dummi=dummis, fifi=fifis)
md = sqlContext.createDataFrame([metadata_dump]).collect()
metadata = sqlContext.createDataFrame(md,['cati', 'consti', 'dati', 'dicti','khikhi', 'dummi', 'fifi'])
metadata_path = "/mypath"
metadata.write.mode('overwrite').json(metadata_path)

This gives the following Results :
{"cati":["CAT1","CAT2"]
,"consti":["CONST1","CONST2","CONST3"]
,"dati":["DAT1","DATE2","DATE3"]
,"dicti":\{"A":1,"B":2}
,"khikhi":["dum1","dumm2","dumm3"]
,"dummi":\{"fifi2":2,"fifi3":3,"fifi1":1}
,"fifi":["khikhi1","khikhi12","khikhi3","khikhi4"]}
Which is wrong

When I try switching the fifis dict and not putting it at the end of the dict metadata_dump then I get the correct results :
{
"cati":["CAT1","CAT2"]
,"consti":["CONST1","CONST2","CONST3"]
,"dati":["DAT1","DATE2","DATE3"]
,"dicti":\{"A":1,"B":2}
,"dummi":["dum1","dumm2","dumm3"]
,"fifi":\{"fifi2":2,"fifi3":3,"fifi1":1}
,"khikhi":["khikhi1","khikhi12","khikhi3","khikhi4"]
}


> Creating a json file using a dataframe Generates an issue
> ---------------------------------------------------------
>
>                 Key: SPARK-23495
>                 URL: https://issues.apache.org/jira/browse/SPARK-23495
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>            Reporter: AIT OUFKIR
>            Priority: Major
>             Fix For: 2.1.0
>
>
> Issue happen when trying to create json file using a dataframe (see code
> below)
> from pyspark.sql import SQLContext
> a = ["a1","a2"]
> b = ["b1","b2","b3"]
> c = ["c1","c2","c3", "c4"]
> d = \{'d1':1, 'd2':2}
> e = \{'e1':1, 'e2':2, 'e3':3}
> f = ['f1','f2','f3']
> g = ['g1','g2','g3','g4']
> metadata_dump = dict(asi=a, basi=b, casi = c, dasi=d, fasi=f,
> gasi=g*{color:#FF}, easi=e{color}*)
> md = sqlContext.createDataFrame([metadata_dump]).collect()
> metadata = sqlContext.createDataFrame(md,['asi', 'basi',
> 'casi','dasi','fasi', 'gasi', 'easi'])
> metadata_path = "/folder/fileNameErr"
> metadata.write.mode('overwrite').json(metadata_path)
> {"{color:#14892c}asi":["a1","a2"],"basi":["b1","b2","b3"],"casi":["c1","c2","c3","c4"],"dasi":{"d1":1,"d2":2{color}},{color:#FF}"fasi":\{"e1":1,"e2":2,"e3":3},"gasi":["f1","f2","f3"],"easi":["g1","g2","g3","g4{color}"]}
>
> when switching the dictionary e
>
> metadata_dump = dict(asi=a, basi=b, casi = c, dasi=d{color:#FF}*,
> easi=e*{color}, fasi=f, gasi=g)
> md = sqlContext.createDataFrame([metadata_dump]).collect()
> metadata = sqlContext.createDataFrame(md,['asi', 'basi', 'casi','dasi',
> {color:#FF}*'easi',*{color}'fasi', 'gasi'])
> metadata_path = "/folder/fileNameCorr"
> metadata.write.mode('overwrite').json(metadata_path)
>
[jira] [Updated] (SPARK-23495) Creating a json file using a dataframe Generates an issue
[ https://issues.apache.org/jira/browse/SPARK-23495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

AIT OUFKIR updated SPARK-23495:
-------------------------------
    Summary: Creating a json file using a dataframe Generates an issue  (was: Creating a json file using a dataframe creates an issue)

> Creating a json file using a dataframe Generates an issue
> ---------------------------------------------------------
>
>                 Key: SPARK-23495
>                 URL: https://issues.apache.org/jira/browse/SPARK-23495
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>            Reporter: AIT OUFKIR
>            Priority: Major
>             Fix For: 2.1.0
>
>
> Issue happen when trying to create json file using a dataframe (see code
> below)
> catis = ["CAT1","CAT2"]
> constis = ["CONST1","CONST2","CONST3"]
> datis = ["DAT1","DATE2","DATE3"]
> dictis = \{'A':1, 'B':2}
> dummis = ['dum1','dumm2','dumm3']
> fifis = \{'fifi1':1, 'fifi2':2, 'fifi3':3}
> khikhis = ['khikhi1','khikhi12','khikhi3','khikhi4']
> metadata_dump = dict(cati=catis, consti=constis, dati=datis, dicti=dictis,
> khikhi=khikhis, dummi=dummis, fifi=fifis)
> md = sqlContext.createDataFrame([metadata_dump]).collect()
> metadata = sqlContext.createDataFrame(md,['cati', 'consti', 'dati',
> 'dicti','khikhi', 'dummi', 'fifi'])
> metadata_path = "/mypath"
> metadata.write.mode('overwrite').json(metadata_path)
> This gives the following Results :
> {"cati":["CAT1","CAT2"]
> ,"consti":["CONST1","CONST2","CONST3"]
> ,"dati":["DAT1","DATE2","DATE3"]
> ,"dicti":\{"A":1,"B":2}
> ,"khikhi":["dum1","dumm2","dumm3"]
> ,"dummi":\{"fifi2":2,"fifi3":3,"fifi1":1}
> ,"fifi":["khikhi1","khikhi12","khikhi3","khikhi4"]}
> Which is wrong
>
> When I try switching the fifis dict and not putting it at the end of the dict
> metadata_dump then I get the correct results :
> {
> "cati":["CAT1","CAT2"]
> ,"consti":["CONST1","CONST2","CONST3"]
> ,"dati":["DAT1","DATE2","DATE3"]
> ,"dicti":\{"A":1,"B":2}
> ,"dummi":["dum1","dumm2","dumm3"]
> ,"fifi":\{"fifi2":2,"fifi3":3,"fifi1":1}
> ,"khikhi":["khikhi1","khikhi12","khikhi3","khikhi4"]
> }
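For the original variable names in the report, a small check can show which requested column names would land on a different field if names are applied by position to alphabetically ordered fields. This is only a sketch in plain Python: the alphabetical ordering is an assumption that the thread does not confirm, and positional_mismatches is a hypothetical helper, not part of PySpark.

```python
def positional_mismatches(requested_names):
    """Return (position, requested, sorted) triples where the two orders differ.

    ASSUMPTION: the underlying fields are ordered by sorted key name, so a
    requested name at position i is attached to the i-th name in sorted order.
    """
    sorted_names = sorted(requested_names)
    return [
        (index, requested, actual)
        for index, (requested, actual) in enumerate(zip(requested_names, sorted_names))
        if requested != actual
    ]


# Column order passed to the second createDataFrame call in the report.
requested = ["cati", "consti", "dati", "dicti", "khikhi", "dummi", "fifi"]
mismatches = positional_mismatches(requested)

for index, requested_name, actual_name in mismatches:
    print(f"position {index}: label {requested_name!r} is attached to field {actual_name!r}")
```

With the order from the report, the check flags the last three positions: 'khikhi' is attached to the 'dummi' values, 'dummi' to 'fifi', and 'fifi' to 'khikhi', which matches the output the reporter calls wrong. Passing the names in sorted order yields no mismatches.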