RE: flattening record schemas
For completeness' sake, here is the JOLT magic needed to flatten the data the way I was aiming for:

[
  {
    // this operation just tags a name onto the record
    "operation": "shift",
    "spec": {
      "*": "record.&"
    }
  },
  {
    // this operation does the actual flattening (but needs the outer tag to anchor the work; hence the previous operation)
    "operation": "shift",
    "spec": {
      "record": {
        "*": {
          "$": "TValue[#2].Name",
          "@": "TValue[#2].Value"
        }
      }
    }
  },
  {
    // this operation just adds a pre-formatted key/value onto each item
    "operation": "default",
    "spec": {
      "TValue[]": {
        "*": {
          "class": "unclass"
        }
      }
    }
  }
]

CSV data:

Store,Date,Weekly_Sales,Holiday_Flag,Temperature,Fuel_Price,CPI,Unemployment
1,05-02-2010,1643690.9,0,42.31,2.572,211.0963582,8.106
1,12-02-2010,1641957.44,1,38.51,2.548,211.2421698,8.106
1,19-02-2010,1611968.17,0,39.93,2.514,211.2891429,8.106
1,26-02-2010,1409727.59,0,46.63,2.561,211.3196429,8.106

Converted into bare JSON:

{"Store":"1","Date":"05-02-2010","Weekly_Sales":"1643690.9","Holiday_Flag":"0","Temperature":"42.31","Fuel_Price":"2.572","CPI":"211.0963582","Unemployment":"8.106"}
{"Store":"1","Date":"12-02-2010","Weekly_Sales":"1641957.44","Holiday_Flag":"1","Temperature":"38.51","Fuel_Price":"2.548","CPI":"211.2421698","Unemployment":"8.106"}
{"Store":"1","Date":"19-02-2010","Weekly_Sales":"1611968.17","Holiday_Flag":"0","Temperature":"39.93","Fuel_Price":"2.514","CPI":"211.2891429","Unemployment":"8.106"}
{"Store":"1","Date":"26-02-2010","Weekly_Sales":"1409727.59","Holiday_Flag":"0","Temperature":"46.63","Fuel_Price":"2.561","CPI":"211.3196429","Unemployment":"8.106"}

And the result of the JOLT processor with the above operations applied:
{"TValue":[{"name":"Store","value":"1","class":"unclass"},{"name":"Date","value":"05-02-2010","class":"unclass"},{"name":"Weekly_Sales","value":"1643690.9","class":"unclass"},{"name":"Holiday_Flag","value":"0","class":"unclass"},{"name":"Temperature","value":"42.31","class":"unclass"},{"name":"Fuel_Price","value":"2.572","class":"unclass"},{"name":"CPI","value":"211.0963582","class":"unclass"},{"name":"Unemployment","value":"8.106","class":"unclass"}]}
{"TValue":[{"name":"Store","value":"1","class":"unclass"},{"name":"Date","value":"12-02-2010","class":"unclass"},{"name":"Weekly_Sales","value":"1641957.44","class":"unclass"},{"name":"Holiday_Flag","value":"1","class":"unclass"},{"name":"Temperature","value":"38.51","class":"unclass"},{"name":"Fuel_Price","value":"2.548","class":"unclass"},{"name":"CPI","value":"211.2421698","class":"unclass"},{"name":"Unemployment","value":"8.106","class":"unclass"}]}
{"TValue":[{"name":"Store","value":"1","class":"unclass"},{"name":"Date","value":"19-02-2010","class":"unclass"},{"name":"Weekly_Sales","value":"1611968.17","class":"unclass"},{"name":"Holiday_Flag","value":"0","class":"unclass"},{"name":"Temperature","value":"39.93","class":"unclass"},{"name":"Fuel_Price","value":"2.514","class":"unclass"},{"name":"CPI","value":"211.2891429","class":"unclass"},{"name":"Unemployment","value":"8.106","class":"unclass"}]}
{"TValue":[{"name":"Store","value":"1","class":"unclass"},{"name":"Date","value":"26-02-2010","class":"unclass"},{"name":"Weekly_Sales","value":"1409727.59","class":"unclass"},{"name":"Holiday_Flag","value":"0","class":"unclass"},{"name":"Temperature","value":"46.63","class":"unclass"},{"name":"Fuel_Price","value":"2.561","class":"unclass"},{"name":"CPI","value":"211.3196429","class":"unclass"},{"name":"Unemployment","value":"8.106","class":"unclass"}]}

To be fair, I'm really going to dump the data out as Avro... but (a) I didn't think there was much of a question about how to do that (just configure the JOLT processor accordingly) and (b) that's not nearly as readable.

hth,
mew
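To make the three operations easier to follow without a JOLT playground, here is a rough Python sketch that performs the same transformation on plain dicts. It is an emulation under stated assumptions, not how JOLT or NiFi actually evaluates the spec: each CSV row is assumed to have already become a flat object with string values (as in the bare JSON above), and the entries use the lowercase name/value keys that appear in the result shown. All function names are hypothetical.

```python
import csv
import io
import json

CSV_TEXT = """Store,Date,Weekly_Sales,Holiday_Flag,Temperature,Fuel_Price,CPI,Unemployment
1,05-02-2010,1643690.9,0,42.31,2.572,211.0963582,8.106
1,12-02-2010,1641957.44,1,38.51,2.548,211.2421698,8.106
1,19-02-2010,1611968.17,0,39.93,2.514,211.2891429,8.106
1,26-02-2010,1409727.59,0,46.63,2.561,211.3196429,8.106
"""


def csv_to_bare_json(text):
    """Each CSV row becomes a flat dict of column name -> string value."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def tag_record(rec):
    # Mirrors step 1 (shift "*": "record.&"): nest every field under "record".
    return {"record": dict(rec)}


def map_to_list(tagged):
    # Mirrors step 2 (shift "$" -> name, "@" -> value into TValue[#2]):
    # one {name, value} entry per field, in the original field order.
    return {"TValue": [{"name": k, "value": v} for k, v in tagged["record"].items()]}


def add_default_class(flat):
    # Mirrors step 3 (default): add "class": "unclass" only where no class exists.
    for item in flat["TValue"]:
        item.setdefault("class", "unclass")
    return flat


def flatten(rec):
    return add_default_class(map_to_list(tag_record(rec)))


if __name__ == "__main__":
    for rec in csv_to_bare_json(CSV_TEXT):
        # Compact separators so each line looks like the processor output above.
        print(json.dumps(flatten(rec), separators=(",", ":")))
```

Keeping each JOLT operation as its own small function makes it easy to see which step is responsible for which part of the output shape.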
RE: flattening record schemas
A,

If this were the input data:

Store,Date,Weekly_Sales,Holiday_Flag,Temperature,Fuel_Price,CPI,Unemployment
1,05-02-2010,1643690.9,0,42.31,2.572,211.0963582,8.106
1,12-02-2010,1641957.44,1,38.51,2.548,211.2421698,8.106
1,19-02-2010,1611968.17,0,39.93,2.514,211.2891429,8.106
1,26-02-2010,1409727.59,0,46.63,2.561,211.3196429,8.106

I would want the first record (CSV line) to be converted to something that looked like:

{
  "entity" : [
    { "Name" : "store", "Value" : 1 },
    { "Name" : "date", "Value" : "05-02-2010" },
    { "Name" : "weekly_sales", "Value" : 1643690.9 },
    ...
  ]
}

and each subsequent record to be handled in a similar way. Looking at the jolt-demo (mentioned above), this seems similar to the map-to-list behavior.

mew

On 2023/09/18 20:19:40 Pierre Villard wrote:
> Hi,
>
> Do you have an example of input and the output you'd expect? There are a
> few options in terms of processors you could use and you may be able to get
> things working the way you want without using JOLT.
>
> Pierre
>
> On Mon, 18 Sep 2023 at 22:05, Mark Woodcock wrote:
>
> > Howdy,
> >
> > What I'm aiming for:
> > Something that takes a fairly ordinary record (think a CSV file--so, the
> > name of each column is a distinct entry in the schema), which outputs the
> > same data, but where the record is now an array of similarly structured
> > items (I presume records, where the fields would be stuff like a columnName
> > [where the value is CSV column name], value [where the value is the value],
> > and perhaps other fields).
> >
> > When I asked around, the suggestion was that my need was to flatten the
> > schema and that the JOLT processor might be the way to do that...but my
> > contact had never used JOLT. The follow-on suggestion was to ask here
> > (well, on "users" but that rejected me, since this address worked before,
> > I'm trying it) for advice on how one might go about that.
> >
> > thx,
> >
> > mew
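For comparison, the target shape described above can also be produced without JOLT; here is a minimal Python sketch. The lowercased names and the numeric Values are inferred from the example, and the typing rule (try int, then float, otherwise keep the string) is an assumption; in a real flow a schema would normally decide the types. The names SAMPLE, coerce, row_to_entity and convert are hypothetical.

```python
import csv
import io
import json

SAMPLE = """Store,Date,Weekly_Sales,Holiday_Flag,Temperature,Fuel_Price,CPI,Unemployment
1,05-02-2010,1643690.9,0,42.31,2.572,211.0963582,8.106
"""


def coerce(value):
    """Best-effort typing (an assumption): int, then float, else the value unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except (TypeError, ValueError):
            pass
    return value


def row_to_entity(row):
    """One CSV row -> {"entity": [{"Name": <lowercased column>, "Value": <typed value>}, ...]}."""
    return {"entity": [{"Name": name.lower(), "Value": coerce(value)} for name, value in row.items()]}


def convert(text):
    """Turn every CSV row in text into its own entity document."""
    return [row_to_entity(row) for row in csv.DictReader(io.StringIO(text))]


if __name__ == "__main__":
    for doc in convert(SAMPLE):
        print(json.dumps(doc))
```

With this rule, Date stays a string because neither int() nor float() accepts "05-02-2010", while Holiday_Flag becomes the integer 0.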
Re: flattening record schemas
Hi,

Do you have an example of input and the output you'd expect? There are a few options in terms of processors you could use and you may be able to get things working the way you want without using JOLT.

Pierre

On Mon, 18 Sep 2023 at 22:05, Mark Woodcock wrote:

> Howdy,
>
> What I'm aiming for:
> Something that takes a fairly ordinary record (think a CSV file--so, the
> name of each column is a distinct entry in the schema), which outputs the
> same data, but where the record is now an array of similarly structured
> items (I presume records, where the fields would be stuff like a columnName
> [where the value is CSV column name], value [where the value is the value],
> and perhaps other fields).
>
> When I asked around, the suggestion was that my need was to flatten the
> schema and that the JOLT processor might be the way to do that...but my
> contact had never used JOLT. The follow-on suggestion was to ask here
> (well, on "users" but that rejected me, since this address worked before,
> I'm trying it) for advice on how one might go about that.
>
> thx,
>
> mew
Re: flattening record schemas
Hello Mew,

If you are asking what the quickest way to learn JOLT is, I would suggest trying this: https://jolt-demo.appspot.com/#inception

There are several examples and a playground.

Ciao,
Alessandro

On Mon, 18 Sep 2023 at 22:05, Mark Woodcock wrote:

> Howdy,
>
> What I'm aiming for:
> Something that takes a fairly ordinary record (think a CSV file--so, the
> name of each column is a distinct entry in the schema), which outputs the
> same data, but where the record is now an array of similarly structured
> items (I presume records, where the fields would be stuff like a columnName
> [where the value is CSV column name], value [where the value is the value],
> and perhaps other fields).
>
> When I asked around, the suggestion was that my need was to flatten the
> schema and that the JOLT processor might be the way to do that...but my
> contact had never used JOLT. The follow-on suggestion was to ask here
> (well, on "users" but that rejected me, since this address worked before,
> I'm trying it) for advice on how one might go about that.
>
> thx,
>
> mew