RE: flattening record schemas

2023-09-25 Thread Mark Woodcock
For completeness' sake, here is the JOLT magic needed to flatten the data in
the way I was aiming for:

[
  { // this operation just tags a name to the record
    "operation": "shift",
    "spec": {
      "*": "record.&"
    }
  },
  { // this operation does the actual flattening (but needs the outer tag
    // to anchor the work; hence the previous operation)
    "operation": "shift",
    "spec": {
      "record": {
        "*": {
          "$": "TValue[#2].Name",
          "@": "TValue[#2].Value"
        }
      }
    }
  },
  { // this operation just adds a pre-formatted key/value onto each item
    "operation": "default",
    "spec": {
      "TValue[]": {
        "*": {
          "class": "unclass"
        }
      }
    }
  }
]
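
For anyone who finds the spec hard to read, here is a rough sketch in plain
Python of what the three operations do to a single record. It is only an
emulation for illustration, not how JOLT works internally; the function names
(tag_record, map_to_list, add_default_class, flatten) are made up for this
example, and the sample row is a shortened version of the first CSV record.

```python
import json


def tag_record(row: dict) -> dict:
    # Operation 1 (shift "*" -> "record.&"): nest every field under "record".
    return {"record": dict(row)}


def map_to_list(tagged: dict) -> dict:
    # Operation 2 (shift): each key/value pair becomes one item in "TValue".
    # The item keys follow the spec above ("Name" and "Value").
    return {
        "TValue": [
            {"Name": key, "Value": value}
            for key, value in tagged["record"].items()
        ]
    }


def add_default_class(flat: dict) -> dict:
    # Operation 3 (default): add "class" to each item only where it is missing.
    for item in flat["TValue"]:
        item.setdefault("class", "unclass")
    return flat


def flatten(row: dict) -> dict:
    # Chain the three steps in the same order as the spec.
    return add_default_class(map_to_list(tag_record(row)))


# Hypothetical sample: first three columns of the first CSV record.
row = {"Store": "1", "Date": "05-02-2010", "Weekly_Sales": "1643690.9"}
print(json.dumps(flatten(row)))
```

Splitting the work into one function per operation mirrors the spec: the
first step only exists so the second has an outer key to anchor on.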

CSV Data:
Store,Date,Weekly_Sales,Holiday_Flag,Temperature,Fuel_Price,CPI,Unemployment
1,05-02-2010,1643690.9,0,42.31,2.572,211.0963582,8.106
1,12-02-2010,1641957.44,1,38.51,2.548,211.2421698,8.106
1,19-02-2010,1611968.17,0,39.93,2.514,211.2891429,8.106
1,26-02-2010,1409727.59,0,46.63,2.561,211.3196429,8.106

Converted into Bare JSON:
{"Store":"1","Date":"05-02-2010","Weekly_Sales":"1643690.9","Holiday_Flag":"0","Temperature":"42.31","Fuel_Price":"2.572","CPI":"211.0963582","Unemployment":"8.106"}
{"Store":"1","Date":"12-02-2010","Weekly_Sales":"1641957.44","Holiday_Flag":"1","Temperature":"38.51","Fuel_Price":"2.548","CPI":"211.2421698","Unemployment":"8.106"}
{"Store":"1","Date":"19-02-2010","Weekly_Sales":"1611968.17","Holiday_Flag":"0","Temperature":"39.93","Fuel_Price":"2.514","CPI":"211.2891429","Unemployment":"8.106"}
{"Store":"1","Date":"26-02-2010","Weekly_Sales":"1409727.59","Holiday_Flag":"0","Temperature":"46.63","Fuel_Price":"2.561","CPI":"211.3196429","Unemployment":"8.106"}

and the result of the JOLT processor with the above operations applied:
{"TValue":[{"name":"Store","value":"1","class":"unclass"},{"name":"Date","value":"05-02-2010","class":"unclass"},{"name":"Weekly_Sales","value":"1643690.9","class":"unclass"},{"name":"Holiday_Flag","value":"0","class":"unclass"},{"name":"Temperature","value":"42.31","class":"unclass"},{"name":"Fuel_Price","value":"2.572","class":"unclass"},{"name":"CPI","value":"211.0963582","class":"unclass"},{"name":"Unemployment","value":"8.106","class":"unclass"}]}
{"TValue":[{"name":"Store","value":"1","class":"unclass"},{"name":"Date","value":"12-02-2010","class":"unclass"},{"name":"Weekly_Sales","value":"1641957.44","class":"unclass"},{"name":"Holiday_Flag","value":"1","class":"unclass"},{"name":"Temperature","value":"38.51","class":"unclass"},{"name":"Fuel_Price","value":"2.548","class":"unclass"},{"name":"CPI","value":"211.2421698","class":"unclass"},{"name":"Unemployment","value":"8.106","class":"unclass"}]}
{"TValue":[{"name":"Store","value":"1","class":"unclass"},{"name":"Date","value":"19-02-2010","class":"unclass"},{"name":"Weekly_Sales","value":"1611968.17","class":"unclass"},{"name":"Holiday_Flag","value":"0","class":"unclass"},{"name":"Temperature","value":"39.93","class":"unclass"},{"name":"Fuel_Price","value":"2.514","class":"unclass"},{"name":"CPI","value":"211.2891429","class":"unclass"},{"name":"Unemployment","value":"8.106","class":"unclass"}]}
{"TValue":[{"name":"Store","value":"1","class":"unclass"},{"name":"Date","value":"26-02-2010","class":"unclass"},{"name":"Weekly_Sales","value":"1409727.59","class":"unclass"},{"name":"Holiday_Flag","value":"0","class":"unclass"},{"name":"Temperature","value":"46.63","class":"unclass"},{"name":"Fuel_Price","value":"2.561","class":"unclass"},{"name":"CPI","value":"211.3196429","class":"unclass"},{"name":"Unemployment","value":"8.106","class":"unclass"}]}

To be fair, I'm really going to write the data out as Avro...but a) I didn't
see much of a question about how to do that (just configure the JOLT
processor accordingly) and b) that's not nearly as readable.

hth,

mew


RE: flattening record schemas

2023-09-19 Thread Mark Woodcock
A,

If this were the input data:

Store,Date,Weekly_Sales,Holiday_Flag,Temperature,Fuel_Price,CPI,Unemployment
1,05-02-2010,1643690.9,0,42.31,2.572,211.0963582,8.106
1,12-02-2010,1641957.44,1,38.51,2.548,211.2421698,8.106
1,19-02-2010,1611968.17,0,39.93,2.514,211.2891429,8.106
1,26-02-2010,1409727.59,0,46.63,2.561,211.3196429,8.106


I would want the first record (csv line) to be converted to something that
looked like:

{
  "entity" : [ {
    "Name" : "store",
    "Value" : 1
  }, {
    "Name" : "date",
    "Value" : "05-02-2010"
  }, {
    "Name" : "weekly_sales",
    "Value" : 1643690.9
  }, ...
  ]
}

and each subsequent record to be handled in a similar way.

Looking at the jolt-demo (mentioned above), this seems to be similar to the
map-to-list behavior.
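
To make the target shape concrete, here is a minimal sketch in plain Python
(not JOLT, and not a NiFi processor) that turns each CSV line into the
"entity" list shown above. The helper names coerce and to_entity are made up
for this example, and two things are assumptions read off the sample output:
that column names should be lower-cased, and that numeric-looking values
should become numbers while everything else stays a string.

```python
import csv
import io


def coerce(text: str):
    # Assumption: try int, then float; anything else (e.g. the date
    # "05-02-2010") is kept as a string.
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def to_entity(row: dict) -> dict:
    # Map-to-list: one {"Name", "Value"} item per column of the record.
    return {
        "entity": [
            {"Name": name.lower(), "Value": coerce(value)}
            for name, value in row.items()
        ]
    }


# Shortened copy of the input data above (first three columns only).
CSV_TEXT = """Store,Date,Weekly_Sales
1,05-02-2010,1643690.9
1,12-02-2010,1641957.44
"""

records = [to_entity(row) for row in csv.DictReader(io.StringIO(CSV_TEXT))]
print(records[0])
```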

mew


On 2023/09/18 20:19:40 Pierre Villard wrote:
> Hi,
>
> Do you have an example of input and the output you'd expect? There are a
> few options in terms of processors you could use and you may be able to get
> things working the way you want without using JOLT.
>
> Pierre
>
> On Mon, 18 Sep 2023 at 22:05, Mark Woodcock wrote:
>
> > Howdy,
> >
> > What I'm aiming for:
> > Something that takes a fairly ordinary record (think a CSV file--so, the
> > name of each column is a distinct entry in the schema), which outputs the
> > same data, but where the record is now an array of similarly structured
> > items (I presume records, where the fields would be stuff like a columnName
> > [where the value is CSV column name], value [where the value is the value],
> > and perhaps other fields).
> >
> > When I asked around, the suggestion was that my need was to flatten the
> > schema and that the JOLT processor might be the way to do that...but my
> > contact had never used JOLT.  The follow-on suggestion was to ask here
> > (well, on "users" but that rejected me, since this address worked before,
> > I'm trying it) for advice on how one might go about that.
> >
> > thx,
> >
> > mew
> >
>


Re: flattening record schemas

2023-09-18 Thread Pierre Villard
Hi,

Do you have an example of input and the output you'd expect? There are a
few options in terms of processors you could use and you may be able to get
things working the way you want without using JOLT.

Pierre

On Mon, 18 Sep 2023 at 22:05, Mark Woodcock wrote:

> Howdy,
>
> What I'm aiming for:
> Something that takes a fairly ordinary record (think a CSV file--so, the
> name of each column is a distinct entry in the schema), which outputs the
> same data, but where the record is now an array of similarly structured
> items (I presume records, where the fields would be stuff like a columnName
> [where the value is CSV column name], value [where the value is the value],
> and perhaps other fields).
>
> When I asked around, the suggestion was that my need was to flatten the
> schema and that the JOLT processor might be the way to do that...but my
> contact had never used JOLT.  The follow-on suggestion was to ask here
> (well, on "users" but that rejected me, since this address worked before,
> I'm trying it) for advice on how one might go about that.
>
> thx,
>
> mew
>


Re: flattening record schemas

2023-09-18 Thread Alessandro D'Armiento
Hello Mew,
If you are asking for the quickest way to learn how to use JOLT, I
would suggest trying this: https://jolt-demo.appspot.com/#inception
It has several examples and a playground.

Ciao,
Alessandro

On Mon, 18 Sep 2023 at 22:05, Mark Woodcock wrote:

> Howdy,
>
> What I'm aiming for:
> Something that takes a fairly ordinary record (think a CSV file--so, the
> name of each column is a distinct entry in the schema), which outputs the
> same data, but where the record is now an array of similarly structured
> items (I presume records, where the fields would be stuff like a columnName
> [where the value is CSV column name], value [where the value is the value],
> and perhaps other fields).
>
> When I asked around, the suggestion was that my need was to flatten the
> schema and that the JOLT processor might be the way to do that...but my
> contact had never used JOLT.  The follow-on suggestion was to ask here
> (well, on "users" but that rejected me, since this address worked before,
> I'm trying it) for advice on how one might go about that.
>
> thx,
>
> mew
>