Re: [CSV][POLL] How to provide mutable records
On Mon, 12 Feb 2018 18:10:56 -0700, Gary Gregory wrote:
> On Fri, Feb 9, 2018 at 10:05 AM, Stian Soiland-Reyes wrote:
>
> I've not had time to review this yet but I hope to get to it sometime this
> week.

Thanks. I'll wait for that before prepping a 1.6 RC, so we get time to
decide whether this is in or not.

--
Stian Soiland-Reyes
The University of Manchester
http://www.esciencelab.org.uk/
http://orcid.org/-0001-9842-9718

To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org
Re: [CSV][POLL] How to provide mutable records
On Fri, Feb 9, 2018 at 10:05 AM, Stian Soiland-Reyes wrote:
> Picking up this thread to consider this for CSV 1.6.
>
> Not quite as elegant as above, but I made
> some mutator functions withValue() in
> https://github.com/apache/commons-csv/pull/25
>
> [...]
>
> Comments and changes on CSV-216 branch welcome.

Hi Stian,

I've not had time to review this yet but I hope to get to it sometime this
week.

Gary
Re: [CSV][POLL] How to provide mutable records
On Fri, 25 Aug 2017 19:19:58 +0100, Stian Soiland-Reyes wrote:
> This came up also for commons rdf where we also have everything immutable,
> which I think is a good principle to keep for modern Java 8 programming.
>
> So you need a mutator function like in (4) that either returns a new
> immutable (but changed) CSVRecord; or alternatively a different
> MutableCSVRecord that can then be built/frozen to a CSVRecord. (These can
> then share a common accessor interface for the passive functions)

Picking up this thread to consider this for CSV 1.6.

Not quite as elegant as above, but I made some mutator functions
withValue() in https://github.com/apache/commons-csv/pull/25

    List<CSVRecord> someList = new ArrayList<>();
    for (CSVRecord r : csvparser) {
        CSVRecord rSoup = r.withValue(4, "soup")
                           .withValue(5, "fish");
        // original r is untouched and can be used again
        CSVRecord rBeans = r.withValue(3, "beans");

        // Each now different
        someList.add(r);
        someList.add(rSoup);
        someList.add(rBeans);

        // worried someone might touch your beans?
        consumeCSVRecord(rBeans.immutable());
    }

It's not clever enough (yet!) to resize the underlying array if you try to
go outside the existing columns. The existing parser seems to detect the
column count (and hence the record array size) per line, so this might
behave oddly for CSV files with inconsistent column counts.

Comments and changes on CSV-216 branch welcome.

--
Stian Soiland-Reyes
The University of Manchester
http://www.esciencelab.org.uk/
http://orcid.org/-0001-9842-9718
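A minimal sketch of the copy-on-change idea behind withValue(), for readers who do not want to open the pull request. SketchRecord is a hypothetical stand-in written for this illustration; it is not the real CSVRecord and not the code from the pull request. It also assumes that going outside the existing columns is simply rejected, matching the "no resizing yet" limitation described above.

```java
/** Hypothetical stand-in for CSVRecord; not the commons-csv class. */
final class SketchRecord {
    private final String[] values;

    SketchRecord(String... values) {
        // Defensive copy so callers cannot change the record through their array.
        this.values = values.clone();
    }

    String get(int index) {
        return values[index];
    }

    int size() {
        return values.length;
    }

    /** Returns a new record with one value replaced; this record is unchanged. */
    SketchRecord withValue(int index, String value) {
        if (index < 0 || index >= values.length) {
            // Assumption: no resizing, so unknown columns are rejected.
            throw new IndexOutOfBoundsException("No column " + index);
        }
        String[] copy = values.clone();
        copy[index] = value;
        return new SketchRecord(copy);
    }
}
```

Each withValue() call copies the whole array, so a chain of two calls makes two copies. That is cheap for the one or two changes per record shown in the loop above, and less attractive if nearly every column is rewritten.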
Re: [CSV][POLL] How to provide mutable records
This came up also for commons rdf, where we also have everything immutable,
which I think is a good principle to keep for modern Java 8 programming.

So you need a mutator function like in (4) that either returns a new
immutable (but changed) CSVRecord; or alternatively a different
MutableCSVRecord that can then be built/frozen to a CSVRecord. (These can
then share a common accessor interface for the passive functions.)

Are there likely to be many changes to each CSVRecord, or just one on each?

On 25 Aug 2017 7:05 pm, "Gary Gregory" wrote:

On Mon, Aug 21, 2017 at 3:29 PM, sebb wrote:
> What about [4]?
>
> Would that be more complicated/cumbersome to use than [1]?
>
> Seems to me using a factory or builder to create an updated immutable
> copy is the way to go here.

You mean a "mutable" copy right? Because the records are currently
immutable.

Gary
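The second alternative mentioned above (a mutable record that is frozen back into an immutable one, with both sharing an accessor interface) could be sketched roughly as follows. All three type names (RecordView, FrozenRecord, MutableRecord) and the methods mutable(), put() and freeze() are assumptions made for this illustration; none of them exist in commons-csv.

```java
/** Read-only accessors shared by both record types (hypothetical name). */
interface RecordView {
    String get(int index);
    int size();
}

/** Immutable record; a stand-in for CSVRecord, not the real class. */
final class FrozenRecord implements RecordView {
    private final String[] values;

    FrozenRecord(String[] values) {
        this.values = values.clone();
    }

    @Override public String get(int index) { return values[index]; }
    @Override public int size() { return values.length; }

    /** Starts a mutable copy; changes to it never affect this record. */
    MutableRecord mutable() {
        return new MutableRecord(values);
    }
}

/** Mutable counterpart that can be frozen back into a FrozenRecord. */
final class MutableRecord implements RecordView {
    private final String[] values;

    MutableRecord(String[] values) {
        this.values = values.clone();
    }

    @Override public String get(int index) { return values[index]; }
    @Override public int size() { return values.length; }

    /** Changes a value in place and returns this for chaining. */
    MutableRecord put(int index, String value) {
        values[index] = value;
        return this;
    }

    /** Takes a snapshot; later put() calls do not leak into it. */
    FrozenRecord freeze() {
        return new FrozenRecord(values);
    }
}
```

This bears on the "many changes or just one" question: the mutable variant pays for one copy when it is created and one when it is frozen, however many put() calls happen in between, whereas a per-call mutator copies on every change.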
Re: [CSV][POLL] How to provide mutable records
I really do not like [4]; I personally would never want to use such an
odd-looking API with arrays that I have to build as input.

What about this super simple "solution", #5: add a new toArray() method to
CSVRecord:

    /**
     * Returns a copy of this record's values as a new array.
     *
     * @return a new array
     */
    public String[] toArray() {
        return values.clone();
    }

You can edit the array as you wish and feed it to a CSVPrinter.

We can still kibitz with the proposed solution, but toArray() gives a
lighter-weight solution.

Gary

On Tue, Aug 22, 2017 at 1:27 PM, Oliver Heger wrote:
> Since Java 8 functional concepts and immutable data structures become
> more and more popular. It feels a bit strange to me going the opposite
> route. So my preference would also go towards [4].
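A rough sketch of how the toArray() route might be used. ArrayBackedRecord is a hypothetical stand-in for CSVRecord, and toLine() is a deliberately naive substitute for a printer (no quoting or escaping), so the example runs with the JDK alone.

```java
/** Hypothetical stand-in for CSVRecord with the proposed toArray(). */
final class ArrayBackedRecord {
    private final String[] values;

    ArrayBackedRecord(String... values) {
        this.values = values.clone();
    }

    String get(int index) {
        return values[index];
    }

    /** Returns a copy, so edits to the result never reach the record. */
    String[] toArray() {
        return values.clone();
    }
}

final class ToArrayDemo {
    /** Naive join for illustration only: no quoting or escaping. */
    static String toLine(String[] row) {
        return String.join(",", row);
    }

    public static void main(String[] args) {
        ArrayBackedRecord record = new ArrayBackedRecord("id-1", "tea", "hot");

        String[] row = record.toArray();
        row[1] = "soup";                    // edit the copy freely

        System.out.println(toLine(row));    // the edited copy
        System.out.println(record.get(1));  // the record itself is unchanged
    }
}
```

With the real library the edited array would go to CSVPrinter.printRecord(...) instead of toLine(). The record stays immutable in this approach; the cost is that header-name access is lost once you are working with a bare array.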
Re: [CSV][POLL] How to provide mutable records
On Mon, Aug 21, 2017 at 3:29 PM, sebb wrote:
> What about [4]?
>
> Would that be more complicated/cumbersome to use than [1]?
>
> Seems to me using a factory or builder to create an updated immutable
> copy is the way to go here.

You mean a "mutable" copy right? Because the records are currently
immutable.

Gary
Re: [CSV][POLL] How to provide mutable records
On 22.08.2017 at 21:34, Gary Gregory wrote:
> On Tue, Aug 22, 2017 at 1:27 PM, Oliver Heger wrote:
>> Since Java 8 functional concepts and immutable data structures become
>> more and more popular. It feels a bit strange to me going the opposite
>> route. So my preference would also go towards [4].
>>
>> The main use case was ETL, correct? We could check how such an approach
>> would look like in such a scenario and maybe even add more support, e.g.
>> implement a transformation loop that allows configuring a transformation
>> function.
>
> The use case is, IMO, _lightweight_ ETL; for anything serious I would use
> Spring Batch. This is why I favor the simplest solution.

For this poll, my first point is the relevant one. Regarding lightweight
ETL, maybe the new reactive streams in Java 9 will bring some interesting
concepts for the future?

Oliver
Re: [CSV][POLL] How to provide mutable records
On Tue, Aug 22, 2017 at 1:27 PM, Oliver Heger wrote:
> The main use case was ETL, correct? We could check how such an approach
> would look like in such a scenario and maybe even add more support, e.g.
> implement a transformation loop that allows configuring a transformation
> function.

The use case is, IMO, _lightweight_ ETL; for anything serious I would use
Spring Batch. This is why I favor the simplest solution.

Gary
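For reference, the "simplest solution" here is option [1] from the poll, a put method directly on the record. A small sketch of what that might look like, and of the sharing caveat the poll mentions, follows. PutRecord is a hypothetical stand-in; only the put(int,Object) signature comes from the poll, and converting the value with String.valueOf is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mutable record in the style of option [1]. */
final class PutRecord {
    private final String[] values;

    PutRecord(String... values) {
        this.values = values.clone();
    }

    String get(int index) {
        return values[index];
    }

    /** Option [1]: change the value in place (assumed to store its string form). */
    void put(int index, Object value) {
        values[index] = String.valueOf(value);
    }
}

final class PutDemo {
    public static void main(String[] args) {
        PutRecord record = new PutRecord("1", "beans");

        // Another component keeps a reference to the same instance.
        List<PutRecord> heldElsewhere = new ArrayList<>();
        heldElsewhere.add(record);

        record.put(1, "soup"); // simple for the caller...

        // ...but the other holder observes the change as well.
        System.out.println(heldElsewhere.get(0).get(1));
    }
}
```

The call site is as short as it gets, which is the appeal for lightweight ETL. The trade-off is that anything already holding the record sees later edits.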
Re: [CSV][POLL] How to provide mutable records
On 21.08.2017 at 23:29, sebb wrote:
> On 21 August 2017 at 21:04, Gary Gregory wrote:
>> I like the simplicity of [1] and I coded [3] to see how cumbersome that
>> feels.
>>
>> So my preference is [1].
>
> What about [4]?
>
> Would that be more complicated/cumbersome to use than [1]?
>
> Seems to me using a factory or builder to create an updated immutable
> copy is the way to go here.

Since Java 8, functional concepts and immutable data structures have become
more and more popular, so it feels a bit strange to me to go the opposite
route. My preference would therefore also go towards [4].

The main use case was ETL, correct? We could check what such an approach
would look like in that scenario and maybe even add more support, e.g.
implement a transformation loop that allows configuring a transformation
function.

Oliver
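One possible shape for the transformation loop suggested above, sketched with plain String[] rows so that it runs with the JDK alone. TransformLoop and transform() are hypothetical names chosen for this illustration; nothing like them is confirmed to exist in commons-csv.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

final class TransformLoop {
    /**
     * Applies the configured function to a copy of each row and collects
     * the results; the input rows are never modified.
     */
    static List<String[]> transform(Iterable<String[]> rows,
                                    UnaryOperator<String[]> fn) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            out.add(fn.apply(row.clone()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> input = Arrays.asList(
                new String[] {"1", "tea"},
                new String[] {"2", "soup"});

        // The transformation is passed in as configuration:
        // here, upper-case the second column.
        List<String[]> result = transform(input, row -> {
            row[1] = row[1].toUpperCase(Locale.ROOT);
            return row;
        });

        for (String[] row : result) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Because the loop hands the function a copy, the function may edit in place without the source records ever becoming mutable, which keeps the immutable-record preference and the ETL use case compatible.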
Re: [CSV][POLL] How to provide mutable records
On 21 August 2017 at 21:04, Gary Gregory wrote:
> Hi All,
>
> We have a request for [CSV] to provide mutable records. There is no clear
> consensus to me on how to do this. The current CSVRecord class is immutable
> but is not documented as such. I attribute that to YAGNI up to now.
>
> Options range from simply making CSVRecord mutable to creating a new
> CSVMutableRecord class and a few things in between.
>
> I'd like to get a feel for what the community thinks here. IMO this boils
> down to whether or not it matters that CSVRecord remains immutable.
>
> [0] do nothing
>
> [1] Add two put methods to CSVRecord making the class mutable:
> put(int,Object) and put(String,Object). This does not break BC but changes
> the runtime behavior for apps that expect immutable records and share the
> records with other components.
>
> [2] Add a "mutableRecord" boolean option to CSVRecord and CSVFormat such
> that a new boolean in CSVRecord allows the methods from [1] above to either
> work or throw an exception.
>
> [3] Add a "mutableRecord" boolean option to CSVRecord and CSVFormat such
> that a subclass of CSVRecord called CSVMutableRecord is created which
> contains two new put methods. See branch CSV-216.
>
> [4] The factory method:
>     /**
>      * @param orig Original to be copied.
>      * @param replace Fields to be replaced.
>      * @return a copy of "orig", except for the fields in "replace".
>      */
>     public static CSVRecord createRecord(CSVRecord orig,
>                                          Pair ... replace)
>
> Could also be:
>     public static CSVRecord createRecord(CSVRecord orig,
>                                          int[] replaceIndices,
>                                          String[] replaceValues)
>
> I like the simplicity of [1] and I coded [3] to see how cumbersome that
> feels.
>
> So my preference is [1].

What about [4]?

Would that be more complicated/cumbersome to use than [1]?

Seems to me using a factory or builder to create an updated immutable
copy is the way to go here.

> Gary
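To make the comparison with [1] concrete, here is a sketch of the second form of the factory method in [4], the one taking parallel index and value arrays. PlainRecord and RecordFactory are hypothetical stand-ins, and the length check is an assumption about how mismatched arrays would be handled; the poll does not say.

```java
/** Hypothetical stand-in for CSVRecord; not the commons-csv class. */
final class PlainRecord {
    private final String[] values;

    PlainRecord(String... values) {
        this.values = values.clone();
    }

    String get(int index) {
        return values[index];
    }

    int size() {
        return values.length;
    }

    /** Copy of the values, used by the factory below. */
    String[] copyOfValues() {
        return values.clone();
    }
}

final class RecordFactory {
    /** Returns a copy of orig, except for the fields named in the arrays. */
    static PlainRecord createRecord(PlainRecord orig,
                                    int[] replaceIndices,
                                    String[] replaceValues) {
        if (replaceIndices.length != replaceValues.length) {
            // Assumption: mismatched arrays are a caller error.
            throw new IllegalArgumentException(
                    "replaceIndices and replaceValues differ in length");
        }
        String[] copy = orig.copyOfValues();
        for (int i = 0; i < replaceIndices.length; i++) {
            copy[replaceIndices[i]] = replaceValues[i];
        }
        return new PlainRecord(copy);
    }
}
```

A call then reads `RecordFactory.createRecord(orig, new int[] {0, 2}, new String[] {"x", "z"})`. That is more ceremony at the call site than a put(), but several fields are replaced with a single copy and the original record is never touched.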