Re: [CSV][POLL] How to provide mutable records

2018-02-13 Stian Soiland-Reyes
On Mon, 12 Feb 2018 18:10:56 -0700, Gary Gregory  wrote:
> On Fri, Feb 9, 2018 at 10:05 AM, Stian Soiland-Reyes 
> I've not had time to review this yet, but I hope to get to it sometime this
> week.

Thanks. I'll wait for that before preparing a 1.6 RC, so that we have time to
decide whether this goes in or not.


-- 
Stian Soiland-Reyes
The University of Manchester
http://www.esciencelab.org.uk/
http://orcid.org/-0001-9842-9718



-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [CSV][POLL] How to provide mutable records

2018-02-12 Gary Gregory
Hi Stian,

I've not had time to review this yet, but I hope to get to it sometime this
week.

Gary



Re: [CSV][POLL] How to provide mutable records

2018-02-09 Stian Soiland-Reyes
On Fri, 25 Aug 2017 19:19:58 +0100, Stian Soiland-Reyes  
wrote:
> This also came up for Commons RDF, where everything is also immutable,
> which I think is a good principle to follow in modern Java 8 programming.
> 
> So you need a mutator function like in [4] that either returns a new
> immutable (but changed) CSVRecord, or alternatively a different
> MutableCSVRecord that can then be built/frozen to a CSVRecord. (These can
> then share a common accessor interface for the passive functions.)

Picking up this thread to consider this for CSV 1.6.

Not quite as elegant as the above, but I added
some mutator methods, withValue(), in
https://github.com/apache/commons-csv/pull/25


// withValue() and immutable() are the methods proposed in PR #25 (CSV-216 branch)
List<CSVRecord> someList = new ArrayList<>();
for (CSVRecord r : csvparser) {
  CSVRecord rSoup = r.withValue(4, "soup")
                     .withValue(5, "fish");
  // original r is untouched and can be used again
  CSVRecord rBeans = r.withValue(3, "beans");

  // Each is now different
  someList.add(r);
  someList.add(rSoup);
  someList.add(rBeans);

  // worried someone might touch your beans?
  consumeCSVRecord(rBeans.immutable());
}

It's not clever enough (yet!) to resize the underlying array if you try to go
outside the existing columns. The existing parser seems to determine the number
of columns (and hence the record array size) per line, so this might behave
oddly for CSV files with an inconsistent number of columns.
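One way the resizing limitation described above might be handled is to grow a copy of the array with Arrays.copyOf. The sketch below is a hypothetical static helper (RecordValues.withValue, not the code in PR #25) working on a bare String[]; padding the new gap columns with a caller-supplied fill value such as "" or null is an assumption of this sketch.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating a growing withValue(); not the code in PR #25. */
final class RecordValues {
    private RecordValues() {
    }

    /**
     * Returns a copy of values with the entry at index set to value. If index lies past
     * the last column, the copy is grown to index + 1 and the gap is filled with fill.
     * The input array is never modified.
     */
    static String[] withValue(String[] values, int index, String value, String fill) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("Negative column index: " + index);
        }
        // Arrays.copyOf pads any extra slots with null
        String[] copy = Arrays.copyOf(values, Math.max(values.length, index + 1));
        // Overwrite that null padding (if any) with the chosen fill value
        Arrays.fill(copy, values.length, copy.length, fill);
        copy[index] = value;
        return copy;
    }

    public static void main(String[] args) {
        String[] row = {"soup", "fish"};
        System.out.println(Arrays.toString(withValue(row, 1, "beans", ""))); // [soup, beans]
        System.out.println(Arrays.toString(withValue(row, 3, "chips", ""))); // [soup, fish, , chips]
        System.out.println(Arrays.toString(row));                           // [soup, fish]
    }
}
```

Whether gaps should be padded at all, or rejected with an exception, is a design choice the thread leaves open.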



Comments on, and changes to, the CSV-216 branch are welcome.




Re: [CSV][POLL] How to provide mutable records

2017-08-25 Stian Soiland-Reyes
This also came up for Commons RDF, where everything is also immutable,
which I think is a good principle to follow in modern Java 8 programming.

So you need a mutator function like in [4] that either returns a new
immutable (but changed) CSVRecord, or alternatively a different
MutableCSVRecord that can then be built/frozen to a CSVRecord. (These can
then share a common accessor interface for the passive functions.)
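A minimal sketch of the two shapes described above, assuming hypothetical RecordAccessor, ImmutableRecord and MutableRecord types backed by a plain String[] (none of these are Commons CSV classes). withValue() returns a changed copy, toMutable()/freeze() give the build-then-freeze variant, and both classes implement the shared read-only interface.

```java
/** Hypothetical read-only view shared by both record types (not a Commons CSV API). */
interface RecordAccessor {
    String get(int index);

    int size();
}

/** Immutable record: "mutators" return a changed copy and never touch this instance. */
final class ImmutableRecord implements RecordAccessor {
    private final String[] values;

    ImmutableRecord(String... values) {
        this.values = values.clone(); // defensive copy: callers cannot mutate our state
    }

    @Override
    public String get(int index) {
        return values[index];
    }

    @Override
    public int size() {
        return values.length;
    }

    /** Returns a new record with one value replaced; this record is unchanged. */
    ImmutableRecord withValue(int index, String value) {
        String[] copy = values.clone();
        copy[index] = value;
        return new ImmutableRecord(copy);
    }

    /** Starts a mutable copy that can be edited and then frozen again. */
    MutableRecord toMutable() {
        return new MutableRecord(values);
    }
}

/** Mutable counterpart: edit in place, then freeze() into a new ImmutableRecord. */
final class MutableRecord implements RecordAccessor {
    private final String[] values;

    MutableRecord(String[] values) {
        this.values = values.clone();
    }

    @Override
    public String get(int index) {
        return values[index];
    }

    @Override
    public int size() {
        return values.length;
    }

    /** Sets a value in place and returns this builder for chaining. */
    MutableRecord set(int index, String value) {
        values[index] = value;
        return this;
    }

    /** Snapshots the current values; later set() calls do not affect the result. */
    ImmutableRecord freeze() {
        return new ImmutableRecord(values);
    }
}
```

withValue() suits one or two changes per record, while the builder avoids a copy per change when many fields are edited.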

Are there likely to be many changes to each CSVRecord, or just one?



Re: [CSV][POLL] How to provide mutable records

2017-08-25 Gary Gregory
I really do not like [4]; I would personally never want to use such an
odd-looking API with arrays that I have to build as input.

What about this super simple "solution", [5]: add a new toArray() method to
CSVRecord:

/**
 * Returns a new copy of this record's values.
 *
 * @return a new array
 */
public String[] toArray() {
    return values.clone();
}

You can edit the array as you wish and feed it to a CSVPrinter.
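A minimal sketch of why a cloning toArray() is safe to hand out: edits to the returned array never reach the record. SimpleRecord is a hypothetical stand-in for CSVRecord, not the real class.

```java
import java.util.Arrays;

/** Hypothetical stand-in for CSVRecord showing only the proposed toArray(). */
final class SimpleRecord {
    private final String[] values;

    SimpleRecord(String... values) {
        this.values = values.clone();
    }

    String get(int index) {
        return values[index];
    }

    /** Returns a new copy of the values; changes to it do not affect this record. */
    String[] toArray() {
        return values.clone();
    }

    public static void main(String[] args) {
        SimpleRecord record = new SimpleRecord("id", "name", "price");
        String[] edited = record.toArray();
        edited[2] = "cost"; // edit the copy freely
        System.out.println(Arrays.toString(edited)); // [id, name, cost]
        System.out.println(record.get(2));           // price (record unchanged)
    }
}
```

With Commons CSV, the edited array could then be passed to CSVPrinter.printRecord(Object...).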

We can still kibitz about the proposed solutions, but toArray() gives a
lighter-weight solution.

Gary



Re: [CSV][POLL] How to provide mutable records

2017-08-25 Gary Gregory
On Mon, Aug 21, 2017 at 3:29 PM, sebb  wrote:

> What about [4]?
>
> Would that be more complicated/cumbersome to use than [1]?
>
> Seems to me using a factory or builder to create an updated immutable
> copy is the way to go here.
>

You mean a "mutable" copy, right? Because the records are currently
immutable.

Gary




Re: [CSV][POLL] How to provide mutable records

2017-08-22 Oliver Heger


On 22.08.2017 at 21:34, Gary Gregory wrote:
> On Tue, Aug 22, 2017 at 1:27 PM, Oliver Heger 
> wrote:
> 
>> Since Java 8, functional concepts and immutable data structures have become
>> more and more popular. It feels a bit strange to me to go the opposite
>> route. So my preference would also go towards [4].
>>
>> The main use case was ETL, correct? We could check what such an approach
>> would look like in such a scenario and maybe even add more support, e.g.
>> implement a transformation loop that allows configuring a transformation
>> function.
>>
> 
> 
> The use case is, IMO, _lightweight_ ETL; for anything serious I would use
> Spring Batch. This is why I favor the simplest solution.

For this poll, my first point is the relevant one. Regarding lightweight ETL,
maybe the new reactive streams in Java 9 will bring some interesting
concepts in the future?

Oliver



Re: [CSV][POLL] How to provide mutable records

2017-08-22 Gary Gregory
On Tue, Aug 22, 2017 at 1:27 PM, Oliver Heger 
wrote:

> Since Java 8, functional concepts and immutable data structures have become
> more and more popular. It feels a bit strange to me to go the opposite
> route. So my preference would also go towards [4].
>
> The main use case was ETL, correct? We could check what such an approach
> would look like in such a scenario and maybe even add more support, e.g.
> implement a transformation loop that allows configuring a transformation
> function.
>


The use case is, IMO, _lightweight_ ETL; for anything serious I would use
Spring Batch. This is why I favor the simplest solution.
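For comparison, option [1], the simplest approach, might look roughly like the sketch below. PutRecord and its header map are hypothetical illustrations, not the CSV-216 branch code. Because put() mutates in place, every component holding the same instance sees the change, which is the runtime-behavior change the poll mentions for [1].

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of option [1]: a record with in-place put methods. */
final class PutRecord {
    private final String[] values;
    private final Map<String, Integer> headerMap; // column name -> column index

    PutRecord(String[] values, Map<String, Integer> headerMap) {
        this.values = values.clone();
        this.headerMap = new HashMap<>(headerMap);
    }

    String get(int index) {
        return values[index];
    }

    String get(String name) {
        return values[indexOf(name)];
    }

    /** Overwrites the value at index in place; null stays null, anything else uses toString(). */
    void put(int index, Object value) {
        values[index] = value == null ? null : value.toString();
    }

    /** Overwrites the value of the named column in place. */
    void put(String name, Object value) {
        put(indexOf(name), value);
    }

    private int indexOf(String name) {
        Integer index = headerMap.get(name);
        if (index == null) {
            throw new IllegalArgumentException("No column named '" + name + "'");
        }
        return index;
    }
}
```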

Gary




Re: [CSV][POLL] How to provide mutable records

2017-08-22 Oliver Heger


On 21.08.2017 at 23:29, sebb wrote:
> What about [4]?
> 
> Would that be more complicated/cumbersome to use than [1]?
> 
> Seems to me using a factory or builder to create an updated immutable
> copy is the way to go here.

Since Java 8, functional concepts and immutable data structures have become
more and more popular. It feels a bit strange to me to go the opposite
route. So my preference would also go towards [4].

The main use case was ETL, correct? We could check what such an approach
would look like in such a scenario and maybe even add more support, e.g.
implement a transformation loop that allows configuring a transformation
function.
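A rough sketch of the transformation loop suggested above, assuming rows are represented as List<String> and the transformation is a java.util.function.UnaryOperator; TransformLoop and transformAll are hypothetical names. In real use the rows might come from a CSVParser (which is Iterable<CSVRecord>) after converting each record's values to a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Hypothetical lightweight ETL loop with a pluggable transformation function. */
final class TransformLoop {
    private TransformLoop() {
    }

    /**
     * Applies transformation to every row and collects the results in order.
     * Input rows stay untouched as long as the transformation returns a new list.
     */
    static List<List<String>> transformAll(Iterable<List<String>> rows,
                                           UnaryOperator<List<String>> transformation) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> row : rows) {
            out.add(transformation.apply(row));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("soup", "3"),
                Arrays.asList("fish", "5"));
        // Example transformation: copy the row and upper-case its first column
        UnaryOperator<List<String>> upperFirst = row -> {
            List<String> copy = new ArrayList<>(row);
            copy.set(0, copy.get(0).toUpperCase(Locale.ROOT));
            return copy;
        };
        System.out.println(transformAll(rows, upperFirst)); // [[SOUP, 3], [FISH, 5]]
        System.out.println(rows);                           // [[soup, 3], [fish, 5]]
    }
}
```

Keeping the transformation a plain function means the loop itself never needs mutable records; immutability is then the transformation's responsibility.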

Oliver




Re: [CSV][POLL] How to provide mutable records

2017-08-21 sebb
On 21 August 2017 at 21:04, Gary Gregory  wrote:
> Hi All,
>
> We have a request for [CSV] to provide mutable records. I see no clear
> consensus on how to do this. The current CSVRecord class is immutable
> but is not documented as such; I attribute that to YAGNI so far.
>
> Options range from simply making CSVRecord mutable to creating a new
> CSVMutableRecord class, with a few things in between.
>
> I'd like to get a feel for what the community thinks here. IMO this boils down
> to whether or not it matters that CSVRecord remains immutable.
>
> [0] do nothing
>
> [1] Add two put methods to CSVRecord, making the class mutable:
> put(int,Object) and put(String,Object). This does not break BC, but it changes
> the runtime behavior for apps that expect immutable records and share the
> records with other components.
>
> [2] Add a "mutableRecord" boolean option to CSVRecord and CSVFormat such
> that a new boolean in CSVRecord makes the methods from [1] above either work
> or throw an exception.
>
> [3] Add a "mutableRecord" boolean option to CSVRecord and CSVFormat such
> that a subclass of CSVRecord called CSVMutableRecord is created, which
> contains the two new put methods. See branch CSV-216.
>
> [4] The factory method:
>  /**
>   * @param orig Original to be copied.
>   * @param replace Fields to be replaced.
>   * @return a copy of "orig", except for the fields in "replace".
>   */
>  public static CSVRecord createRecord(CSVRecord orig,
>   Pair ... replace)
>
> Could also be:
>  public static CSVRecord createRecord(CSVRecord orig,
>   int[] replaceIndices,
>   String[] replaceValues)
>
> I like the simplicity of [1] and I coded [3] to see how cumbersome that
> feels.
>
> So my preference is [1].

What about [4]?

Would that be more complicated/cumbersome to use than [1]?

Seems to me using a factory or builder to create an updated immutable
copy is the way to go here.
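A minimal sketch of an option [4]-style factory producing an updated immutable copy. FrozenRecord is a hypothetical stand-in for CSVRecord, and a Map<Integer, String> from column index to new value replaces the proposal's Pair varargs, since the JDK has no Pair type.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical immutable record used to illustrate option [4]; not Commons CSV code. */
final class FrozenRecord {
    private final String[] values;

    FrozenRecord(String... values) {
        this.values = values.clone();
    }

    String get(int index) {
        return values[index];
    }

    /** Read-only view of the values. */
    List<String> values() {
        return Collections.unmodifiableList(Arrays.asList(values));
    }

    /**
     * Returns a copy of orig, except for the fields in replace (column index -> new value).
     * orig itself is never modified.
     */
    static FrozenRecord createRecord(FrozenRecord orig, Map<Integer, String> replace) {
        String[] copy = orig.values.clone();
        for (Map.Entry<Integer, String> e : replace.entrySet()) {
            // An index outside the record throws ArrayIndexOutOfBoundsException
            copy[e.getKey()] = e.getValue();
        }
        return new FrozenRecord(copy);
    }
}
```

A caller builds the replacement map once and gets a new record back, so records shared with other components are never changed underneath them.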

> Gary
