[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-05-22 Thread JIRA

[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484492#comment-16484492 ]

Patrick Gäckle commented on CSV-222:


I'm looking into it. I will also close the PR, since using the reader really does seem easier, judging just by the name.
I will get back to you when I'm done.

> invalid char between encapsulated token and delimiter
> -
>
>                 Key: CSV-222
>                 URL: https://issues.apache.org/jira/browse/CSV-222
>             Project: Commons CSV
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: 1.4
>            Reporter: Patrick Gäckle
>            Priority: Major
>         Attachments: faulty.csv, faulty2.csv
>
>
> When I try to read and parse the file [^faulty.csv], I get the following error:
> {code}
> java.io.IOException: (line 1) invalid char between encapsulated token and delimiter
>   at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:275)
>   at org.apache.commons.csv.Lexer.nextToken(Lexer.java:152)
>   at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:500)
>   at org.apache.commons.csv.CSVParser.initializeHeader(CSVParser.java:389)
>   at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:284)
>   at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:252)
>   at org.apache.commons.csv.CSVFormat.parse(CSVFormat.java:846)
> {code}
> The failing code is the part that parses the reader and returns an iterator over the records:
> {code:java}
> CSVFormat csvFormat =
>     CSVFormat.DEFAULT.withHeader().withDelimiter(';').withIgnoreHeaderCase();
> Iterator<CSVRecord> iterator = csvFormat.parse(reader).iterator();
> {code}
> The invalid characters are the non-printable SOH and STX characters at the end of the line.
> I debugged through the source and found that the exception is thrown because the Lexer does not handle these special characters.
> Unfortunately, I can't offer any hints on fixing this, as I'm not familiar with these types of characters or how they should behave.
> Sincerely
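A small JDK-only check helps explain why these bytes trip the parser: SOH is U+0001 and STX is U+0002, both ISO control characters, and `Character.isWhitespace` returns false for them. Assuming, as the error message suggests, that the lexer only tolerates whitespace between a closing quote and the next delimiter or line break, either character after a quoted last field would be rejected. The class `ControlCharCheck` below is a hypothetical model of that assumption, not Commons CSV code.

```java
// Hypothetical helper: models a lexer that, after a closing quote, accepts only
// whitespace, the delimiter, or a line break (an assumption, not Commons CSV code).
final class ControlCharCheck {

    private ControlCharCheck() {
    }

    // True if ch would be reported as an "invalid char between encapsulated token
    // and delimiter" under the whitespace-only assumption.
    static boolean rejectedAfterClosingQuote(char ch, char delimiter) {
        if (ch == delimiter || ch == '\r' || ch == '\n') {
            return false; // end of field or end of record
        }
        return !Character.isWhitespace(ch);
    }

    // Formats a character as its code point and flags ISO control characters.
    static String describe(char ch) {
        String kind = Character.isISOControl(ch) ? "ISO control" : "printable";
        return String.format("U+%04X (%s)", (int) ch, kind);
    }
}
```

Under this model, `rejectedAfterClosingQuote((char) 1, ';')` is true while a trailing space or tab is accepted, which matches the reported behavior for SOH and STX at the end of a line.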



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-05-22 Thread Gary Gregory (JIRA)

[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484215#comment-16484215 ]

Gary Gregory commented on CSV-222:
--

Please try to use the new classes in Commons IO 2.7-SNAPSHOT: 
{{CharacterSetFilterReader}} and {{CharacterFilterReader}}.



[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-05-21 Thread Gary Gregory (JIRA)

[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483196#comment-16483196 ]

Gary Gregory commented on CSV-222:
--

See WIP in IO-577.



[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-05-21 Thread Gary Gregory (JIRA)

[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482989#comment-16482989 ]

Gary Gregory commented on CSV-222:
--

Thank you for the PR. 

I am wondering if, instead of further complicating the lexer code, it wouldn't 
be cleaner and simpler to do the filtering in a reader. For example, I might 
propose something like the following for Commons IO:
{code:java}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.commons.io.input;

import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Collections;
import java.util.Set;

/**
 * A filter reader that removes a given set of characters represented as int code points.
 */
public class IntegerSetFilterReader extends FilterReader {

    private final Set<Integer> intSet;

    /**
     * Constructs a new reader.
     *
     * @param in
     *            the reader to filter
     * @param intSet
     *            the code points to remove; {@code null} removes nothing
     */
    public IntegerSetFilterReader(final Reader in, final Set<Integer> intSet) {
        super(in);
        this.intSet = intSet == null ? Collections.<Integer>emptySet() : intSet;
    }

    @Override
    public int read() throws IOException {
        int ch;
        do {
            ch = super.read();
        } while (ch != -1 && skip(ch));
        return ch;
    }

    private boolean skip(final int ch) {
        // Note that you can increase the Integer cache with a system property.
        return intSet.contains(Integer.valueOf(ch));
    }

    @Override
    public int read(final char[] cbuf, final int off, final int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int kept;
        do {
            final int read = super.read(cbuf, off, len);
            if (read == -1) {
                return -1;
            }
            // Compact the buffer in place, testing each character (not the count) against the set.
            kept = off;
            for (int readPos = off; readPos < off + read; readPos++) {
                if (!skip(cbuf[readPos])) {
                    cbuf[kept++] = cbuf[readPos];
                }
            }
            // If every character in this chunk was filtered out, read again instead of returning 0.
        } while (kept == off);
        return kept - off;
    }
}
{code}

Thoughts?



[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-05-21 Thread JIRA

[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482724#comment-16482724 ]

Patrick Gäckle commented on CSV-222:


Opened PR: https://github.com/apache/commons-csv/pull/29



[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-05-21 Thread Gary Gregory (JIRA)

[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482692#comment-16482692 ]

Gary Gregory commented on CSV-222:
--

It's easier for anyone to review your changes if you create a PR...



[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-04-14 Thread JIRA

[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438297#comment-16438297 ]

Patrick Gäckle commented on CSV-222:


By the way, I have a problem with the dependencies of the CSVBenchmark class
(\src\test\java\org\apache\commons\csv\CSVBenchmark.java).
Can you help me with this?



[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-04-13 Thread JIRA

[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438138#comment-16438138 ]

Patrick Gäckle commented on CSV-222:


[~garydgregory] I did some coding on this, but as I'm not familiar with this project, I'm not quite sure whether I missed something.
Could you have a look before I create the PR (this is my first contribution)?
--> https://github.com/LostKatana/commons-csv/commits/feature/CSV-222_ignore_set_of_characters



[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-04-04 Thread JIRA

[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425803#comment-16425803 ]

Patrick Gäckle commented on CSV-222:


Ah, sure. I'll see what I can do about that.



[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-04-04 Thread Gary Gregory (JIRA)

[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425799#comment-16425799 ]

Gary Gregory commented on CSV-222:
--

That would be a "Pull Request" on GitHub: https://github.com/apache/commons-csv



[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-04-04 Thread JIRA

[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425794#comment-16425794 ]

Patrick Gäckle commented on CSV-222:


Sorry, I don't know what PR means.



[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-04-04 Thread Gary Gregory (JIRA)

[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425725#comment-16425725 ]

Gary Gregory commented on CSV-222:
--

In faulty2.csv, you have SOH+STX between headers and in record separators.
As of now, you need to filter these characters out before they get to Commons CSV.
We would need a new feature that completely ignores a given set of characters between tokens.
Do you want to provide a PR for that?
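If the input fits comfortably in memory, one simple way to filter before the data reaches Commons CSV is to strip the offending characters from the text and hand the cleaned text to the parser through a `StringReader`. A minimal JDK-only sketch, assuming SOH (U+0001) and STX (U+0002) are the only characters that need removing; the class `SohStxStripper` and its methods are hypothetical, not part of Commons CSV or Commons IO.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper (not a Commons API): removes SOH (U+0001) and STX (U+0002)
// from the whole input so the parser never sees them.
final class SohStxStripper {

    private SohStxStripper() {
    }

    // Removes every SOH and STX character; all other characters are kept as-is.
    static String strip(String text) {
        return text.replaceAll("[\\x01\\x02]", "");
    }

    // Reads the entire input into memory, strips it, and returns a new Reader.
    // The caller still owns (and should close) the original reader.
    static Reader cleaned(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return new StringReader(strip(sb.toString()));
    }
}
```

With Commons CSV on the classpath, the result could be passed straight to `CSVFormat.parse(Reader)`. Because the whole file is buffered, a streaming filter reader is the better fit for large inputs.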



[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-04-04 Thread JIRA

[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425645#comment-16425645 ]

Patrick Gäckle commented on CSV-222:


You slightly misunderstood me, or I was not precise enough.
I attached [^faulty2.csv], where you can see that the header row also contains an SOH and STX in column1, before the column separator.
That is currently not a problem; it only becomes one for the last column in a row.

I hope I described this a bit better now.



[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-04-04 Thread Gary Gregory (JIRA)

[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425629#comment-16425629 ]

Gary Gregory commented on CSV-222:
--

The issue you initially described talked about special characters in the record 
separator, not the column delimiter.
The column delimiter is currently limited to a single character. There is a 
separate ticket to enhance the column delimiter to a String instead of a char.



[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-04-04 Thread JIRA

[ https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425139#comment-16425139 ]

Patrick Gäckle commented on CSV-222:


Thanks [~garydgregory]. I hadn't thought of this solution.

Anyway, I still think it is a bug: when these characters are placed between 
column 1 and column 2, nothing happens. The error occurs only when they are the 
last characters read before a possible line end.

In my case, the solution is a FilterReader that throws away all non-printable 
characters, since the file I need to process turned out to contain many more of 
them.
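The FilterReader workaround described above might look roughly like the following. This is a minimal sketch, not the code used in the comment: the class name {{NonPrintableFilterReader}} is hypothetical, and it assumes that "non-printable" means ISO control characters (which include SOH U+0001 and STX U+0002), while tab, CR and LF are kept so record boundaries survive.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical reader that drops ISO control characters such as SOH (U+0001)
 * and STX (U+0002) while keeping tab, CR and LF, so line breaks survive.
 */
class NonPrintableFilterReader extends FilterReader {

    NonPrintableFilterReader(Reader in) {
        super(in);
    }

    /** True for characters that should be passed through to the CSV parser. */
    private static boolean keep(int c) {
        return !Character.isISOControl(c) || c == '\t' || c == '\r' || c == '\n';
    }

    @Override
    public int read() throws IOException {
        int c;
        do {
            c = super.read();
        } while (c != -1 && !keep(c));
        return c;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        while (true) {
            int n = super.read(cbuf, off, len);
            if (n <= 0) {
                return n; // -1 at end of stream, 0 only when len == 0
            }
            // Compact the chunk in place, keeping only accepted characters.
            int kept = 0;
            for (int i = off; i < off + n; i++) {
                if (keep(cbuf[i])) {
                    cbuf[off + kept++] = cbuf[i];
                }
            }
            if (kept > 0) {
                return kept;
            }
            // The whole chunk was control characters: read the next chunk
            // instead of returning 0, which callers could mistake for no progress.
        }
    }
}
```

It would wrap the reader handed to the parser, e.g. {{csvFormat.parse(new NonPrintableFilterReader(reader))}}. The trade-off is that control characters that legitimately belong inside quoted fields are silently dropped as well, and {{skip()}} is not filtered in this sketch.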
Thanks



[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-04-03 Thread Gary Gregory (JIRA)

[ 
https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424775#comment-16424775
 ] 

Gary Gregory commented on CSV-222:
--

Call {{org.apache.commons.csv.CSVFormat.withRecordSeparator(String)}} and use 
Unicode literals to specify whatever characters you want, like 
{{"\u0001\u0002\u0003"}}.



[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-03-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419448#comment-16419448
 ] 

Patrick Gäckle commented on CSV-222:


This is the current workaround I use.
It might be nice to include the position in the error message as an additional 
hint about where to look.

I would really like to see an option to simply set aside characters that cannot 
be assigned to a column.
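Until the error message carries a position, the offending character can be located before parsing. The sketch below is a hypothetical helper ({{QuoteJunkLocator}} and {{findJunkAfterQuote}} are not part of Commons CSV); it assumes one record per line and quotes escaped by doubling them, as with {{CSVFormat.DEFAULT}}. It reports the 1-based column of the first character that follows a closing quote and is neither whitespace nor the delimiter; this sketch tolerates whitespace in that position.

```java
/**
 * Hypothetical diagnostic helper (not part of Commons CSV): finds the first
 * character that follows a closing quote but is neither whitespace nor the
 * delimiter. Assumptions: one record per line, quotes escaped by doubling.
 */
final class QuoteJunkLocator {

    private QuoteJunkLocator() {
    }

    /** Returns the 1-based column of the offending character, or -1 if none. */
    static int findJunkAfterQuote(String line, char delimiter, char quote) {
        boolean inQuotes = false;          // inside an encapsulated token
        boolean afterClosingQuote = false; // token closed, expecting a delimiter
        boolean atFieldStart = true;       // a quote only opens a token here
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        i++; // doubled quote is an escaped quote inside the token
                    } else {
                        inQuotes = false;
                        afterClosingQuote = true;
                    }
                }
            } else if (c == delimiter) {
                afterClosingQuote = false;
                atFieldStart = true;
            } else if (afterClosingQuote) {
                if (!Character.isWhitespace(c)) {
                    return i + 1; // convert 0-based index to 1-based column
                }
            } else if (atFieldStart && c == quote) {
                inQuotes = true;
                atFieldStart = false;
            } else {
                atFieldStart = false; // ordinary character of an unquoted field
            }
        }
        return -1;
    }
}
```

Running it over each line (for example from a {{BufferedReader}}) and printing the line number together with the returned column points directly at the SOH/STX pair in a file like [^faulty.csv].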



[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-03-29 Thread Gary Gregory (JIRA)

[ 
https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419025#comment-16419025
 ] 

Gary Gregory commented on CSV-222:
--

Does {{org.apache.commons.csv.CSVFormat.withRecordSeparator(String)}} work for 
you then?



[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-03-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416920#comment-16416920
 ] 

Patrick Gäckle commented on CSV-222:


Setting the end-of-record marker to SOH-STX-LF would help me, as it matches my 
current problem.
Recovering from junk would be the long-term solution. I can imagine a 
_lazy reading option_ that, instead of throwing an error when something 
unexpected appears between an encapsulated token and the delimiter, simply 
continues without taking any action, i.e. without appending the text to the 
current field/header and without advancing to the next field.
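A lenient mode along these lines could behave roughly as in the following sketch, which splits a single line and discards whatever sits between a closing quote and the next delimiter instead of throwing. It is a hypothetical illustration ({{LenientLineSplitter}} is not a Commons CSV class) that assumes one record per line with doubled-quote escaping and deliberately ignores quoted fields spanning several lines.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lenient splitter: junk after a closing quote is dropped. */
final class LenientLineSplitter {

    private LenientLineSplitter() {
    }

    static List<String> split(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;          // inside an encapsulated token
        boolean afterClosingQuote = false; // token closed, expecting a delimiter
        boolean atFieldStart = true;       // a quote only opens a token here
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        field.append(quote); // doubled quote is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;
                        afterClosingQuote = true;
                    }
                } else {
                    field.append(c);
                }
            } else if (c == delimiter) {
                fields.add(field.toString());
                field.setLength(0);
                afterClosingQuote = false;
                atFieldStart = true;
            } else if (afterClosingQuote) {
                // Lenient part: ignore anything between the closing quote and
                // the next delimiter instead of raising an error.
                continue;
            } else if (atFieldStart && c == quote) {
                inQuotes = true;
                atFieldStart = false;
            } else {
                field.append(c);
                atFieldStart = false;
            }
        }
        fields.add(field.toString());
        return fields;
    }
}
```

With this behaviour the trailing SOH/STX after the last quoted column is simply dropped; the cost is that genuinely malformed input no longer produces any signal, which is presumably why such a mode would have to be opt-in.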

Thanks.



[jira] [Commented] (CSV-222) invalid char between encapsulated token and delimiter

2018-03-27 Thread Gary Gregory (JIRA)

[ 
https://issues.apache.org/jira/browse/CSV-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416349#comment-16416349
 ] 

Gary Gregory commented on CSV-222:
--

Are you expecting Commons CSV to somehow recover from junk in the input? 
Or do you want to be able to set the end-of-record marker to SOH-STX-LF?
